Re: Question about Fuzzy search in Solr
Thanks. Is any extra configuration needed on the Solr side to make this work? Any additional text files like synonyms.txt, any additional fields, or any changes in schema.xml or solrconfig.xml? On Mon, Sep 17, 2012 at 4:45 PM, Rafał Kuć r@solr.pl wrote: Hello! Is this what you are looking for: https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches ? -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Hi, I need to know how we can implement fuzzy searches using Solr. Can someone provide links to relevant documentation? -- Thanks and Regards Rahul A. Warawdekar
Re: Question about Fuzzy search in Solr
Got it. Thanks Rafał! On Mon, Sep 17, 2012 at 6:37 PM, Rafał Kuć r@solr.pl wrote: Hello! There is no need to include any changes or additional components to have fuzzy search working in Solr. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Thanks. Is any extra configuration needed on the Solr side to make this work? Any additional text files like synonyms.txt, any additional fields, or any changes in schema.xml or solrconfig.xml? On Mon, Sep 17, 2012 at 4:45 PM, Rafał Kuć r@solr.pl wrote: Hello! Is this what you are looking for: https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches ? -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Hi, I need to know how we can implement fuzzy searches using Solr. Can someone provide links to relevant documentation? -- Thanks and Regards Rahul A. Warawdekar
Re: Question about Fuzzy search in Solr
Thanks Jack. We are using Solr 3.4. On Mon, Sep 17, 2012 at 8:18 PM, Jack Krupansky j...@basetechnology.com wrote: That doc is out of date for 4.0. See the 4.0 Javadoc on FuzzyQuery for updated info. The tilde right operand is now an integer edit distance (number of times to insert a char, delete a char, change a char, or transpose two adjacent chars to map an index term to the query term) that is limited to 2. Be aware that if you use fuzzy query in 3.6/3.6.1 or earlier, it will change when you go to 4.0. -- Jack Krupansky -----Original Message----- From: Rafał Kuć Sent: Monday, September 17, 2012 7:15 AM To: solr-user@lucene.apache.org Subject: Re: Question about Fuzzy search in Solr Hello! Is this what you are looking for: https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches ? -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Hi, I need to know how we can implement fuzzy searches using Solr. Can someone provide links to relevant documentation? -- Thanks and Regards Rahul A. Warawdekar
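For illustration (not part of the original thread): the query syntax is unchanged between versions; only the meaning of the tilde operand differs. Assuming a field named "text":

q=text:roam~0.8   (Solr/Lucene 3.x: operand is a similarity between 0 and 1)
q=text:roam~2     (Solr/Lucene 4.0: operand is an integer edit distance, capped at 2)

No schema or solrconfig changes are required in either case; the standard query parser handles the ~ operator out of the box, which matches Rafał's answer above.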
Re: DIH XML configs for multi environment
Hi Pranav, If you are using Tomcat to host Solr, you can define your data source in the context.xml file under the Tomcat configuration. You have to refer to this datasource by the same name in all 3 environments from the DIH data-config.xml. The context.xml file will vary across the 3 environments, holding different credentials for dev, stag and prod. e.g. the DIH data-config.xml will refer to the datasource as listed below

<dataSource jndiName="java:comp/env/YOUR_DATASOURCE_NAME" type="JdbcDataSource" readOnly="true" />

The context.xml file, which is located under the TOMCAT_HOME/conf folder, will have the resource entry as follows

<Resource name="YOUR_DATASOURCE_NAME" auth="Container" type="" username="X" password="X" driverClassName="" url="" maxActive="8" />

On Wed, Jul 11, 2012 at 1:31 PM, Pranav Prakash pra...@gmail.com wrote: The DIH XML config file has to specify a dataSource. In my case, and possibly with many others, the logon credentials as well as the MySQL server paths would differ based on environment (dev, stag, prod). I don't want to end up with three different DIH config files, three different handlers and so on. What is a good way to deal with this? *Pranav Prakash* temet nosce -- Thanks and Regards Rahul A. Warawdekar
Re: DIH XML configs for multi environment
http://wiki.eclipse.org/Jetty/Howto/Configure_JNDI_Datasource http://docs.codehaus.org/display/JETTY/DataSource+Examples On Wed, Jul 11, 2012 at 2:30 PM, Pranav Prakash pra...@gmail.com wrote: That's cool. Is there something similar for Jetty as well? We use Jetty! *Pranav Prakash* temet nosce On Wed, Jul 11, 2012 at 1:49 PM, Rahul Warawdekar rahul.warawde...@gmail.com wrote: Hi Pranav, If you are using Tomcat to host Solr, you can define your data source in the context.xml file under the Tomcat configuration. You have to refer to this datasource by the same name in all 3 environments from the DIH data-config.xml. The context.xml file will vary across the 3 environments, holding different credentials for dev, stag and prod. e.g. the DIH data-config.xml will refer to the datasource as listed below

<dataSource jndiName="java:comp/env/YOUR_DATASOURCE_NAME" type="JdbcDataSource" readOnly="true" />

The context.xml file, which is located under the TOMCAT_HOME/conf folder, will have the resource entry as follows

<Resource name="YOUR_DATASOURCE_NAME" auth="Container" type="" username="X" password="X" driverClassName="" url="" maxActive="8" />

On Wed, Jul 11, 2012 at 1:31 PM, Pranav Prakash pra...@gmail.com wrote: The DIH XML config file has to specify a dataSource. In my case, and possibly with many others, the logon credentials as well as the MySQL server paths would differ based on environment (dev, stag, prod). I don't want to end up with three different DIH config files, three different handlers and so on. What is a good way to deal with this? *Pranav Prakash* temet nosce -- Thanks and Regards Rahul A. Warawdekar -- Thanks and Regards Rahul A. Warawdekar
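For completeness, a minimal sketch of the Jetty equivalent, along the lines of the JNDI pages linked above. Assumptions: Jetty 6 with the plus/naming jars on the classpath, MySQL Connector/J as the driver, and a hypothetical database named "solrdb"; class names differ in later Jetty versions:

<New id="YOUR_DATASOURCE_NAME" class="org.mortbay.jetty.plus.naming.Resource">
  <Arg>jdbc/YOUR_DATASOURCE_NAME</Arg>
  <Arg>
    <New class="com.mysql.jdbc.jdbc2.optional.MysqlConnectionPoolDataSource">
      <Set name="Url">jdbc:mysql://localhost:3306/solrdb</Set>
      <Set name="User">X</Set>
      <Set name="Password">X</Set>
    </New>
  </Arg>
</New>

As with the Tomcat setup, this block lives in the container configuration (jetty.xml or a context file) and can differ per environment, while data-config.xml keeps the single JNDI name.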
Re: Can't index sub-entities in DIH
Hi, One of the possibilities for this kind of issue may be the case sensitivity of column names in Oracle. Can you apply a transformer and check the entity map, which actually contains the keys and their values? Also, please try specifying upper-case field names for Oracle and see if that works. Something like

<entity name="tipodocumento" query="SELECT NOMBRE FROM tipodocumento where IDTIPODOCUMENTO = '${documento.TIPODOCUMENTO}'">
  <field column="NOMBRE" name="nombre" />
</entity>

On Tue, Jun 5, 2012 at 9:57 AM, Rafael Taboada kaliman.fore...@gmail.com wrote: Hi Gora, > Your configuration files look fine. It would seem that something is going wrong with the SELECT in Oracle, or with the JDBC driver used to access Oracle. Could you try: * Manually doing the SELECT for the entity, and sub-entity to ensure that things are working. The SELECTs are working OK. > * Check the JDBC settings. I'm using the latest version of ojdbc6.jar for Oracle 11g. It seems the JDBC setting is OK because Solr brings back data. > Sorry, I do not have access to Oracle so that I cannot try this out myself. Also, have you checked the Solr logs for any error messages? Finally, I just noticed that you have extra quotes in: ...where usuario_idusuario = '${usuario.idusuario}' I doubt that is the cause of your problem, but you could try removing them. If I remove the quotes, there is an error about this:

SEVERE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento = Processing Document # 1
 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
 at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento = Processing Document # 1
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
 at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
 ... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT nombre FROM tipodocumento WHERE idtipodocumento = Processing Document # 1
 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
 at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
 at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
 at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
 at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
 ... 5 more
Caused by: java.sql.SQLSyntaxErrorException: ORA-00936: missing expression
 at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:445)
 at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
 at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:879)
 at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:450)
 at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
 at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
 at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
 at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:873)
 at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
 at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
 at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1909)
 at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1871)
 at oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:318)
 at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246

My config files using Oracle are: db-data-config.xml dataConfig
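Regarding the transformer suggestion above, one low-effort way to inspect what DIH actually receives per row is the built-in LogTransformer; a sketch (the entity name and template come from this thread, the log level is arbitrary):

<entity name="documento" transformer="LogTransformer"
        logTemplate="row: TIPODOCUMENTO=${documento.TIPODOCUMENTO}" logLevel="info"
        query="...">

Each processed row then logs the resolved value, which makes case-sensitivity problems in Oracle column names visible immediately: an unresolved key logs as empty, producing exactly the "missing expression" SQL seen in the stack trace.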
Re: how to show DIH query sql in log file
Hi, Turn the Solr logging level to FINE for the DIH packages/classes and the queries will show up in the log. http://hostname:port/solr/core/admin/logging On Fri, Jun 1, 2012 at 9:34 AM, wangjing ppm10...@gmail.com wrote: How to show the DIH query SQL in the log file for troubleshooting? Thanks. -- Thanks and Regards Rahul A. Warawdekar
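If you prefer to configure this at startup rather than through the admin logging page, the equivalent java.util.logging setting is sketched below (an assumption: stock Solr 3.x JUL logging, with the file wired in via -Djava.util.logging.config.file):

# log DIH activity, including executed SQL, at FINE
org.apache.solr.handler.dataimport.level = FINE

The admin/logging change takes effect immediately but is lost on restart; the properties file persists across restarts.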
Re: possible status codes from solr during a (DIH) data import process
Hi, That's correct. For failure, you have to check for the text *Indexing failed. Rolled back changes* under the <lst name="statusMessages"> tag. One more thing to note here is that there may be a time during the indexing process where the indexing is complete but the index is not committed and optimized yet. You would need to check that the response listed below is present along with the success message to call it a complete success.

<str name="Committed">2012-05-31 15:10:45</str>
<str name="Optimized">2012-05-31 15:10:45</str>

On Thu, May 31, 2012 at 3:42 PM, geeky2 gee...@hotmail.com wrote: hello all, i have been asked to write a small polling script (bash) to periodically check the status of an import on our Master. our import times are small, but there are business reasons why we want to know the status of an import after a specified amount of time. i need to perform certain actions based on the status of the import, and therefore need to quantify which tags to check and their appropriate states. i am using the command from the DataImportHandler HTTP API to get the status of the import: OUTPUT=$(curl -v http://${SERVER}:${PORT}/somecore/dataimport?command=status) can someone tell me if i have these rules correct? 1) during an import - the status tag will have a busy state: example: <str name="status">busy</str> 2) at the completion of an import (regardless of failure or success) the status tag will have an idle state: example: <str name="status">idle</str> 3) to determine if an import failed or succeeded - you must interrogate the tags under <lst name="statusMessages"> and specifically look for: success: <str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0 documents.</str> failure: <str name="">Indexing completed. Added/Updated: 603378 documents. Deleted 0 documents.</str> thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/possible-status-codes-from-solr-during-a-DIH-data-import-process-tp3987110.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
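A minimal polling sketch along the lines discussed (assumptions: bash with curl and grep, the status strings quoted above, and the Committed/Optimized check omitted for brevity):

#!/bin/bash
STATUS=$(curl -s "http://${SERVER}:${PORT}/somecore/dataimport?command=status")
if echo "$STATUS" | grep -q '<str name="status">idle</str>'; then
  if echo "$STATUS" | grep -q 'Indexing failed. Rolled back changes'; then
    echo "import failed"
  elif echo "$STATUS" | grep -q 'Indexing completed'; then
    echo "import succeeded"
  fi
else
  echo "import still running"
fi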
Re: Not able to use the highlighting feature! Want to return snippets of text
Hi, Can you please provide the definitions of the following 3 objects from your solrconfig.xml?

<str name="hl.fragListBuilder">simple</str>
<str name="hl.fragmentsBuilder">colored</str>
<str name="hl.fragmenter">regex</str>

For example, the simple hl.fragListBuilder should be defined as mentioned below in your solrconfig.xml

<fragListBuilder name="simple" class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>

On Mon, May 21, 2012 at 2:06 PM, 12rad prama.an...@gmail.com wrote: The field I am trying to highlight is stored.

<field name="text" type="text_en" required="false" compressed="false" omitNorms="false" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>

In the searchHandler I've set the parameters as follows:

<str name="hl">on</str>
<str name="hl.fl">text</str>
<str name="hl.snippets">5</str>
<str name="hl.fragsize">1000</str>
<str name="hl.maxAnalyzedChars">51</str>
<str name="hl.requireFieldMatch">true</str>
<str name="hl.fragmenter">regex</str>
<str name="hl.fragListBuilder">simple</str>
<str name="hl.fragmentsBuilder">colored</str>
<str name="hl.phraseLimit">1000</str>
<str name="hl.usePhraseHighlighter">true</str>
<str name="hl.highlightMultiTerm">true</str>
<str name="hl.useFastVectorHighligher">true</str>

I still don't see any highlighting. I've managed to get snippets of text but the actual word is not highlighted. I don't know where I am going wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985174.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
Re: Not able to use the highlighting feature! Want to return snippets of text
Hi, I believe that in your colored fragmentsBuilder definition you have not mentioned anything in your pre and post tags, and that may be the reason you are getting snippets of text without highlighting. Please refer to http://wiki.apache.org/solr/HighlightingParameters and check the hl.fragmentsBuilder section. Try specifying the pre and post tags with information as mentioned below (same as the wiki link above).

<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>

On Mon, May 21, 2012 at 3:52 PM, 12rad prama.an...@gmail.com wrote: For the fragListBuilder it's

<fragListBuilder name="simple" default="true" class="solr.highlight.SimpleFragListBuilder"/>

fragment builder is

<fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"></str>
    <str name="hl.tag.post"></str>
  </lst>
</fragmentsBuilder>

<fragmenter name="regex" class="solr.highlight.RegexFragmenter">
  <lst name="defaults">
    <int name="hl.fragsize">70</int>
    <float name="hl.regex.slop">0.5</float>
    <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
  </lst>
</fragmenter>

Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985212.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
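Once non-empty pre/post tags are in place, a request like the following exercises the colored builder (an illustrative URL; the host, core and field name are assumptions based on this thread):

http://localhost:8983/solr/select?q=text:word&hl=true&hl.fl=text&hl.useFastVectorHighlighter=true&hl.fragmentsBuilder=colored

Matched terms should come back wrapped in the <b style=...> tags taken from hl.tag.pre.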
Issue with DIH when database is down
Hi, I am using Solr 3.4 on Tomcat 6 and using DIH to index data from a MS SQL Server 2008 database. In case my database is down, or is refusing connections for any reason, DIH throws an exception as mentioned below

org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: ...
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
 at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)

But when the database is up and running again and the next indexing job runs, it gives me the same error. I need to restart Tomcat in order to successfully connect to the database again. My dataSource settings in data-config.xml are as follows

<dataSource jndiName="java:comp/env/jdbc/XXX" type="JdbcDataSource" readOnly="true" />

Has anyone come across this issue before? If yes, what is the resolution? Am I missing anything in the dataSource attributes (autoCommit=true)? -- Thanks and Regards Rahul A. Warawdekar
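The thread does not record a resolution; one common remedy for stale pooled connections (an assumption, not confirmed here) is to have the Tomcat JNDI pool validate connections before handing them out, e.g. in context.xml:

<Resource name="jdbc/XXX" auth="Container" type="javax.sql.DataSource"
          validationQuery="SELECT 1" testOnBorrow="true" />

With testOnBorrow, a connection killed while the database was down is discarded and replaced instead of being handed to DIH, which avoids the Tomcat restart.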
Solr request tracking
Hi, Is there any mechanism by which we can track and trend incoming Solr search requests? For example, logging all incoming Solr requests to a log file separate from Tomcat's, plus a tool to trend the patterns? -- Thanks and Regards Rahul A. Warawdekar
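No answer is archived for this thread; a common approach on Tomcat (an assumption, not from the thread) is an AccessLogValve scoped to the Solr context, which writes requests to their own file that can then be fed to any log-analysis tool:

<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="solr_access." suffix=".log"
       pattern="%h %t &quot;%r&quot; %s %b %D" />

The %D token records request time in milliseconds, which is handy for trending query latency alongside volume.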
Re: how to limit solr indexing to specific number of rows
Hi, What is the error that you are getting? ROWNUM works fine with DIH; I have tried and tested it with Solr 3.1. One thing that comes to my mind is the query that you are using to implement ROWNUM. Did you replace the < in the query with &lt; in data-config.xml? Like ROWNUM &lt;= 100? On Thu, May 3, 2012 at 4:11 PM, srini softtec...@gmail.com wrote: I am doing a database import using Solr DIH. I would like to limit the Solr indexing to a specific number. In other words, if Solr reaches 100 indexed records I want the database import to stop importing. Not sure if there is any particular setting that would tell Solr that I only want to import 100 rows from the database and index those 100 records. I tried to give a select query with ROWNUM <= 100 (using Oracle) in data-config.xml, but it gave an error. Any ideas!!! Thanks in Advance Srini -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-limit-solr-indexing-to-specific-number-of-rows-tp3960344.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
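For illustration, the escaped form inside data-config.xml would look like this (a sketch with hypothetical table and column names):

<entity name="item" query="select id, name from items where ROWNUM &lt;= 100">
  <field column="id" name="id" />
  <field column="name" name="name" />
</entity>

Since entity queries live inside an XML attribute, a bare < breaks the XML parse, which typically surfaces as a DIH configuration error rather than a SQL one.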
Re: solr replication failing with error: Master at: is not available. Index fetch failed
Hi, Is the replication still failing or working fine with that change? On Tue, Apr 24, 2012 at 2:16 PM, geeky2 gee...@hotmail.com wrote: that was it! thank you. i did notice something else in the logs now ... what is the meaning or implication of the message "Connection reset"?

2012-04-24 12:59:19,996 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 12:59:39,998 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
*2012-04-24 12:59:59,997 SEVERE [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Master at: http://bogus:bogusport/somepath/somecore/replication/ is not available. Index fetch failed. Exception: Connection reset*
2012-04-24 13:00:19,998 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:00:40,004 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:00:59,992 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:19,993 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:39,992 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:01:59,989 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:19,990 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:39,989 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master.
2012-04-24 13:02:59,991 INFO [org.a

-- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3936107.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
Re: solr replication failing with error: Master at: is not available. Index fetch failed
Hi, In the Solr wiki, for replication, the master URL is defined as follows

<str name="masterUrl">http://master_host:port/solr/corename/replication</str>

This URL does not contain "admin" in its path, whereas the master URL provided by you has an additional "admin" in it. Not very sure if this is the issue, but you could try removing "admin" and check whether replication works. On Tue, Apr 24, 2012 at 11:49 AM, geeky2 gee...@hotmail.com wrote: hello, thank you for the reply. yes - master has been indexed. ok - makes sense - the polling interval needs to change. i did check the solr war file on both boxes (master and slave). they are identical. actually - if they were not identical - this would point to a different issue altogether - since our deployment infrastructure rolls the war file to the slaves when you do a deployment on the master. this has me stumped - not sure what to check next. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3935699.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
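For context, a minimal slave-side replication handler in the shape the wiki shows (host, port and core name are placeholders):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master_host:port/solr/corename/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>

A 20-second pollInterval matches the cadence of the "Slave in sync with master" log entries quoted earlier in the thread.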
Re: Solr with UIMA
Hi Divakar, Try making your updateRequestProcessorChain the default. Simply add default="true" as follows and check if that works.

<updateRequestProcessorChain name="uima" default="true">

On Thu, Apr 19, 2012 at 12:01 PM, dsy99 ds...@rediffmail.com wrote: Hi Chris, Have you been able to successfully integrate UIMA in Solr? I too tried to integrate UIMA in Solr by following the instructions provided in the README, i.e. the following four steps:

Step 1. I set the <lib/> tags in solrconfig.xml appropriately to point to the jar files.

<lib dir="../../contrib/uima/lib" />
<lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />

Step 2. Modified my schema.xml, adding the fields I wanted to hold the metadata, specifying proper values for the type, indexed, stored and multiValued options as follows:

<field name="language" type="string" indexed="true" stored="true" required="false"/>
<field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
<field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />

Step 3. Modified my solrconfig.xml, adding the following snippet:

<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
      </lst>
      <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      <bool name="ignoreErrors">true</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>text</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
          <lst name="mapping">
            <str name="feature">text</str>
            <str name="field">concept</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
          <lst name="mapping">
            <str name="feature">language</str>
            <str name="field">language</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.SentenceAnnotation</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentence</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Step 4: And finally created a new UpdateRequestHandler with the following:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>

Further, I indexed a Word file called text.docx using the following command:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" -F myfile=@UIMA_sample_test.docx

When I searched the file, I was not able to see the additional UIMA fields. Can you please help if you have been able to solve the problem. With Regards, Thanks, Divakar -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3923443.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
Re: DataImportHandler w/ multivalued fields
Hi Briggs, By saying multivalued fields are not getting indexed properly, do you mean to say that you are not able to search on those fields? Have you tried actually searching your Solr index for those multivalued terms to make sure it returns the search results? One possibility could be that the multivalued fields are getting indexed correctly and are searchable. However, since your schema.xml has a raw_tag field whose stored attribute is set to false, you may not be able to see those fields. On Thu, Dec 1, 2011 at 1:43 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: In addition, I tried a query like the one below and changed the column definition to

<field column="raw_tag" name="raw_tag" splitBy=", " />

and still no luck. It is indexing the full content now, but not multivalued. It seems like the splitBy isn't working properly.

select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
from site
left outer join (freetags inner join freetagged_objects)
  on (freetags.id = freetagged_objects.tag_id and site.siteId = freetagged_objects.object_id)
group by site.siteId

Am I doing something wrong? Thanks, Briggs Thompson On Thu, Dec 1, 2011 at 11:46 AM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: Hello Solr Community! I am implementing a data connection to Solr through the Data Import Handler, and non-multivalued fields are working correctly, but multivalued fields are not getting indexed properly. I am new to DataImportHandler, but from what I could find, a sub-entity is the way to go for a multivalued field. The weird thing is that data is being indexed for one row, meaning the first raw_tag gets populated. Anyone have any ideas? Thanks, Briggs This is the relevant part of the schema:

<field name="raw_tag" type="text_en_lessAggressive" indexed="true" stored="false" multivalued="true"/>
<field name="raw_tag_string" type="string" indexed="false" stored="true" multivalued="true"/>
<copyField source="raw_tag" dest="raw_tag_string"/>

And the relevant part of data-import.xml:

<document name="merchant">
  <entity name="site" query="select * from site">
    <field column="siteId" name="siteId" />
    <field column="domain" name="domain" />
    <field column="aliasFor" name="aliasFor" />
    <field column="title" name="title" />
    <field column="description" name="description" />
    <field column="requests" name="requests" />
    <field column="requiresModeration" name="requiresModeration" />
    <field column="blocked" name="blocked" />
    <field column="affiliateLink" name="affiliateLink" />
    <field column="affiliateTracker" name="affiliateTracker" />
    <field column="affiliateNetwork" name="affiliateNetwork" />
    <field column="cjMerchantId" name="cjMerchantId" />
    <field column="thumbNail" name="thumbNail" />
    <field column="updateRankings" name="updateRankings" />
    <field column="couponCount" name="couponCount" />
    <field column="category" name="category" />
    <field column="adult" name="adult" />
    <field column="rank" name="rank" />
    <field column="redirectsTo" name="redirectsTo" />
    <field column="wwwRequired" name="wwwRequired" />
    <field column="avgSavings" name="avgSavings" />
    <field column="products" name="products" />
    <field column="nameChecked" name="nameChecked" />
    <field column="tempFlag" name="tempFlag" />
    <field column="created" name="created" />
    <field column="enableSplitTesting" name="enableSplitTesting" />
    <field column="affiliateLinklock" name="affiliateLinklock" />
    <field column="hasMobileSite" name="hasMobileSite" />
    <field column="blockSite" name="blockSite" />
    <entity name="merchant_tags" pk="siteId" query="select raw_tag, freetags.id, freetagged_objects.object_id as siteId from freetags inner join freetagged_objects on freetags.id=freetagged_objects.tag_id where freetagged_objects.object_id='${site.siteId}'">
      <field column="raw_tag" name="raw_tag"/>
    </entity>
  </entity>
</document>

-- Thanks and Regards Rahul A. Warawdekar
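Two details worth flagging for readers hitting the same symptom (observations about DIH and schema behavior, not stated in the thread): the splitBy attribute is interpreted by the RegexTransformer, so the entity must declare it, and the schema attribute is case-sensitive ("multiValued", not "multivalued"). A sketch of the first point, using the group_concat query from above:

<entity name="site" transformer="RegexTransformer"
        query="select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.* from site ...">
  <field column="raw_tag" splitBy=", " />
</entity>

Without the transformer declaration, splitBy is silently ignored and the concatenated string is indexed as a single value.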
Re: Architecture and Capacity planning for large Solr index
Thanks! My business requirements have changed a bit. We need one year of rolling data in Production. The index size for the same comes to approximately 200 - 220 GB. I am planning to address this using Solr distributed search as follows. 1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves (load balanced) 2. Master configuration will be 4 CPU On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Rahul, This is unfortunately not enough information for anyone to give you very precise answers, so I'll just give some rough ones: * best disk - SSD :) * CPU - multicore, depends on query complexity, concurrency, etc. * sharded search and failover - start with SolrCloud; there are a couple of pages about it on the wiki and http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/ Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Rahul Warawdekar rahul.warawde...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Tuesday, October 11, 2011 11:47 AM Subject: Architecture and Capacity planning for large Solr index Hi All, I am working on a Solr search based project and would highly appreciate help/suggestions from you all regarding Solr architecture and capacity planning. Details of the project are as follows: 1. There are 2 databases from which data needs to be indexed and made searchable - Production and Archive. 2. The Production database will retain 6 months of data, archiving data every month. 3. The Archive database will retain 3 years of data. 4. The database is SQL Server 2008 and the Solr version is 3.1. The data to be indexed contains a huge volume of attachments (PDF, Word, Excel etc.), approximately 200 GB per month. We are planning to do a full index every month (multithreaded) and incremental indexing on a daily basis. The Solr index size comes to approximately 25 GB per month. If we were to use distributed search, what would be the best configuration for the Production as well as Archive indexes? What would be the best CPU/RAM/disk configuration? How can I implement a failover mechanism for sharded searches? Please let me know in case I need to share more information. -- Thanks and Regards Rahul A. Warawdekar -- Thanks and Regards Rahul A. Warawdekar
Re: Architecture and Capacity planning for large Solr index
Thanks Otis! Please ignore my earlier email, which does not have all the information. My business requirements have changed a bit. We now need one year of rolling data in Production, with the following details:
- Number of records - 1.2 million
- Solr index size for these records - approximately 200 - 220 GB (includes large attachments)
- Approx 250 users who will be searching the application, with a peak of 1 search request every 40 seconds
I am planning to address this using Solr distributed search on a VMware virtualized environment as follows.
1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves (load balanced)
2. Master configuration for each server is as follows - 4 CPUs - 16 GB RAM - 300 GB disk space
3. Slave configuration for each server is as follows - 4 CPUs - 16 GB RAM - 150 GB disk space
4. I am planning to use a SAN instead of local storage to store the Solr index.
And my questions are as follows: Will 3 shards serve the purpose here? Is a SAN a good option for storing the Solr index, given the high index volume? On Mon, Nov 21, 2011 at 3:05 PM, Rahul Warawdekar rahul.warawde...@gmail.com wrote: Thanks! My business requirements have changed a bit. We need one year of rolling data in Production. The index size for the same comes to approximately 200 - 220 GB. I am planning to address this using Solr distributed search as follows. 1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves (load balanced) 2. Master configuration will be 4 CPU On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Rahul, This is unfortunately not enough information for anyone to give you very precise answers, so I'll just give some rough ones: * best disk - SSD :) * CPU - multicore, depends on query complexity, concurrency, etc. * sharded search and failover - start with SolrCloud; there are a couple of pages about it on the wiki and http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/ Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Rahul Warawdekar rahul.warawde...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Tuesday, October 11, 2011 11:47 AM Subject: Architecture and Capacity planning for large Solr index Hi All, I am working on a Solr search based project and would highly appreciate help/suggestions from you all regarding Solr architecture and capacity planning. Details of the project are as follows: 1. There are 2 databases from which data needs to be indexed and made searchable - Production and Archive. 2. The Production database will retain 6 months of data, archiving data every month. 3. The Archive database will retain 3 years of data. 4. The database is SQL Server 2008 and the Solr version is 3.1. The data to be indexed contains a huge volume of attachments (PDF, Word, Excel etc.), approximately 200 GB per month. We are planning to do a full index every month (multithreaded) and incremental indexing on a daily basis. The Solr index size comes to approximately 25 GB per month. If we were to use distributed search, what would be the best configuration for the Production as well as Archive indexes? What would be the best CPU/RAM/disk configuration? How can I implement a failover mechanism for sharded searches? Please let me know in case I need to share more information. -- Thanks and Regards Rahul A. Warawdekar -- Thanks and Regards Rahul A. Warawdekar -- Thanks and Regards Rahul A. Warawdekar
Re: Ordered proximity search
Hi Thomas, Do you always need the ordered proximity search by default? You may want to check SpanNearQuery at http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/. We are using the edismax query parser provided by Solr. I had a similar requirement in our project, and here is how we addressed it: 1. Wrote a customized query parser similar to edismax. 2. Identified the method in the code which takes care of PhraseQuery and replaced it with a snippet of SpanNearQuery code. Please read up on SpanNearQuery and see if that works for you. On Thu, Nov 3, 2011 at 2:11 PM, LT.thomas t.latu...@itspree.pl wrote: Hi, By ordered I mean term1 will always come before term2 in the document. I have two documents: 1. "By ordered I mean term1 will always come before term2 in the document" 2. "By ordered I mean term2 will always come before term1 in the document" If I make the query "term1 term2"~Integer.MAX_VALUE, my result is: 2 documents. How can I query to have one result (only if term1 comes before term2): "By ordered I mean term1 will always come before term2 in the document" Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Ordered-proximity-search-tp3477946p3477946.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
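A minimal sketch of the SpanNearQuery substitution described above (Lucene 3.x API; the field name "text" is hypothetical). The third constructor argument, inOrder=true, is what enforces that term1 precedes term2:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

SpanQuery[] clauses = new SpanQuery[] {
    new SpanTermQuery(new Term("text", "term1")),
    new SpanTermQuery(new Term("text", "term2"))
};
// slop = max positions allowed between terms; inOrder = true keeps the order constraint
SpanNearQuery query = new SpanNearQuery(clauses, Integer.MAX_VALUE, true);

With inOrder=false this behaves like the unordered proximity query from the original post and would match both documents.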
Issue with Shard configuration in solrconfig.xml (Solr 3.1)
Hi, I am trying to evaluate distributed search for my project by splitting up our single index onto 2 shards with Solr 3.1. When I query the first Solr server by passing the shards parameter, I get correct search results from both shards. (http://server1:8080/solr/test/select/?shards=server1:8080/solr/test,server2:8080/solr/test&q=solr&start=0&rows=20) I want to avoid the use of this shards parameter in the HTTP URL and specify it in solrconfig.xml as follows.

<requestHandler name="my_custom_handler" class="solr.SearchHandler" default="true">
  <str name="shards">server1:8080/solr/test,server2:8080/solr/test</str>
  ..
</requestHandler>

After adding the shards parameter in solrconfig.xml, I get search results only from the first shard and not from the second one. Am I missing any configuration? Also, can the URLs with the shard parameter be load balanced for a failover mechanism? -- Thanks and Regards Rahul A. Warawdekar
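No reply is archived for this thread; for readers, the usual SearchHandler arrangement (not confirmed here as the fix) places default parameters inside a <lst name="defaults"> block rather than directly under the handler element:

<requestHandler name="my_custom_handler" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="shards">server1:8080/solr/test,server2:8080/solr/test</str>
  </lst>
</requestHandler>

A parameter defined outside the defaults/appends/invariants lists is not applied to requests, which would explain results coming back from only the local shard.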
Re: Trouble configuring multicore / accessing admin page
Hi Joshua, Can you try updating your solr.xml as follows: specify

<core name="core0" instanceDir="core0" />

instead of

<core name="core0" instanceDir="cores/core0" />

Basically, remove the extra text "cores/" from the instanceDir attribute in the core element. Just try it and let us know if it works. On Wed, Sep 28, 2011 at 3:40 PM, Joshua Miller jos...@itsecureadmin.com wrote: Hello, I am trying to get Solr working with multiple cores and have a problem accessing the admin page once I configure multiple cores. Problem: when accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, "missing core name in path". Question: when using the multicore option, is the standard admin page still available? Environment: - solr 1.4.1 - Windows Server 2008 R2 - Java SE 1.6u27 - Tomcat 6.0.33 - Solr experience: none I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with the following contents:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admij/cores">
    <core name="core0" instanceDir="cores/core0" />
    <core name="core1" instanceDir="cores/core1" />
  </cores>
</solr>

I have copied the example/solr directory to c:\solr and have populated that directory with the cores/{core0,core1} as well as the proper configs and data directories within. When I restart Tomcat, it shows a couple of exceptions related to QueryElevationComponent and null pointers that I think are due to the DB not yet being available, but I see that the cores appear to initialize properly other than that. So the problem I'm looking to solve/clarify here is the admin page - should that remain available and usable when using the multicore configuration, or am I doing something wrong? Do I need to use the CoreAdminHandler type requests to manage multicore instead? Thanks, -- Josh Miller Open Source Solutions Architect (425) 737-2590 http://itsecureadmin.com/ -- Thanks and Regards Rahul A. Warawdekar
Re: Solr stopword problem in Query
Hi Isan, The schema.xml seems OK to me. Is textForQuery the only field you are searching in? Are you also searching on any other non-text-based fields? If yes, please provide the schema description for those fields as well. Also, please provide your solrconfig.xml file. On Tue, Sep 27, 2011 at 1:12 AM, Isan Fulia isan.fu...@germinait.com wrote: Hi Rahul, I also tried searching "Coke Studio MTV" but no documents were returned. Here is the snippet of my schema file.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="content" type="text" indexed="false" stored="true" multiValued="false"/>
<field name="title" type="text" indexed="false" stored="true" multiValued="false"/>
<field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true" omitTermFreqAndPositions="true"/>
<copyField source="content" dest="textForQuery"/>
<copyField source="title" dest="textForQuery"/>

Thanks, Isan Fulia. On 26 September 2011 21:19, Rahul Warawdekar rahul.warawde...@gmail.com wrote: Hi Isan, Does your search return any documents when you remove the 'at' keyword and just search for "Coke studio MTV"? Also, can you please provide the snippet of your schema.xml file where you have mentioned this field name and its type description? On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia isan.fu...@germinait.com wrote: Hi all, I have a text field named textForQuery. The following content has been indexed into Solr in the field textForQuery: *Coke Studio at MTV*. When I fired the query *textForQuery:(coke studio at mtv)*, the results showed 0 documents. After running the same query in debug mode I got the following results

<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">textForQuery:(coke studio at mtv)</str>
  <str name="querystring">textForQuery:(coke studio at mtv)</str>
  <str name="parsedquery">PhraseQuery(textForQuery:"coke studio ? mtv")</str>
  <str name="parsedquery_toString">textForQuery:"coke studio ? mtv"</str>

Why did the query not match any document even when there is a document with the value of textForQuery as *Coke Studio at MTV*? Is this because of the stopword *at* present in the stopword list? -- Thanks & Regards, Isan Fulia. -- Thanks and Regards Rahul A. Warawdekar -- Thanks & Regards, Isan Fulia.
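A side note for readers (an observation from the quoted schema, not raised in the thread): the field is declared with omitTermFreqAndPositions="true", and phrase queries need term positions, so a parsed phrase like "coke studio ? mtv" cannot match against it. A sketch of the field without that option:

<!-- positions must be kept for phrase queries to work -->
<field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true"/>

After such a change, the documents must be re-indexed for positions to actually be stored.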
Re: Solr stopword problem in Query
Hi Isan, Does your search return any documents when you remove the 'at' keyword and just search for "Coke studio MTV"? Also, can you please provide the snippet of your schema.xml file where you have mentioned this field name and its type description? On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia isan.fu...@germinait.com wrote: Hi all, I have a text field named textForQuery. The following content has been indexed into Solr in the field textForQuery: *Coke Studio at MTV*. When I fired the query *textForQuery:(coke studio at mtv)*, the results showed 0 documents. After running the same query in debug mode I got the following results

<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">textForQuery:(coke studio at mtv)</str>
  <str name="querystring">textForQuery:(coke studio at mtv)</str>
  <str name="parsedquery">PhraseQuery(textForQuery:"coke studio ? mtv")</str>
  <str name="parsedquery_toString">textForQuery:"coke studio ? mtv"</str>

Why did the query not match any document even when there is a document with the value of textForQuery as *Coke Studio at MTV*? Is this because of the stopword *at* present in the stopword list? -- Thanks & Regards, Isan Fulia. -- Thanks and Regards Rahul A. Warawdekar
Re: JdbcDataSource and threads
I am using Solr 3.1. But you can surely try the patch with 3.3. On Fri, Sep 23, 2011 at 1:35 PM, Vazquez, Maria (STM) maria.vazq...@dexone.com wrote: Thanks Rahul. Are you using 3.3 or 3.4? I'm on 3.3 right now. I will try the patch today. Thanks again, Maria -----Original Message----- From: Rahul Warawdekar [mailto:rahul.warawde...@gmail.com] Sent: Thursday, September 22, 2011 12:46 PM To: solr-user@lucene.apache.org Subject: Re: JdbcDataSource and threads Hi, Have you applied the patch that is provided with the Jira issue you mentioned? https://issues.apache.org/jira/browse/SOLR-2233 Please apply the patch and check if you are getting the same exceptions. It has worked well for me so far. On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) maria.vazq...@dexone.com wrote: Hi! So as of 3.4, JdbcDataSource doesn't work with threads, correct? https://issues.apache.org/jira/browse/SOLR-2233 I'm using Microsoft SQL Server, my data-config.xml has a lot of very complex SQL queries, and it takes a long time to index. I'm migrating from Lucene to Solr and the Lucene code uses threads so it takes little time to index; now in Solr, if I add threads=xx to my root entity I get lots of errors about connections being closed. Thanks a lot, Maria -- Thanks and Regards Rahul A. Warawdekar -- Thanks and Regards Rahul A. Warawdekar
Re: How to get the fields that match the request?
Hi, Before considering highlighting to address this requirement, you also need to consider the performance implications of highlighting for large text fields. On Thu, Sep 22, 2011 at 11:42 AM, Nicolas Martin nmar...@doyousoft.com wrote: yes, highlights can help to do that, but if you want to paginate your results, you can't use hl. It'd be great to have a scoring average by field... On 22/09/2011 17:37, Tanner Postert wrote: this would be useful to me as well. even when searching with q=test, I know it defaults to the default search field, but it would be helpful to know what field(s) match the query term. On Thu, Sep 22, 2011 at 3:29 AM, Nicolas Martin nmar...@doyousoft.com wrote: Hi everybody, I need your help to get more information in my Solr query's response. I've got a simple text input which allows me to query several fields in the same query. So my query looks like this: q=email:martyn+OR+name:martynn+OR+commercial:martyn ... Is it possible in the response to know the fields where martynn has been found? Thanks a lot :-) -- Thanks and Regards Rahul A. Warawdekar
Re: JdbcDataSource and threads
Hi, Have you applied the patch that is provided with the Jira issue you mentioned? https://issues.apache.org/jira/browse/SOLR-2233 Please apply the patch and check if you are getting the same exceptions. It has worked well for me so far. On Thu, Sep 22, 2011 at 3:17 PM, Vazquez, Maria (STM) maria.vazq...@dexone.com wrote: Hi! So as of 3.4, JdbcDataSource doesn't work with threads, correct? https://issues.apache.org/jira/browse/SOLR-2233 I'm using Microsoft SQL Server, my data-config.xml has a lot of very complex SQL queries, and it takes a long time to index. I'm migrating from Lucene to Solr and the Lucene code uses threads so it takes little time to index; now in Solr, if I add threads=xx to my root entity I get lots of errors about connections being closed. Thanks a lot, Maria -- Thanks and Regards Rahul A. Warawdekar
Re: DIH delta last_index_time
Hi Maria/Gora, I see this as more of a problem with the time zones in which the Solr server and the database server are located. Is that true? If yes, one more possibility for handling this scenario would be to customize the DataImportHandler code as follows: 1. Add one more configuration property named "dbTimeZone" at the entity level in the data-config.xml file. 2. While saving the lastIndexTime in the properties file, save it according to the time zone specified in the config so that it is in sync with the database server time. Basically, customize the code so that all time-related updates to the dataimport.properties file are time-zone specific. On Wed, Sep 14, 2011 at 4:31 AM, Gora Mohanty g...@mimirtech.com wrote: On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez maria.vazq...@dexone.com wrote: Hi, How do you handle the situation where the time on the server running Solr doesn't match the time in the database? Firstly, why is that the case? NTP is pretty universal these days. > I'm using the last_index_time saved by Solr in the delta query, checking it against a lastModifiedDate field in the database, but the times are not in sync so I might lose some changes. Can we use something else other than last_index_time? Maybe something like last_pk or something. One possible way is to edit dataimport.properties, manually or through a script, to put the last_index_time back to a safe value. Regards, Gora -- Thanks and Regards Rahul A. Warawdekar
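For reference, the file being discussed is a plain Java properties file in the core's conf directory; editing it back to a safe value looks like this (timestamps are illustrative):

#Wed Sep 14 04:31:12 PDT 2011
last_index_time=2011-09-14 04\:31\:12

The colons are backslash-escaped because of java.util.Properties syntax, and the value is substituted as a string into ${dataimporter.last_index_time} in the delta query, so rolling it backward simply widens the delta window.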
Re: Index not getting refreshed
Hi Pawan, Can you please share more details on the indexing mechanism? (DIH, SolrJ or any other) Please let us know the configuration details. On Wed, Sep 14, 2011 at 12:48 PM, Pawan Darira pawan.dar...@gmail.com wrote: Hi, I am using Solr 3.2 on a live website. I get live user data of about 2000 records per day. I do an incremental index every 8 hours, but my search results always show the same results in the same sort order. When I check the same search against the corresponding DB, it always gives me different results (as new data regularly gets added). Please suggest what might be the issue. Is there any cache-related problem at the Solr level? thanks pawan -- Thanks and Regards Rahul A. Warawdekar
Re: FastVectorHighlighter with wildcard queries
Hi Koji, Thanks for the information! I will try the patches you provided. On 9/8/11, Koji Sekiguchi k...@r.email.ne.jp wrote: (11/09/09 6:16), Rahul Warawdekar wrote: Hi, I am currently evaluating the FastVectorHighlighter in a Solr search based project and have a couple of questions. 1. Is there any specific reason why the FastVectorHighlighter does not provide support for multiterm (wildcard) queries? 2. What are the other constraints when using FastVectorHighlighter? FVH used to have typical constraints: 1. supports only TermQuery and PhraseQuery (and BooleanQuery/DisjunctionMaxQuery that include TQ and PQ) 2. ignores word boundaries. But now for 1, FVH will support other queries: https://issues.apache.org/jira/browse/LUCENE-1889 I believe it is close to being fixed. For 2, FVH in the latest trunk/3x pays regard to word or sentence boundaries through BoundaryScanner: https://issues.apache.org/jira/browse/LUCENE-1824 koji -- Check out Query Log Visualizer for Apache Solr http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/ -- Thanks and Regards Rahul A. Warawdekar
Solr: Return field names that contain search term
Hi, I have a query on Solr search as follows. I am indexing an entity which includes a multivalued field, using DIH. This multivalued field contains content from multiple attachments for a single entity. Now, for example, if I search for the term "solr", will I be able to know which field contains this search term? And if it is a multivalued field, which value in that multivalued field contains the search term? Currently, to achieve this, I am using a workaround based on the highlighting feature. I am indexing all the attachments within a single entity and document as dynamic fields attachment_id_i. While searching, I highlight on these dynamic fields (hl.fl=*_i), and from the highlighting section in the results I am able to get the attachment number which contains the search term. But since this approach involves highlighting large attachments, the search response times are very slow. Would highly appreciate it if someone can suggest more efficient ways to address this kind of requirement. -- Thanks and Regards Rahul A. Warawdekar
Re: Solr: Return field names that contain search term
Thanks Chris! Will try out the second approach you suggested and share my findings. On Mon, Sep 12, 2011 at 5:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Would highly appreciate if someone can suggest other efficient ways to : address this kind of a requirement. one approach would be to index each attachment as its own document and search those. you could then use things like the group collapsing features to return only the main type documents when multiple attachments match. similarly: you could still index each main document with a giant text field containing all of the attachment text, *and* you could index each attachment as its own document. You would search on the main docs as you do now, but then your app could issue a secondary request searching for all attachment docs that match on one of the main docIds in a special field, and use the results to note which attachment of each doc (if any) caused the match. -Hoss -- Thanks and Regards Rahul A. Warawdekar
FastVectorHighlighter with wildcard queries
Hi, I am currently evaluating the FastVectorHighlighter in a Solr search based project and have a couple of questions. 1. Is there any specific reason why the FastVectorHighlighter does not provide support for multiterm (wildcard) queries? 2. What are the other constraints when using FastVectorHighlighter? -- Thanks and Regards Rahul A. Warawdekar
Re: Delta import issue
Hi Peter, Try adding the primary key attribute to the root entity 'ad' and check if delta import works. By the way, which database are you using? On Tue, Jul 12, 2011 at 10:27 AM, PeterKerk vettepa...@hotmail.com wrote: I'm having an issue with a delta import. I have the following in my data-config.xml:

<document name="ads">
  <entity name="ad"
          query="select * from ads WHERE approvedate > '1/1/1900' and publishdate <= getdate() AND depublishdate >= getdate() and deletedate = '1/1/1900'"
          deltaImportQuery="select * from ads WHERE approvedate > '1/1/1900' and publishdate <= getdate() AND depublishdate >= getdate() and deletedate = '1/1/1900' and id='${dataimporter.delta.id}'"
          deltaQuery="select id from ads where updatedate > '${dataimporter.last_index_time}'">
    <entity name="photo"
            query="select locpath as locpath FROM ad_photos where adid=${ad.id}"
            deltaImportQuery="select locpath as locpath FROM ad_photos where adid='${dataimporter.delta.id}'"
            deltaQuery="select locpath as locpath FROM ad_photos where createdate > '${dataimporter.last_index_time}'">
      <field name="photos" column="locpath" />
    </entity>
  </entity>
</document>

Now, when I add a new photo to the ad_photos table, it's not indexed when I perform a delta import like so: http://localhost:8983/solr/i2m/dataimport?command=delta-import. When I do a FULL import, I do see the new images. Here's the definition of the ad_photos table:

CREATE TABLE [dbo].[ad_photos](
  [id] [int] IDENTITY(1,1) NOT NULL,
  [adid] [int] NOT NULL,
  [locpath] [nvarchar](150) NOT NULL,
  [title] [nvarchar](50) NULL,
  [createdate] [datetime] NOT NULL,
  CONSTRAINT [PK_ad_photos] PRIMARY KEY CLUSTERED ( [id] ASC )
  WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

What am I doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/Delta-import-issue-tp3162581p3162581.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
Re: Delta import issue
<entity pk="id" name="ad" ...> On Tue, Jul 12, 2011 at 11:34 AM, PeterKerk vettepa...@hotmail.com wrote: Hi Rahul, Not sure how I would do this: "Try adding the primary key attribute to the root entity 'ad'"? In my entity "ad" I already have these fields (I left those out earlier for readability):

<field name="id" column="ID" />  <!-- this is the primary key of the ads table -->
<field name="userid" column="userid" />
<field name="title" column="title" />

Is that what you mean? And I'm using MSSQL 2008. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Delta-import-issue-tp3162581p3162809.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
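The thread ends here; for readers with the same symptom, the usual missing piece for child-entity deltas in DIH is parentDeltaQuery, which maps a changed child row back to its parent document (a sketch using this thread's table names; not confirmed as Peter's final fix):

<entity name="photo"
        query="select locpath from ad_photos where adid=${ad.id}"
        deltaQuery="select adid from ad_photos where createdate > '${dataimporter.last_index_time}'"
        parentDeltaQuery="select id from ads where id=${photo.adid}">
  <field name="photos" column="locpath" />
</entity>

With this in place, a newly inserted photo marks its parent ad as changed, and the ad's deltaImportQuery re-indexes the whole document including the new photo.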
Solr Multithreading
Hi, I am currently working on a search-based project which involves indexing data from a SQL Server database, including attachments, using DIH. For indexing attachments (varbinary DB objects), I am using the TikaEntityProcessor. I am trying to use multithreading to speed up the indexing, but it seems to fail when indexing attachments, even after applying a few Solr fix patches. My question is: is the current multithreading feature stable in Solr 3.1, or does it need further enhancements? -- Thanks and Regards Rahul A. Warawdekar
Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor
Hi All, I am using Solr 3.1 for one of our search-based applications. We are using DIH to index our data and the TikaEntityProcessor to index attachments. Currently we are running into an issue while extracting content from one of our MS Excel 2007 files using the TikaEntityProcessor. The issue is that the TikaEntityProcessor hangs without throwing any exception, which in turn causes the indexing to hang on the server. Has anyone faced a similar kind of issue in the past with the TikaEntityProcessor? Also, does someone know of a way to just skip this kind of behaviour for that file and move on to the next document to be indexed? -- Thanks and Regards Rahul A. Warawdekar
Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor
Hi Markus, It is Tika. I tried using Tika standalone. On 5/26/11, Markus Jelsma markus.jel...@openindex.io wrote: Can you rule out Tika or Solr by trying to parse the file with a stand-alone Tika? Hi All, I am using Solr 3.1 for one of our search-based applications. We are using DIH to index our data and the TikaEntityProcessor to index attachments. Currently we are running into an issue while extracting content from one of our MS Excel 2007 files using the TikaEntityProcessor. The issue is that the TikaEntityProcessor hangs without throwing any exception, which in turn causes the indexing to hang on the server. Has anyone faced a similar kind of issue in the past with the TikaEntityProcessor? Also, does someone know of a way to just skip this kind of behaviour for that file and move on to the next document to be indexed? -- Thanks and Regards Rahul A. Warawdekar
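On the "skip and move on" part of the question: DIH entities accept an onError attribute, sketched below. Note this only covers thrown exceptions; it cannot help with a genuine hang, where the parse never returns:

<entity name="attachment" processor="TikaEntityProcessor"
        dataSource="bin" format="text" onError="skip">
  <field column="text" name="attachment_content" />
</entity>

The entity name, data source and field mapping here are hypothetical; onError accepts "abort" (the default), "skip" or "continue".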
Re: 2 indexes within the same Solr server?
Please refer to http://wiki.apache.org/solr/MultipleIndexes On 3/29/11, Amel Fraisse amel.frai...@gmail.com wrote: Hello everybody, Is it possible to create 2 indexes within the same Solr server? Thank you. Amel. -- Thanks and Regards Rahul A. Warawdekar
Query regarding search term count in Solr
Hi All, This is Rahul, and I am using Solr for one of my upcoming projects. I have a query regarding search term counts using Solr. We have a requirement in one of our search-based projects to filter results based on search term counts per document. For example, if a user searches for something like solr[4:9], this query should return only documents in which "solr" appears between 4 and 9 times (inclusive). If a user searches for something like "solr lucene"[4:9], this query should return only documents in which the phrase "solr lucene" appears between 4 and 9 times (inclusive). Is there any way for Solr to return results based on search term and phrase counts? If not, can it be customized by extending existing Solr/Lucene libraries? -- Thanks and Regards Rahul A. Warawdekar
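No reply is archived for this question; as a pointer for readers, later Solr releases (4.0+) can express the single-term case with the termfreq() function query combined with the frange parser, e.g.:

q={!frange l=4 u=9}termfreq(text,'solr')

Here "text" is a hypothetical field name, and l/u are the inclusive lower and upper bounds. Phrase frequencies have no built-in function query equivalent, so the phrase case would still require custom query code along the lines the question suggests.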