Re: solr reporting tool adapter
On Tue, Oct 6, 2009 at 1:09 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, I wanted to query Solr and send the output to some reporting tool. Has anyone done something like that? Moreover, which reporting tool is good? Any suggestions? Can you be more specific on what you want to achieve? What kind of reports are you looking for? -- Regards, Shalin Shekhar Mangar.
Re: solr optimize - no space left on device
Not sure, but a quick search turned up: http://www.walkernews.net/2007/07/13/df-and-du-command-show-different-used-disk-space/ Using up to 2x the index size can happen. Also check if there is a snapshooter script running through cron which is making hard links to files while a merge is in progress. Do let us know if you make any progress. This is interesting. On Tue, Oct 6, 2009 at 5:28 PM, Phillip Farber pfar...@umich.edu wrote: I am attempting to optimize a large shard on Solr 1.4 and repeatedly get java.io.IOException: No space left on device. The shard, after a final commit before optimize, shows a size of about 192GB on a 400GB volume. I have successfully optimized 2 other shards that were similarly large without this problem on identical hardware boxes. Before the optimize I see:

  % df -B1 .
  Filesystem                          1B-blocks    Used         Available    Use% Mounted on
  /dev/mapper/internal-solr--build--2 435440427008 205681356800 225335255040 48%  /l/solrs/build-2

  slurm-4:/l/solrs/build-2/data/index % du -B1
  205441486848 .

There's a slight discrepancy between the du and df which appears to be orphaned inodes. But the du says there should be enough space to handle the doubling in size during optimization. However, for the second time we ran out of space, and at that point the du and df are wildly different and the volume is at 100%:

  % df -B1 .
  Filesystem                          1B-blocks    Used         Available Use% Mounted on
  /dev/mapper/internal-solr--build--2 435440427008 430985760768 30851072  100% /l/solrs/build-2

  slurm-4:/l/solrs/build-2/data/index % du -B1
  252552298496 .

At this point it appears orphaned inodes are consuming space and not being freed up. Any clue as to whether this is a Lucene bug, a Solr bug, or some other problem? Error traces follow. Thanks! Phil

---

Oct 6, 2009 2:12:37 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 9110523
Oct 6, 2009 2:12:37 AM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: background merge hit exception: _ojl:C151080 _169w:C141302 _1j36:C80405 _1j35:C2043 _1j34:C192 into _1j37 [optimize]
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2737)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2658)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:401)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: No space left on device
        at java.io.RandomAccessFile.writeBytes(Native Method)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:719)
        at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
        at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
        at org.apache.lucene.store.BufferedIndexOutput.seek(BufferedIndexOutput.java:124)
        at ...
datadir configuration
Hello, as I try to deploy my app on a Tomcat server, I'd like to customize the dataDir variable outside the solrconfig.xml file. Is there a way to customize it in a context file? Thanks
Re: datadir configuration
Hi, add a JAVA_OPTS variable in TOMCAT_HOME/bin/catalina.sh like below:

  JAVA_OPTS="$JAVA_OPTS -Dsolr.home=/opt/solr -Dsolr.foo.data.dir=/opt/solr/data"

The solr.foo.data.dir system property must map to the dataDir element in solrconfig.xml. Here is an example (solrconfig.xml):

  <dataDir>${solr.foo.data.dir:/default/path/to/datadir}</dataDir>

On Wed, Oct 7, 2009 at 4:27 PM, clico cl...@mairie-marseille.fr wrote: Hello, as I try to deploy my app on a Tomcat server, I'd like to customize the dataDir variable outside the solrconfig.xml file. Is there a way to customize it in a context file? Thanks
Doing SpellCheck in distributed search
Hi All, I am trying to get spell check suggestions in my distributed search query using shards. I have 2 cores configured, core0 and core1, both having the spell check component configured. On requesting search results using the following query I don't get the spelling suggestions:

  http://localhost:8080/solr/core0/select?spellcheck=true&q=BrekFast&shards=localhost:8080/solr/core0,localhost:8080/solr/core1

But I am able to get suggestions when I query a single core using the URL given below:

  http://localhost:8080/solr/core0/select?spellcheck=true&q=BrekFast

On debugging the code (Solr 1.3) I can see suggestions coming from core0, but while merging the results the suggestion value is getting lost. I am not sure whether it is a bug in the code or an enhancement for a future release. Could anyone guide me on how to achieve spellcheck over multiple cores? Thanks!
Re: ISOLatin1AccentFilter before or after Snowball?
On Tue, Oct 6, 2009 at 4:33 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Hi all, from reading through previous posts on that subject, it seems like the accent filter has to come before the snowball filter. I'd just like to make sure this is so. If it is the case, I'm wondering whether snowball filters for, e.g., French process accented language correctly at all, or whether they remove accents anyway... Or whether accents should be removed whenever making use of snowball filters. I'd think so but I'm not sure. Perhaps someone else can weigh in. And also: it really is meant to take UTF-8 as input, even though it is named ISOLatin1AccentFilter, isn't it? See http://markmail.org/message/hi25u5iqusfu542b -- Regards, Shalin Shekhar Mangar.
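For illustration, an analyzer chain that applies the accent filter before the Snowball stemmer might look like this (the field type name and tokenizer here are chosen for the example, not taken from the thread):

  <fieldType name="text_fr" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    </analyzer>
  </fieldType>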
Re: Questions about synonyms and highlighting
I'm not an expert on hit highlighting but please find some answers inline: On Wed, Sep 30, 2009 at 9:03 PM, Nourredine K. nourredin...@yahoo.com wrote: Hi, Can you please give me some answers for these questions: 1 - How can I get the synonyms found for a keyword? I mean, I search for foo and I have in my synonyms.txt file the following tokens: foo, foobar, fee (with expand = true). My index contains foo and foobar. I want to display a message in the results page, on the header for example, with only the 2 matched tokens and not fee, like "Results found for foo and foobar". Whatever token is available in the index will be matched, but I don't think it is possible to show only those synonyms which matched some documents. Adding debugQuery=on can give you some more information, like how the score for a particular document was calculated for the given query. 2 - Can Solr analyze an index to extract associations between tokens? For example, if foo often appears with fee in a field, it will associate the 2 tokens. Solr won't compute associations, but there are ways of achieving something similar. For example, the MoreLikeThis functionality clusters related documents through co-occurrence of terms in a given field. Also, the TermVectorComponent can give you position information for terms in a document. You can use that to build your own co-occurrence associations. If you just want to query for two words within a fixed position difference, you can do proximity matches. http://lucene.apache.org/java/2_9_0/queryparsersyntax.html#Proximity%20Searches Perhaps somebody else can weigh in on your questions #3 and #4. -- Regards, Shalin Shekhar Mangar.
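For example (field name "text" assumed), a proximity query matching foo and fee within 10 positions of each other, and a TermVectorComponent request for positions (assuming a request handler with the component registered), look like:

  q=text:"foo fee"~10
  q=foo&tv=true&tv.positions=true&tv.offsets=true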
Re: solr reporting tool adapter
We basically want to generate PDF reports which contain tag clouds, bar charts, pie charts, etc. Regards, Raakhi On Wed, Oct 7, 2009 at 1:28 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Oct 6, 2009 at 1:09 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, I wanted to query Solr and send the output to some reporting tool. Has anyone done something like that? Moreover, which reporting tool is good? Any suggestions? Can you be more specific on what you want to achieve? What kind of reports are you looking for? -- Regards, Shalin Shekhar Mangar.
Re: datadir configuration
What do I put in <dataDir>${solr.foo.data.dir:/default/path/to/datadir}</dataDir>? What is /default/path/to/datadir? Gasol Wu wrote: Hi, add a JAVA_OPTS variable in TOMCAT_HOME/bin/catalina.sh like below:

  JAVA_OPTS="$JAVA_OPTS -Dsolr.home=/opt/solr -Dsolr.foo.data.dir=/opt/solr/data"

The solr.foo.data.dir system property must map to the dataDir element in solrconfig.xml. Here is an example (solrconfig.xml):

  <dataDir>${solr.foo.data.dir:/default/path/to/datadir}</dataDir>

On Wed, Oct 7, 2009 at 4:27 PM, clico cl...@mairie-marseille.fr wrote: Hello, as I try to deploy my app on a Tomcat server, I'd like to customize the dataDir variable outside the solrconfig.xml file. Is there a way to customize it in a context file? Thanks
Re: Solr Timeouts
On Wed, Oct 7, 2009 at 2:19 AM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: What does the maxCommitsToKeep (from SolrDeletionPolicy in solrconfig.xml) parameter actually do? Increasing this value seems to have helped a little, but I'm wary of cranking it without having a better understanding of what it does. maxCommitsToKeep is the number of commit points (a point-in-time snapshot of the index) to keep from getting deleted. But deletion of commit points only happens on startup or when someone calls commit/optimize. -- Regards, Shalin Shekhar Mangar.
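For reference, the deletion policy is configured in solrconfig.xml (inside the mainIndex section) roughly like this; the values shown are the usual defaults:

  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>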
Re : Questions about synonyms and highlighting
I'm not an expert on hit highlighting but please find some answers inline: Thanks Shalin for your answers. It helps a lot. I post again questions #3 and #4 for the others :) 3 - Is it possible, and if so how, can I configure Solr to enable or disable highlighting for tokens with diacritics? Settings for vélo (all highlighted) ==> the two words <em>vélo</em> and <em>velo</em> are highlighted. Settings for vélo ==> the first word <em>vélo</em> is highlighted but not the second: velo. 4 - The same question for highlighting with lemmatisation? Settings for manage (all highlighted) ==> the two words <em>manage</em> and <em>management</em> are highlighted. Settings for manage ==> the first word <em>manage</em> is highlighted but not the second: management. Regards, Nourredine.
Re: Indexing and searching of sharded/ partitioned databases and tables
Comments inline: On Wed, Oct 7, 2009 at 2:01 PM, Jayant Kumar Gandhi jaya...@gmail.com wrote: Let's say I have 3 MySQL databases, each with 3 tables. Db1: Tbl1, Tbl2, Tbl3. Db2: Tbl1, Tbl2, Tbl3. Db3: Tbl1, Tbl2, Tbl3. All databases have the same number of tables with the same table names as shown above. All tables have exactly the same structure as well. Each table has three fields: id, name, category. Since the data is distributed this way, I don't have a way to search for a particular record using 'name'. I must look for it in all 9 tables. This is not scalable when, let's say, I have 20 databases each with 20 tables, meaning 400 queries needed to find a single record. Solr seemed like the solution to help. I followed the wiki tutorials: http://wiki.apache.org/solr/DataImportHandler http://wiki.apache.org/solr/DIHQuickStart http://wiki.apache.org/solr/DataImportHandlerFaq The following are my config files so far:

solrconfig.xml:

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>

data-config.xml (so far):

  <dataConfig>
    <dataSource type="JdbcDataSource" name="ds1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/Db1" user="user-name" password="password"/>
    <dataSource type="JdbcDataSource" name="ds2" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/Db2" user="user-name" password="password"/>
    <dataSource type="JdbcDataSource" name="ds3" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/Db3" user="user-name" password="password"/>
    <document>
      <entity name="record11" dataSource="ds1" query="select id,name,category from Tbl1"/>
      <entity name="record12" dataSource="ds1" query="select id,name,category from Tbl2"/>
      <entity name="record13" dataSource="ds1" query="select id,name,category from Tbl3"/>
      <entity name="record21" dataSource="ds2" query="select id,name,category from Tbl1"/>
      <entity name="record22" dataSource="ds2" query="select id,name,category from Tbl2"/>
      <entity name="record23" dataSource="ds2" query="select id,name,category from Tbl3"/>
      <entity name="record31" dataSource="ds3" query="select id,name,category from Tbl1"/>
      <entity name="record32" dataSource="ds3" query="select id,name,category from Tbl2"/>
      <entity name="record33" dataSource="ds3" query="select id,name,category from Tbl3"/>
    </document>
  </dataConfig>

Doubts/Questions: - Is this the right way to achieve indexing this data? - Is there a better way to achieve this? Imagine 20 databases with 20 tables each translates to 400 lines in the XML. This doesn't scale for something like 200 databases and 200 tables each. Will Solr continue to work/index properly if I had 400 entity rows, without going out of memory? Seems OK. Your original database is sharded, so I'm guessing the amount of data is quite large. The number of root entities does not matter. What matters is the total number of documents. As you go from indexing 20 database shards to 200 shards, you will likely cross a point where indexing all of them on a single Solr box is either impossible (due to the large number of documents) or very slow. Similarly, response times may also suffer. Solr supports distributed search wherein you can shard your Solr index, each shard having a disjoint set of documents. You can continue to query Solr normally (except for providing an additional shards request parameter) and Solr will make sure it gets results from all shards, merges them, and returns them as if you were querying a single Solr instance. See http://wiki.apache.org/solr/DistributedSearch for more details.
- I really want to be able to search through the complete database for a 'name' and do things like 'category' filtering easily, independent of the entity name/data source. For me they are all records of the same type. That is very much possible out of the box. -- Regards, Shalin Shekhar Mangar.
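As a concrete illustration of the shards parameter (host names assumed), a distributed query looks like:

  http://host1:8983/solr/select?q=name:smith&shards=host1:8983/solr,host2:8983/solr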
Re: Doing SpellCheck in distributed search
On Wed, Oct 7, 2009 at 2:14 PM, balaji.a reachbalaj...@gmail.com wrote: Hi All, I am trying to get spell check suggestions in my distributed search query using shards. SpellCheckComponent does not support distributed search yet. There is an issue open with a patch. If you decide to use it, do let us know your feedback: https://issues.apache.org/jira/browse/SOLR-785 -- Regards, Shalin Shekhar Mangar.
Re: solr reporting tool adapter
On Wed, Oct 7, 2009 at 2:51 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: We basically want to generate PDF reports which contain tag clouds, bar charts, pie charts, etc. Faceting on a field will give you top terms and frequency information which can be used to create tag clouds. What do you want to plot on a bar chart? I don't know of a reporting tool which can hook into Solr for creating such things. -- Regards, Shalin Shekhar Mangar.
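For example, a facet query returning the 50 most frequent terms of a field (field name "tags" assumed) as raw material for a tag cloud might look like:

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=tags&facet.limit=50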
Re: datadir configuration
On Wed, Oct 7, 2009 at 2:56 PM, clico cl...@mairie-marseille.fr wrote: What do I put in <dataDir>${solr.foo.data.dir:/default/path/to/datadir}</dataDir>? What is /default/path/to/datadir? Solr variables are written like: ${variable_name:default_value} If you are setting the dataDir via a system property, you can remove the default value. -- Regards, Shalin Shekhar Mangar.
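In other words, with

  <dataDir>${solr.data.dir:./solr/data}</dataDir>

in solrconfig.xml, starting the JVM with -Dsolr.data.dir=/opt/solr/data overrides the default, and ./solr/data is used when the property is absent (the paths here are illustrative).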
Re: Solr Queries
First, please do not cross-post messages to both solr-dev and solr-user. solr-dev is only for development-related discussions. Comments inline: On Wed, Oct 7, 2009 at 9:59 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: Hi, I am new to Solr. I have the following queries: 1. Does Solr work in a distributed environment? If yes, how do I configure it? Yes, Solr works in a distributed environment. See http://wiki.apache.org/solr/DistributedSearch 2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS? (Note: I am familiar with Hadoop.) Not currently. There is some work going on at https://issues.apache.org/jira/browse/SOLR-1457 3. I have employee information (id, name, address, cell no, personal info) of 1 TB. To post (index) this data on the Solr server, do I have to create an XML file with this data and then post it to the Solr server? Or is there any other optimal way? In future my data will grow up to 10 TB; how can I index this data then? (Creating XML is more of a headache.) XML is just one way. You could also use CSV. If you use the SolrJ Java client with Solr 1.4 (soon to be released), it uses an efficient binary format for posting data to Solr. -- Regards, Shalin Shekhar Mangar.
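As a minimal SolrJ indexing sketch (server URL and field names are assumptions; requires the SolrJ 1.4 jars):

  import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexEmployees {
      public static void main(String[] args) throws Exception {
          CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          // Opt in to the compact javabin wire format instead of XML
          server.setRequestWriter(new BinaryRequestWriter());
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "emp-1");
          doc.addField("name", "Jane Doe");
          server.add(doc);   // send the document
          server.commit();   // make it searchable
      }
  }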
SpellCheck with filter/conditions
Sorry, newbie here, figured it out. How do you get spelling suggestions on a specific result set, filtered by a certain facet for example? On Wed, Oct 7, 2009 at 8:43 AM, R. Tan tanrihae...@gmail.com wrote: Nice. In comparison, how do you do it with faceting? Two other approaches are to use either the TermsComponent (new in Solr 1.4) or faceting. On Wed, Oct 7, 2009 at 1:51 AM, Jay Hill jayallenh...@gmail.com wrote: Have a look at a blog I posted on how to use EdgeNGrams to build an auto-suggest tool: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ You could easily add filter queries to this approach. For example, the query used in the blog could add filter queries like this:

  http://localhost:8983/solr/select/?q=user_query:"i"&wt=json&fl=user_query&indent=on&echoParams=none&rows=10&sort=count desc&fq=yourField:yourQuery&fq=anotherField:anotherQuery

-Jay http://www.lucidimagination.com On Tue, Oct 6, 2009 at 4:40 AM, R. Tan tanrihae...@gmail.com wrote: Hello, What's the best way to get auto-suggested terms/keywords that is filtered by one or more fields? TermsComponent should have been the solution but filters are not supported. Thanks, Rihaed
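With faceting, a similar auto-suggest effect can be had via facet.prefix, which (unlike the TermsComponent) respects filter queries; field and filter names here are assumptions:

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=user_query&facet.prefix=i&fq=yourField:yourQuery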
Re: Re : Questions about synonyms and highlighting
4 - The same question for highlighting with lemmatisation? Settings for manage (all highlighted) ==> the two words <em>manage</em> and <em>management</em> are highlighted. Settings for manage ==> the first word <em>manage</em> is highlighted but not the second: management. There is no lemmatisation support in Solr as of now. The only support you get is stemming. Let me understand this correctly - you basically want the searches to happen with the stemmed base but want to selectively highlight the original and/or stemmed words. Right? If yes, then AFAIK, this is not possible. Search passes through your field's analyzers (tokenizers and filters). Highlighters, typically, use the same set of analyzers and the behavior will be the same as in search; this essentially means that the keywords manage, managing, management and manager are REDUCED to manage for searches and highlighters. If this can be done, then the only place to enable your feature could be the Lucene highlighter APIs. Someone more knowledgeable can tell you if that is possible. I have no idea about your #3, though my idea of handling accentuation is to apply an ISOLatin1AccentFilterFactory and get rid of accents altogether :) I am curious to know the answer though. Cheers Avlesh On Wed, Oct 7, 2009 at 3:17 PM, Nourredine K. nourredin...@yahoo.com wrote: I'm not an expert on hit highlighting but please find some answers inline: Thanks Shalin for your answers. It helps a lot. I post again questions #3 and #4 for the others :) 3 - Is it possible, and if so how, can I configure Solr to enable or disable highlighting for tokens with diacritics? Settings for vélo (all highlighted) ==> the two words <em>vélo</em> and <em>velo</em> are highlighted. Settings for vélo ==> the first word <em>vélo</em> is highlighted but not the second: velo. 4 - The same question for highlighting with lemmatisation? Settings for manage (all highlighted) ==> the two words <em>manage</em> and <em>management</em> are highlighted. Settings for manage ==> the first word <em>manage</em> is highlighted but not the second: management. Regards, Nourredine.
Re: solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?
I ran Solr successfully until I updated recently, and it now dies at this line from data-config.xml:

  where ImportTime > '${dataimporter.last_index_time}'

I got this error:

org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from newheader where ImportTime > 'Wed Oct 07 20:17:05 EST 2009' Processing Document # 1
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:81)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:251)
        at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:621)
        at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:173)
        at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string.
        at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:196)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1458)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:733)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:631)
        at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4016)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1414)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:176)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:151)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.execute(SQLServerStatement.java:604)
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:246)
        ... 11 more

Noble Paul നോബിള് नोब्ळ्-2 wrote: Really? I don't remember that being changed. What difference do you notice? On Wed, Oct 7, 2009 at 2:30 AM, michael8 mich...@saracatech.com wrote: Just looking for confirmation from others, but it appears that the formatting of last_index_time from dataimport.properties (using DataImportHandler) is different in 1.4 vs. that in 1.3. I was troubleshooting why delta imports are no longer working for me after moving over to Solr 1.4 (10/2 nightly) and noticed that the format is different. Michael
-- - Noble Paul | Principal Engineer | AOL | http://aol.com
ApacheCon US
Just a friendly reminder to all about Lucene ecosystem events at ApacheCon US this year. We have two days of talks on pretty much every project under Lucene (see http://lucene.apache.org/#14+August+2009+-+Lucene+at+US+ApacheCon ) plus a meetup, a two-day training on Lucene, and a one-day training on Solr. The Lucene training will cover Lucene 2.9 and I'm sure Erik's Solr one will cover Solr 1.4. I also know there will be quite a few Lucene et al. committers at ApacheCon this year, so it should be a good year to interact and discuss your favorite projects. ApacheCon US is in Oakland (near San Francisco) the week of November 2nd. The trainings are on the 2nd and 3rd, and the main conference starts on the 4th. You can register at http://www.us.apachecon.com/c/acus2009/ Hope to see you there, Grant
Re: ISOLatin1AccentFilter before or after Snowball?
See http://markmail.org/message/hi25u5iqusfu542b Thank you for the link, Shalin! It could be worth copying that to the wiki? Cheers! Chantal I'd just like to make sure this is so. If it is the case, I'm wondering whether snowball filters for, e.g., French process accented language correctly at all, or whether they remove accents anyway... Or whether accents should be removed whenever making use of snowball filters. I'd think so but I'm not sure. Perhaps someone else can weigh in. And also: it really is meant to take UTF-8 as input, even though it is named ISOLatin1AccentFilter, isn't it? See http://markmail.org/message/hi25u5iqusfu542b -- Regards, Shalin Shekhar Mangar.
Re: Solr Queries
Hi Pravin, 1. Does Solr work in a distributed environment? If yes, how do I configure it? Yep. You can achieve this with sharding. For example: install and configure Solr on two machines and declare one of them as master. Insert shard parameters when you index and search your data. 2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS? (Note: I am familiar with Hadoop.) Sorry, no idea. 3. I have employee information (id, name, address, cell no, personal info) of 1 TB. To post (index) this data on the Solr server, do I have to create an XML file with this data and then post it to the Solr server? Or is there any other optimal way? In future my data will grow up to 10 TB; how can I index this data then? (Creating XML is more of a headache.) I think XML is not the best way; I don't suggest it. If you have that 1 TB of data in a database you can achieve this simply using the full-import command. Configure your DB details in solrconfig.xml and data-config.xml and add your DB driver jar to the Solr lib directory. Then import the data in slices (say, department-wise, or by some category). In future, you can import the data from a DB or you can index the data directly using the client API with simple Java beans. Hope this info helps you. Regards, Sandeep Tagore
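For reference, a full import is kicked off and monitored with URLs like these (host and port assumed):

  http://localhost:8983/solr/dataimport?command=full-import
  http://localhost:8983/solr/dataimport          (shows import status)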
Re: Doing SpellCheck in distributed search
Thanks Shalin! I applied your patch and deployed the war. While debugging, the overridden method SpellCheckComponent.finishStage is not getting invoked by the SearchHandler. Instead it's invoking the SearchComponent.finishStage method. Do I need to configure anything extra to make it work? My current configuration is as follows:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">spell</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker2</str>
    </lst>
  </searchComponent>

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <!--
      <int name="rows">10</int>
      <str name="fl">*</str>
      <str name="version">2.1</str>
      -->
      <!-- omp = Only More Popular -->
      <str name="spellcheck.onlyMorePopular">false</str>
      <!-- exr = Extended Results -->
      <str name="spellcheck.extendedResults">false</str>
      <!-- The number of suggestions to return -->
      <str name="spellcheck.count">1</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

Shalin Shekhar Mangar wrote: On Wed, Oct 7, 2009 at 2:14 PM, balaji.a reachbalaj...@gmail.com wrote: Hi All, I am trying to get spell check suggestions in my distributed search query using shards. SpellCheckComponent does not support distributed search yet. There is an issue open with a patch. If you decide to use it, do let us know your feedback: https://issues.apache.org/jira/browse/SOLR-785 -- Regards, Shalin Shekhar Mangar.
Re: solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?
On Wed, Oct 7, 2009 at 3:53 PM, Mint Ekalak mint@gmail.com wrote: I ran Solr successfully until I updated recently, and it now dies at this line from data-config.xml: where ImportTime > '${dataimporter.last_index_time}' I got this error: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from newheader where ImportTime > 'Wed Oct 07 Thanks for reporting the error. This seems to be a bug. I've opened an issue: https://issues.apache.org/jira/browse/SOLR-1496 -- Regards, Shalin Shekhar Mangar.
Re: Indexing and searching of sharded/ partitioned databases and tables
Hi Jayant, You can use Solr to achieve your objective. The data-config.xml which you posted is incomplete. I would like to suggest a way to index the full data: try to index one database at a time. Sample XML configuration:

  <dataSource type="JdbcDataSource" name="ds1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/Db1" user="user-name" password="password"/>
  <document name="Tbl1">
    <entity name="Tbl1" query="select id,name,category from Tbl1">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="category" name="category"/>
    </entity>
  </document>
  <document name="Tbl2">
    <entity name="Tbl2" query="select id,name,category from Tbl2">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="category" name="category"/>
    </entity>
  </document>
  <document name="Tbl3">
    <entity name="Tbl3" query="select id,name,category from Tbl3">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="category" name="category"/>
    </entity>
  </document>

You can write an automated program which changes the DB conf details in that XML and fires the full-import command. You can use the http://localhost:8983/solr/dataimport URL to check the status of the data import. But be careful while declaring the uniqueKey field; make sure that you are not overwriting records. And if you are working on large data sets, you can use the Solr sharding concept. Let us know if you have any issues. Regards, Sandeep Tagore
Re: Doing SpellCheck in distributed search
Sorry! It was my mistake of not copying the war to the correct location. balaji.a wrote: Thanks Shalin! I applied your patch and deployed the war. While debugging, the overridden method SpellCheckComponent.finishStage is not getting invoked by the SearchHandler. Instead it's invoking the SearchComponent.finishStage method. Do I need to configure anything extra to make it work? [configuration snipped; see previous message] Shalin Shekhar Mangar wrote: SpellCheckComponent does not support distributed search yet. There is an issue open with a patch. If you decide to use it, do let us know your feedback: https://issues.apache.org/jira/browse/SOLR-785 -- Regards, Shalin Shekhar Mangar.
Re: Indexing and searching of sharded/ partitioned databases and tables
On Wed, Oct 7, 2009 at 5:09 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Hi Jayant, You can use Solr to achieve your objective. The data-config.xml which you posted is incomplete. Sandeep, the data-config that Jayant posted is not incomplete. The field declaration is not necessary if the name of the column in the database and the field name in schema.xml are the same. I would like to suggest a way to index the full data: try to index one database at a time. [sample configuration snipped] You can write an automated program which changes the DB conf details in that XML and fires the full-import command. You can use the http://localhost:8983/solr/dataimport URL to check the status of the data import. You could do that but I don't think it is required. If you do want to do this, it is possible to post the data-config.xml to /dataimport (this is how the dataimport.jsp works). But be careful while declaring the uniqueKey field; make sure that you are not overwriting records. Yes, good point. That is a typical problem with sharded databases with auto-increment primary keys. If you do not have unique keys, you can concatenate the shard name with the value of the primary key. -- Regards, Shalin Shekhar Mangar.
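One sketch of such a concatenation inside DIH uses the TemplateTransformer (entity and column names here are assumptions, with uid declared as the uniqueKey field in schema.xml), so each shard produces distinct keys:

  <entity name="Tbl1" dataSource="ds1" transformer="TemplateTransformer"
          query="select id,name,category from Tbl1">
    <field column="uid" template="Db1-Tbl1-${Tbl1.id}"/>
  </entity>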
Re: Indexing and searching of sharded/ partitioned databases and tables
On Wed, Oct 7, 2009 at 5:09 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: You can write an automated program which changes the DB conf details in that XML and fires the full-import command. You can use the http://localhost:8983/solr/dataimport URL to check the status of the data import. Also note that full-import deletes all existing documents. So if you write such a program which changes the DB conf details, make sure you invoke the import command (new in Solr 1.4) to avoid deleting the other documents. -- Regards, Shalin Shekhar Mangar.
Re : Re : Questions about synonyms and highlighting
Thanks Avlesh. Now I understand better how highlighting works. As you've said, since it is based on the analyzers, highlighting will handle things the same way search does. A precision about the #3 and #4 examples: they are exclusive. I wanted to know how to do highlighting with stemming OR without (not both at the same time). So I think you've answered #3 too :) It all depends on your analyzers. And for my case, the ISOLatin1AccentFilterFactory could do the job. Thanks again Shalin and Avlesh. Regards, Nourredine. There is no lemmatisation support in Solr as of now. The only support you get is stemming. Let me understand this correctly - you basically want the searches to happen with the stemmed base but want to selectively highlight the original and/or stemmed words. Right? If yes, then AFAIK, this is not possible. Search passes through your field's analyzers (tokenizers and filters). Highlighters, typically, use the same set of analyzers and the behavior will be the same as in search; this essentially means that the keywords manage, managing, management and manager are REDUCED to manage for searches and highlighters. If this can be done, then the only place to enable your feature could be the Lucene highlighter APIs. Someone more knowledgeable can tell you if that is possible. I have no idea about your #3, though my idea of handling accentuation is to apply an ISOLatin1AccentFilterFactory and get rid of accents altogether :) I am curious to know the answer though.
Re: datadir configuration
I tried this in my context.xml. It doesn't work:

  <Environment name="solr/home" type="java.lang.String" value="D:\workspace\solr\home" override="true"/>
  <Environment name="solr.data.dir" type="java.lang.String" value="D:\workspace\solr\datas" override="true"/>
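A likely explanation, offered here as an assumption to verify: Solr looks up solr/home via JNDI, but the ${...} substitutions in solrconfig.xml (such as solr.data.dir) are resolved from JVM system properties, not from JNDI Environment entries. Setting the property on the JVM instead should work, e.g. on Windows in catalina.bat:

  set JAVA_OPTS=%JAVA_OPTS% -Dsolr.data.dir=D:\workspace\solr\datas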
manage rights
Hi everybody, As I'm ready to deploy my Solr server (after many tests and use cases), I'd like to configure my server so that some requests cannot be posted. As an example: my CMS or data app can use - dataimport - and other indexing commands. My website can only perform a search on the server. Could someone explain where this configuration has to be done? Thanks
Re: solr optimize - no space left on device
All, We're puzzled why we're still unable to optimize a 192GB index on an LVM volume that has 406GB available. We are not using Solr distribution (replication). There is no snapshooter in the picture. We run out of disk capacity with a df showing 100% but a du showing just 379GB of files. Restarting Tomcat causes space to be recovered and many segments to be deleted, leaving just 3 from the original 33. Issuing another optimize at that point causes Solr to run for a while and then show no further activity (CPU, memory consumption) in jconsole. The 3 segments do not merge into one.

  % df -h .
  Filesystem                          Size Used Avail Use% Mounted on
  /dev/mapper/internal-solr--build--2 406G 402G   30M 100% /l/solrs/build-2

Also suspicious is the 406G vs. 402G vs. 30M for size vs. used vs. avail. Yesterday, after our 2nd try, lsof showed several deleted files that were still open and were apparently consuming almost 134GB of space:

  jsvc 8381 tomcat 377u REG 253,6  13369098240 1982471 /l/solrs/build-2/data/index/_1j37.tis (deleted)
  jsvc 8381 tomcat 378u REG 253,6    184778752 1982472 /l/solrs/build-2/data/index/_1j37.tii (deleted)
  jsvc 8381 tomcat 379u REG 253,6  34053685248 1982473 /l/solrs/build-2/data/index/_1j37.frq (deleted)
  jsvc 8381 tomcat 380u REG 253,6 130411978752 1982474 /l/solrs/build-2/data/index/_1j37.prx (deleted)

That theory did not work, because the error log showed that Solr was trying to merge into the _1j37 segment files shown as deleted in the lsof above when it ran out of space, so those are a symptom, not a cause, of the lost space:

  SEVERE: java.io.IOException: background merge hit exception: _ojl:C151080 _169w:C141302 _1j36:C80405 _1j35:C2043 _1j34:C192 into _1j37 [optimize]

We restored the pre-optimized index again, restarted Tomcat, and tried to optimize using SerialMergeScheduler instead of the default ConcurrentMergeScheduler, under the theory that concurrent merges could somehow take more than 2x disk space. The optimize failed again with an out-of-space error. This time there were no deleted files in the lsof output. This is one shard out of 10. A couple of the shards were around 192GB and merged successfully. Any suggestions on how to debug this would be greatly appreciated. Thanks! Phil hathitrust.org University of Michigan Shalin Shekhar Mangar wrote: Not sure, but a quick search turned up: http://www.walkernews.net/2007/07/13/df-and-du-command-show-different-used-disk-space/ Using up to 2x the index size can happen. Also check if there is a snapshooter script running through cron which is making hard links to files while a merge is in progress. Do let us know if you make any progress. This is interesting. [original message quoted in full earlier in this thread; snipped]
Re: Problems with DIH XPath flatten
Here's a sample:

  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE document [
    <!ENTITY nbsp "&#160;">
    <!ENTITY copy "&#169;">
    <!ENTITY reg "&#174;">
  ]>
  <document>
    <kbml version="-//Indiana University//DTD KBML 0.9//EN">
      <kbq>In Mac OS X, how do I enable or disable the firewall?</kbq>
      <body>
        <p><kbh docid="aghe" access="allowed">Mac OS X<domain>all</domain><visibility>visible</visibility></kbh> includes an easy-to-use <kbh docid="aoru" access="allowed">firewall<domain>all</domain><visibility>visible</visibility></kbh> that can prevent potentially harmful incoming connections from other computers. To turn it on or off:</p>
        <h3>Mac OS X 10.6 (Snow Leopard)</h3>
        <ol><li>From the Apple menu, select <mi>System Preferences...</mi>. When the <code>System Preferences</code> window appears, from the <mi>View</mi> menu, select <mi>Security</mi>. <br clear="none"/><br clear="none"/>
        </li><li>Click the <mi>Firewall</mi> tab. ... </li></ol>
      </body>
      <xtra>
        <term weight="0">macos</term>
        <term weight="0">macintosh</term>
        <term weight="0">apple</term>
        <term weight="0">macosx</term>
        ...
      </xtra>
    </kbml>
    <metadata>
      <docid>aozg</docid>
      <owner firstname="" lastname="Macintosh Support">scmac</owner>
      ...
    </metadata>
  </document>

The /document/kbml/kbq works fine, but as you can see, it has no children. The actual content of the document is within the body element, though, which requires some flattening. Thanks for your time, Adam 2009/10/6 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: Send a small sample XML snippet you are trying to index and it may help. On Tue, Oct 6, 2009 at 9:29 PM, Adam Foltzer acfolt...@gmail.com wrote: Hi all, I'm trying to set up DataImportHandler to index some XML documents available over web services. The XML includes both content and metadata, so for the indexable content, I'm trying to just index everything under the content tag:

  <entity dataSource="kbws" name="kbxml" pk="title" url="resturl"
          processor="XPathEntityProcessor" forEach="/document"
          transformer="HTMLStripTransformer" flatten="true">
    <field column="content" name="content" xpath="/document/kbml/body" flatten="true" stripHTML="true"/>
    <field column="title" name="title" xpath="/document/kbml/kbq"/>
  </entity>

The result of this is that the title field gets populated and indexed (there are no child nodes of /document/kbml/kbq), but content does not get indexed at all. Since /document/kbml/body has many children, I expected that flatten=true would store all of the body text in the field. Instead, it stores nothing at all. I've tried this with many combinations of transformers and flatten options, and the result is the same each time. Here are the relevant field declarations from the schema (the type="text" is just the one from the example's schema.xml). I have tried combinations here as well of stored= and multiValued=, with the same result each time.

  <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
  <field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

If it would help troubleshooting, I could send along some sample XML. I don't want to spam the list with an attachment unless it's necessary, though :) Thanks in advance for your help, Adam Foltzer -- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Solr Trunk Heap Space Issues
Here is what I discovered after dozens of reindexes. We have a tool that is pulling all of the documents' uniqueIds. This tool is causing the cache to fill up. We turned it off and the system was able to reindex. Here is what is still puzzling to me about this entire scenario. When we had only 1 core active I was able to reindex the core even with the tool filling up the document cache. As soon as I added a second empty core the OOM stuff started. Could this be caused by the second core allowing the document cache to leak into it? It just seems strange that a second empty core allows the system to run out of heap. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Mark Miller markrmil...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Tue, 06 Oct 2009 17:21:47 -0400 To: solr-user@lucene.apache.org Subject: Re: Solr Trunk Heap Space Issues Mark Miller wrote: Jeff Newburn wrote: So could that potentially explain our use of more RAM on indexing? Or is this a rare edge case? I think it could explain the JVM using more RAM while indexing - but it should be fairly easily recoverable from what I can tell - so no explanation on the OOM yet. Still looking at that one. Is your system basically stock, or do you have custom plugins in it? No matter what I try with however many cores, I can't duplicate your problem. -- - Mark http://www.lucidimagination.com
Re: Seattle / PNW Hadoop/Lucene/HBase Meetup, Wed Sep 30th
Hey PNW Clouders! I'd really like to chat further with the crew doing distributed Solr. Give me a ring or shoot me an email, let's do lunch! -Nick On Wed, Sep 30, 2009 at 2:10 PM, Nick Dimiduk ndimi...@gmail.com wrote: As Bradford is out of town this evening, I will take up the mantle of Person-on-Point. Contact me with questions re: tonight's gathering. See you tonight! -Nick 614.657.0267 On Mon, Sep 28, 2009 at 4:33 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hello everyone! Don't forget that the Meetup is THIS Wednesday! I'm looking forward to hearing about Hive from the Facebook team ... and there might be a few other interesting talks as well. Here are the details in the wiki: http://wiki.apache.org/hadoop/PNW_Hadoop_%2B_Apache_Cloud_Stack_User_Group Cheers, Bradford On Mon, Sep 14, 2009 at 11:35 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, It's time for another Hadoop/Lucene/Apache Cloud Stack meetup! This month it'll be on Wednesday, the 30th, at 6:45 pm. We should have a few interesting guests this time around -- someone from Facebook may be stopping by to talk about Hive :) We've had great attendance in the past few months, let's keep it up! I'm always amazed by the things I learn from everyone. We're back at the University of Washington, Allen Computer Science Center (not Computer Engineering). Map: http://www.washington.edu/home/maps/?CSE Room: 303 -or- the entry level. If there are changes, signs will be posted. More info: the meetup is about 2 hours (and there's usually food): we'll have two in-depth talks of 15-20 minutes each, and then several lightning talks of 5 minutes. If no one offers, we'll just have general discussion and 'social time'. Let me know if you're interested in speaking or attending. We'd like to focus on education, so every presentation *needs* to ask some questions at the end. We can talk about these after the presentations, and I'll record what we've learned in a wiki and share that with the rest of us. Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com Cheers, Bradford -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: How to retrieve the index of a string within a field?
Hi Sandeep, Say the field is <field name="sentence">Can you get what you want?</field>, and the field type is Text. My query contains 'sentence:get what you'. Is it possible to get the number 2 directly from a query, since the word 'get' is the 2nd token in the sentence (counting from zero)? Thanks. Elaine On Wed, Oct 7, 2009 at 8:12 AM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Hi Elaine, What do you mean by index of this word? Do you want to return the first occurrence of the word in that sentence, or the document id? Also, which type of field is it? Is it a Text or a String? If it is of type Text, you can't achieve that because the sentence will be tokenized. Sandeep Elaine Li wrote: I have a field. The field has a sentence. If the user types in a word or a phrase, how can I return the index of this word or the index of the first word of the phrase? I tried to use bf=ord..., but it does not work as I expected.
Re: How to retrieve the index of a string within a field?
Hi Elaine, You can achieve that with some modifications in the Solr configuration files. Generally, text is configured as:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

When a field is declared as text (with the above configuration) it will be tokenized. Say, for example, your sentence Can you get what you want? will be tokenized into can, you, get, what, you, want. So when you search for 'sentence:get what you' you will get 0 results. To achieve your objective you can remove the tokenizers in the text configuration. The best way I suggest is to declare the field as type string. Search the string with a wildcard like 'sentence:*get what you*' using the SolrJ client, and when you get the records (results), save the output of sentence.indexOf(keyword) in your Java bean. Here sentence is a variable declared in the Java bean. For more details you need to read up on the usage of SolrJ. If you have any issues in modifying the configuration, post the configuration you have for the fieldType text and I will modify it for you. Regards, Sandeep Elaine Li wrote: Say the field is <field name="sentence">Can you get what you want?</field>, and the field type is Text. My query contains 'sentence:get what you'. Is it possible to get the number 2 directly from a query, since the word 'get' is the 2nd token in the sentence?
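A sketch of the client-side approach described above (server URL, field name, and query are assumptions); it computes the position on the client, not in Solr:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  public class PhrasePosition {
      public static void main(String[] args) throws Exception {
          CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          QueryResponse rsp = server.query(new SolrQuery("sentence:\"get what you\""));
          for (SolrDocument doc : rsp.getResults()) {
              String sentence = (String) doc.getFieldValue("sentence");
              // Character offset of the phrase in the stored value
              int offset = sentence.indexOf("get what you");
              if (offset < 0) continue;
              // Count whitespace-separated tokens before the match to get
              // a zero-based token position ("get" in "Can you get..." -> 2)
              int tokenIndex = offset == 0
                      ? 0
                      : sentence.substring(0, offset).trim().split("\\s+").length;
              System.out.println("offset=" + offset + " token=" + tokenIndex);
          }
      }
  }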
Re: Why isn't the DateField implementation of ISO 8601 broader?
Chris Hostetter wrote: : I would expect field:2001-03 to be a hit on a partial match such as : field:[2001-02-28T00:00:00Z TO 2001-03-13T00:00:00Z]. I suppose that my : expectation would be that field:2001-03 would be counted once per day for each : day in its range. It would follow that a user looking for documents relating ...meanwhile someone else might expect no match unless the ambiguous date is entirely contained within the range being queried on. If implemented in DateField, I guess this behaviour would need to be configurable. (Your implication of counting once per day would have pretty weird results on faceting, by the way.) I agree. It would be possible to have one document hit on a query but have hundreds of facet categories with a count of one under this scheme. I'm leaning towards the scenario I described where the document would be counted once in an "other" facet category if it is relevant through rounding. With unambiguous dates, you can have exactly what you want just by being a little more verbose when indexing/querying (and someone else can have exactly what they want by being equally verbose using slightly different options/queries). In your case I would suggest that you use two fields: date_low and date_high. When you have an exact date (down to the smallest level of granularity you care about) you put the same value in both fields; when you have an ambiguous value (like 2001-03) you put the largest value possible in date_high and the lowest value possible in date_low (i.e.: date_low:2001-03-01T00:00:00Z date_high:2001-03-31T23:59:59.999Z). Then a query for anything *overlapping* the range from Feb 28 to March 13 would be: +date_low:[* TO 2001-03-13T00:00:00Z] +date_high:[2001-02-28T00:00:00Z TO *] ...it works for ambiguous dates, and it works for exact dates. (Someone else who only wants to see matches if the ranges *completely* overlap would just swap which end point they queried against which field.) We've had a really similar solution in place for range queries for a while. Our current problem is really faceting. Thanks, Tricia
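For reference, a minimal schema sketch of the two-field approach (field names taken from the discussion; the type name is assumed):

  <field name="date_low" type="date" indexed="true" stored="true"/>
  <field name="date_high" type="date" indexed="true" stored="true"/>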
how can I use debugQuery if I have extended QParserPlugin?
in a previous post, I asked how I would go about creating a virtual function in my solr query; i.e.: http://127.0.0.1:8994/solr/select...@myfunc(1,2,3,4) I was trying to find a way to more easily/cleanly perform queries against large numbers of dynamic fields (i.e. field1, field2, field3...field99). I have extended QParserPlugin so that I can do this. The extended method replaces the virtual function section of the query with an expanded set of fields; @myFunc(1,2,3,4) can become something like (A1:1 AND B1:2 AND C1:3 AND D1:4) OR (A2:1 AND B2:2 AND C2:3 AND D2:4) OR ... (A99:1 AND B99:2 AND C99:3 AND D99:4) one thing I noticed is that if I append debugQuery to a query that includes the virtual function, I get a NullPointerException, likely because the debugging code looks at the query passed in and not the expanded list that my code generates. I would like to be able to use debugQuery to analyse my queries, including those with the virtual function. What would I have to modify to get debugQuery to work? thx in advance. -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25789546.html Sent from the Solr - User mailing list archive at Nabble.com.
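One hedged way to structure such a plugin so the expansion happens before parsing (the class name and expand() placeholder are hypothetical, and whether this resolves the debugQuery NPE depends on where the poster's expansion currently runs):

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.LuceneQParserPlugin;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class MyFuncQParserPlugin extends QParserPlugin {
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    // Rewrite @myFunc(...) into the expanded per-field clauses *before*
    // handing the string to the stock Lucene parser, so every downstream
    // component (including the debug component) sees the expanded query.
    String expanded = expand(qstr);
    return new LuceneQParserPlugin().createParser(expanded, localParams, params, req);
  }

  // Placeholder: real code would parse the argument list and build
  // (A1:1 AND B1:2 AND ...) OR ... (A99:1 AND ...) as described above.
  private String expand(String q) {
    return q;
  }
}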
IndexWriter InfoStream in solrconfig not working
Hello, We are trying to debug an indexing/optimizing problem and have tried setting the infoStream file in solrconfig.xml so that the SolrIndexWriter will write a log file. Here is our setting:

<!-- To aid in advanced debugging, you may turn on IndexWriter debug logging. Uncommenting this and setting to true will set the file that the underlying Lucene IndexWriter will write its debug infostream to. -->
<infoStream file="/tmp/LuceneIndexWriterDebug.log">true</infoStream>

After making that change to solrconfig.xml and restarting Solr, we see a message in the tomcat logs saying that the log is enabled: build-2_log.2009-10-06.txt:INFO: IndexWriter infoStream debug log is enabled: /tmp/LuceneIndexWriterDebug.log However, if we then run an optimize we can't see any log file being written. I also looked at the patch for http://issues.apache.org/jira/browse/SOLR-1145, but did not see a unit test that I might try to run in our system. Do others have this logging working successfully? Is there something else that needs to be set up? Tom
Re: How much disk space does optimize really take
On Wed, Oct 7, 2009 at 12:51 PM, Phillip Farber pfar...@umich.edu wrote: In a separate thread, I've detailed how an optimize is taking 2x disk space. We don't use solr distribution/snapshooter. We are using the default deletion policy = 1. We can't optimize a 192G index in 400GB of space. This thread in lucene/java-user http://www.gossamer-threads.com/lists/lucene/java-user/43475 suggests that an optimize should not take 2x unless perhaps an IndexReader is holding on to segments. This could be our problem since when optimization runs out of space, if we stop tomcat, a number of files go away and space is recovered. But we are not searching the index so how could a Searcher/IndexReader have any segments open? I notice in the logs that as part of routine commits or as part of optimize a Searcher is registered and autowarmed from a previous searcher (of course there's nothing in the caches -- this is just a build machine). INFO: registering core: Oct 6, 2009 2:16:20 PM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher searc...@2e097617 main Does this mean that there's always a lucene IndexReader holding segment files open so they can't be deleted during an optimize so we run out of disk space 2x? Yes. A feature could probably now be developed that avoids opening a reader until it's requested. That wasn't really possible in the past - due to many issues such as Lucene autocommit. -Yonik http://www.lucidimagination.com
Re: TermsComponent or auto-suggest with filter
Something like this, building on each character typed: facet=on&facet.field=tc_query&facet.prefix=be&facet.mincount=1 -Jay http://www.lucidimagination.com On Tue, Oct 6, 2009 at 5:43 PM, R. Tan tanrihae...@gmail.com wrote: Nice. In comparison, how do you do it with faceting? Two other approaches are to use either the TermsComponent (new in Solr 1.4) or faceting. On Wed, Oct 7, 2009 at 1:51 AM, Jay Hill jayallenh...@gmail.com wrote: Have a look at a blog I posted on how to use EdgeNGrams to build an auto-suggest tool: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ You could easily add filter queries to this approach. For example, the query used in the blog could add filter queries like this: http://localhost:8983/solr/select/?q=user_query:"i"&wt=json&fl=user_query&indent=on&echoParams=none&rows=10&sort=count desc&fq=yourField:yourQuery&fq=anotherField:anotherQuery -Jay http://www.lucidimagination.com On Tue, Oct 6, 2009 at 4:40 AM, R. Tan tanrihae...@gmail.com wrote: Hello, What's the best way to get auto-suggested terms/keywords that is filtered by one or more fields? TermsComponent should have been the solution but filters are not supported. Thanks, Rihaed
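The same request via SolrJ, as a hedged sketch (the field name tc_query comes from Jay's example; the server URL is a placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AutoSuggest {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0);            // only the facet counts are needed, not documents
    q.setFacet(true);
    q.addFacetField("tc_query");
    q.setFacetPrefix("be");  // the characters the user has typed so far
    q.setFacetMinCount(1);
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getFacetField("tc_query").getValues());
  }
}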
Facet query pb
Hello I have a pb (problem) trying to retrieve a tree with facet use. I've got a field location_field. Each doc in my index has a location_field. Location field can be continent/country/city. I have 2 queries: http://server/solr//select?fq=(location_field:NORTH*) : ok, retrieve docs http://server/solr//select?fq=(location_field:NORTH AMERICA*) : not ok I think with NORTH AMERICA I have a pb with the space character. Could you help me -- View this message in context: http://www.nabble.com/Facet-query-pb-tp25790667p25790667.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How much disk space does optimize really take
It would be good to be able to commit without opening a new reader, however with Lucene 2.9 the segment readers for all available segments are already created and available via getReader, which manages the reference counting internally. Using reopen redundantly creates SRs that are already held internally in IW. On Wed, Oct 7, 2009 at 9:59 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Oct 7, 2009 at 12:51 PM, Phillip Farber pfar...@umich.edu wrote: In a separate thread, I've detailed how an optimize is taking 2x disk space. We don't use solr distribution/snapshooter. We are using the default deletion policy = 1. We can't optimize a 192G index in 400GB of space. This thread in lucene/java-user http://www.gossamer-threads.com/lists/lucene/java-user/43475 suggests that an optimize should not take 2x unless perhaps an IndexReader is holding on to segments. This could be our problem since when optimization runs out of space, if we stop tomcat, a number of files go away and space is recovered. But we are not searching the index so how could a Searcher/IndexReader have any segments open? I notice in the logs that as part of routine commits or as part of optimize a Searcher is registered and autowarmed from a previous searcher (of course there's nothing in the caches -- this is just a build machine). INFO: registering core: Oct 6, 2009 2:16:20 PM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher searc...@2e097617 main Does this mean that there's always a lucene IndexReader holding segment files open so they can't be deleted during an optimize so we run out of disk space 2x? Yes. A feature could probably now be developed that avoids opening a reader until it's requested. That wasn't really possible in the past - due to many issues such as Lucene autocommit. -Yonik http://www.lucidimagination.com
Re: Facet query pb
I have no idea what pb means but this is what you probably want - fq=(location_field:(NORTH AMERICA*)) Cheers Avlesh On Wed, Oct 7, 2009 at 10:40 PM, clico cl...@mairie-marseille.fr wrote: Hello I have a pb trying to retrieve a tree with facet use I've got a field location_field Each doc in my index has a location_field Location field can be continent/country/city I have 2 queries: http://server/solr//select?fq=(location_field:NORTH*) : ok, retrieve docs http://server/solr//select?fq=(location_field:NORTH AMERICA*) : not ok I think with NORTH AMERICA I have a pb with the space character Could you help me -- View this message in context: http://www.nabble.com/Facet-query-pb-tp25790667p25790667.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Facet query pb
Clico, Because you are doing a wildcard query, the token 'AMERICA' will not be analyzed at all. This means that 'AMERICA*' will NOT match 'america'. On 10/07/2009 12:30 PM, Avlesh Singh wrote: I have no idea what pb means but this is what you probably want - fq=(location_field:(NORTH AMERICA*)) Cheers Avlesh On Wed, Oct 7, 2009 at 10:40 PM, clico cl...@mairie-marseille.fr wrote: Hello I have a pb trying to retrieve a tree with facet use I've got a field location_field Each doc in my index has a location_field Location field can be continent/country/city I have 2 queries: http://server/solr//select?fq=(location_field:NORTH*) : ok, retrieve docs http://server/solr//select?fq=(location_field:NORTH AMERICA*) : not ok I think with NORTH AMERICA I have a pb with the space character Could you help me -- View this message in context: http://www.nabble.com/Facet-query-pb-tp25790667p25790667.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How much disk space does optimize really take
On Wed, Oct 7, 2009 at 10:45 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: It would be good to be able to commit without opening a new reader, however with Lucene 2.9 the segment readers for all available segments are already created and available via getReader, which manages the reference counting internally. Using reopen redundantly creates SRs that are already held internally in IW. Jason, I think this is something we should consider changing. A user who is not using NRT features should not pay the price of keeping readers open. We are also interested in opening a searcher just-in-time for SOLR-1293. We have use-cases where a SolrCore is loaded only for indexing and then unloaded. -- Regards, Shalin Shekhar Mangar.
Re: How much disk space does optimize really take
I think that argument requires auto-commit to be on and opening readers after the optimize starts? Otherwise, the optimized version is not put into place until a commit is called, and a Reader won't see the newly merged segments until then - so the original index is kept around in either case - having a Reader open on it shouldn't affect the space requirements? Yonik Seeley wrote: On Wed, Oct 7, 2009 at 12:51 PM, Phillip Farber pfar...@umich.edu wrote: In a separate thread, I've detailed how an optimize is taking 2x disk space. We don't use solr distribution/snapshooter. We are using the default deletion policy = 1. We can't optimize a 192G index in 400GB of space. This thread in lucene/java-user http://www.gossamer-threads.com/lists/lucene/java-user/43475 suggests that an optimize should not take 2x unless perhaps an IndexReader is holding on to segments. This could be our problem since when optimization runs out of space, if we stop tomcat, a number of files go away and space is recovered. But we are not searching the index so how could a Searcher/IndexReader have any segments open? I notice in the logs that as part of routine commits or as part of optimize a Searcher is registered and autowarmed from a previous searcher (of course there's nothing in the caches -- this is just a build machine). INFO: registering core: Oct 6, 2009 2:16:20 PM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher searc...@2e097617 main Does this mean that there's always a lucene IndexReader holding segment files open so they can't be deleted during an optimize so we run out of disk space 2x? Yes. A feature could probably now be developed that avoids opening a reader until it's requested. That wasn't really possible in the past - due to many issues such as Lucene autocommit. -Yonik http://www.lucidimagination.com -- - Mark http://www.lucidimagination.com
RE: IndexWriter InfoStream in solrconfig not working
I had the same problem. I'd be very interested to know how to get this working... -Gio. -Original Message- From: Burton-West, Tom [mailto:tburt...@umich.edu] Sent: Wednesday, October 07, 2009 12:13 PM To: solr-user@lucene.apache.org Subject: IndexWriter InfoStream in solrconfig not working Hello, We are trying to debug an indexing/optimizing problem and have tried setting the infoStream file in solrconfig.xml so that the SolrIndexWriter will write a log file. Here is our setting:

<!-- To aid in advanced debugging, you may turn on IndexWriter debug logging. Uncommenting this and setting to true will set the file that the underlying Lucene IndexWriter will write its debug infostream to. -->
<infoStream file="/tmp/LuceneIndexWriterDebug.log">true</infoStream>

After making that change to solrconfig.xml and restarting Solr, we see a message in the tomcat logs saying that the log is enabled: build-2_log.2009-10-06.txt:INFO: IndexWriter infoStream debug log is enabled: /tmp/LuceneIndexWriterDebug.log However, if we then run an optimize we can't see any log file being written. I also looked at the patch for http://issues.apache.org/jira/browse/SOLR-1145, but did not see a unit test that I might try to run in our system. Do others have this logging working successfully? Is there something else that needs to be set up? Tom
Default query parameter for one core
I'd like to have 5 cores on my box. core0 should automatically shard to cores 1-4, which each have a quarter of my corpus. I tried this in my solrconfig.xml:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="shards">${solr.core.shardsParam:}</str>
    <!-- aka, if the core specifies a shardsParam, great, and if not, use nothing -->
  </lst>
</requestHandler>

and this in my solr.xml:

<cores adminPath="/admin/cores" shareSchema="true">
  <core name="core0" instanceDir="./" shardsParam="localhost:9990/core1,localhost:9990/core2,localhost:9990/core3,localhost:9990/core4"/>
  <core name="core1" instanceDir="./" dataDir="/home/search/data/1/"/>
  <!-- etc for cores 2 through 4 -->
</cores>

Unfortunately, this doesn't work, because cores 1 through 4 end up specifying a blank shards param, which is different from no shards param at all -- it results in a NullPointerException. Is there a way to not have the shards param at all for most cores, and for core0 to specify it?
Re: How much disk space does optimize really take
Yonik Seeley wrote: Does this mean that there's always a lucene IndexReader holding segment files open so they can't be deleted during an optimize so we run out of disk space 2x? Yes. A feature could probably now be developed that avoids opening a reader until it's requested. That wasn't really possible in the past - due to many issues such as Lucene autocommit. So this implies that for a normal optimize, in every case, due to the Searcher holding open the existing segment prior to optimize, we'd always need 3x even in the normal case. This seems wrong since it is repeatedly stated that in the normal case only 2x is needed, and I have successfully optimized a similar sized 192G index on identical hardware with a 400G capacity. Yonik, I'm uncertain then about what you're saying about required disk space for optimize. Could you clarify? -Yonik http://www.lucidimagination.com
Re: How much disk space does optimize really take
To be clear, the SRs created by merges don't have the term index loaded, which is the main cost. One would need to use IndexReaderWarmer to load the term index before the new SR becomes a part of SegmentInfos. On Wed, Oct 7, 2009 at 10:34 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Oct 7, 2009 at 10:45 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: It would be good to be able to commit without opening a new reader, however with Lucene 2.9 the segment readers for all available segments are already created and available via getReader, which manages the reference counting internally. Using reopen redundantly creates SRs that are already held internally in IW. Jason, I think this is something we should consider changing. A user who is not using NRT features should not pay the price of keeping readers open. We are also interested in opening a searcher just-in-time for SOLR-1293. We have use-cases where a SolrCore is loaded only for indexing and then unloaded. -- Regards, Shalin Shekhar Mangar.
Re: How to retrieve the index of a string within a field?
Sandeep, I do get results when I search for get what you, not 0 results. What in my schema makes this difference?

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. enablePositionIncrements=true ensures that a 'gap' is left to allow for accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I need to learn SolrJ. I am currently using javascript as a client and invoke http calls to get results to display in the browser. Can SolrJ get all the results at one shot w/o the http call? I need to do some postprocessing against all the results and then display the processed data. Submitting multiple http queries and post-processing after each query does not seem to be the right way. Thanks. Elaine On Wed, Oct 7, 2009 at 11:06 AM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Hi Elaine, You can achieve that with some modifications in the Solr configuration files. Generally text will be configured as:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

When a field is declared as text (with the above conf.) it will be tokenized. Say, for example, your sentence "Can you get what you want?" will be tokenized into can, you, get, what, you, want. So when you search for 'sentence:get what you' you will get 0 results. To achieve your objective you can remove the tokenizers in the text configuration. The best way I suggest is to declare the field as type string. Search the string with a wildcard like 'sentence:*get what you*' using the SolrJ client, and when you get the records (results) save the output of sentence.indexOf(keyword) in your java bean. Here sentence is a variable declared in the java bean. For more details you need to read up on the usage of SolrJ. If you have any issues in modifying the configuration, post the configuration you have for the fieldtype text and I will modify it for you.
Regards, Sandeep Team Elaine Li wrote: Say the field is <field name="sentence">Can you get what you want?</field>, the field type is Text. My query contains 'sentence:get what you'. Is it possible to get number 2 directly from a query since the word 'get' is the 2nd token in the sentence? -- View this message in context: http://www.nabble.com/How-to-retrieve-the-index-of-a-string-within-a-field--tp25771821p25788406.html Sent from the Solr - User mailing list archive at Nabble.com.
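On Elaine's SolrJ question: SolrJ still talks to Solr over HTTP (unless you embed Solr in-process with EmbeddedSolrServer); what it saves you is building URLs and parsing responses by hand. A hedged sketch of fetching a large batch in one request and doing the indexOf post-processing client-side — the field name sentence (which must be stored), the row count, and the URL are assumptions for illustration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SentenceOffsets {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("sentence:\"get what you\"");
    q.setRows(1000); // fetch a big batch in one round trip instead of paging
    QueryResponse rsp = server.query(q);
    for (SolrDocument doc : rsp.getResults()) {
      String sentence = (String) doc.getFieldValue("sentence");
      // client-side post-processing: character offset of the keyword
      System.out.println(sentence.indexOf("get"));
    }
  }
}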
Re: Question about PatternReplace filter and automatic Synonym generation
On 10/6/09 3:32 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I'll try to explain with an example. Given the term 'it!' in the title, it : should match both 'it' and 'it!' in the query as an exact match. Currently, : this is done by using a synonym entry (and index time SynonymFilter) as : follows: : : it! = it, it! : : Now, the above holds true for all cases where you have a title token of the : form [aA-zZ]*!. Handling all of those cases requires adding synonyms : manually for each case which is not easy to manage and does not scale. : : I am hoping to do the same by using an index time filter that takes in a : pattern like the PatternReplace filter and adds the newly created token : instead of replacing the original one. Does this make sense? Am I missing : something that would break this approach? something like this would be fairly easy to implement in Lucene, but somewhat confusing to try and configure in Solr. I was going to suggest that you use something like...

<filter class="solr.PatternReplaceFilterFactory" pattern="(^.*)(\!?)$" replacement="$1 $2" replace="all"/>

..and then have a subsequent filter that splits the tokens on the whitespace (or any other special character you could use in the replacement) ... but apparently we don't have any built in filters that will just split tokens on a character/pattern for you. that would also be fairly easy to write if someone wants to submit a patch. There is a solr.PatternTokenizerFactory class which likely fits the bill in this case. The related question I have is this - is it possible to have multiple Tokenizers in your analysis chain? Prasanna.
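On Prasanna's closing question: an analyzer has exactly one tokenizer, so any further splitting has to happen either in the tokenizer itself or in a token filter. A hedged sketch of a field type built on solr.PatternTokenizerFactory (the type name and pattern are illustrative only):

<fieldType name="text_pattern" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- group="-1" means "split on matches of the pattern" -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s+" group="-1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>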
How to determine the size of the index?
Is this info available via admin page?
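(No reply appears in the archive. For what it's worth: the admin statistics page shows numDocs/maxDoc but not bytes, so the usual check is the size of the index directory on disk, e.g. du -sh on the core's data/index directory — the path varies by install.)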
Re: Facet query pb
That's not a pb. I want to use that in order to drill down a tree. Christian Zambrano wrote: Clico, Because you are doing a wildcard query, the token 'AMERICA' will not be analyzed at all. This means that 'AMERICA*' will NOT match 'america'. On 10/07/2009 12:30 PM, Avlesh Singh wrote: I have no idea what pb means but this is what you probably want - fq=(location_field:(NORTH AMERICA*)) Cheers Avlesh On Wed, Oct 7, 2009 at 10:40 PM, clico cl...@mairie-marseille.fr wrote: Hello I have a pb trying to retrieve a tree with facet use I've got a field location_field Each doc in my index has a location_field Location field can be continent/country/city I have 2 queries: http://server/solr//select?fq=(location_field:NORTH*) : ok, retrieve docs http://server/solr//select?fq=(location_field:NORTH AMERICA*) : not ok I think with NORTH AMERICA I have a pb with the space character Could you help me -- View this message in context: http://www.nabble.com/Facet-query-pb-tp25790667p25790667.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Facet-query-pb-tp25790667p25792177.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How much disk space does optimize really take
On Wed, Oct 7, 2009 at 1:50 PM, Phillip Farber pfar...@umich.edu wrote: So this implies that for a normal optimize, in every case, due to the Searcher holding open the existing segment prior to optimize, we'd always need 3x even in the normal case. This seems wrong since it is repeatedly stated that in the normal case only 2x is needed and I have successfully optimized a similar sized 192G index on identical hardware with a 400G capacity. 2x for the IndexWriter only. Having an open index reader can increase that somewhat... 3x is the absolute worst case, I think, and that can currently be avoided by first calling commit and then calling optimize. This way the open reader will only be holding references to segments that wouldn't be deleted until the optimize is complete anyway. -Yonik http://www.lucidimagination.com
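A hedged SolrJ illustration of the ordering Yonik describes (the URL is a placeholder; as discussed later in the thread, long-running queries or replication can still hold segments open past the commit):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CommitThenOptimize {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    server.commit();   // first release the old reader's claim on merged-away segments
    server.optimize(); // then rewrite down to a single segment
  }
}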
Re: How much disk space does optimize really take
On Wed, Oct 7, 2009 at 1:34 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Oct 7, 2009 at 10:45 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: It would be good to be able to commit without opening a new reader, however with Lucene 2.9 the segment readers for all available segments are already created and available via getReader, which manages the reference counting internally. Using reopen redundantly creates SRs that are already held internally in IW. Jason, I think this is something we should consider changing. A user who is not using NRT features should not pay the price of keeping readers open. We are also interested in opening a searcher just-in-time for SOLR-1293. We have use-cases where a SolrCore is loaded only for indexing and then unloaded. This is already true today. If you don't use NRT then the readers are not held open by Lucene. Mike
Re: How much disk space does optimize really take
Wow, this is weird. I commit before I optimize. In fact, I bounce tomcat before I optimize just in case. It makes sense, as you say, that then the open reader can only be holding references to segments that wouldn't be deleted until the optimize is complete anyway. But we're still exceeding 2x. And after the optimize fails, if we then do a commit or bounce tomcat, a bunch of segments disappear. I am stumped. Yonik Seeley wrote: On Wed, Oct 7, 2009 at 1:50 PM, Phillip Farber pfar...@umich.edu wrote: So this implies that for a normal optimize, in every case, due to the Searcher holding open the existing segment prior to optimize, we'd always need 3x even in the normal case. This seems wrong since it is repeatedly stated that in the normal case only 2x is needed and I have successfully optimized a similar sized 192G index on identical hardware with a 400G capacity. 2x for the IndexWriter only. Having an open index reader can increase that somewhat... 3x is the absolute worst case, I think, and that can currently be avoided by first calling commit and then calling optimize. This way the open reader will only be holding references to segments that wouldn't be deleted until the optimize is complete anyway. -Yonik http://www.lucidimagination.com
Re: How much disk space does optimize really take
Oops, sent before finished. Partial optimize aka maxSegments is a recent Solr 1.4/Lucene 2.9 feature. As to 2x vs. 3x, the general wisdom is that an optimize on a simple index takes at most 2x disk space, and on a compound index takes at most 3x. Simple is the default (*). At Divvio we had the same problem and it never took up more than 2x. If your index disks are really bursting at the seams, you could try creating an empty index on a separate disk and merging your large index into that index. The resulting index will be mostly optimized. Lance Norskog * in solrconfig.xml: <useCompoundFile>false</useCompoundFile> On 10/7/09, Phillip Farber pfar...@umich.edu wrote: Wow, this is weird. I commit before I optimize. In fact, I bounce tomcat before I optimize just in case. It makes sense, as you say, that then the open reader can only be holding references to segments that wouldn't be deleted until the optimize is complete anyway. But we're still exceeding 2x. And after the optimize fails, if we then do a commit or bounce tomcat, a bunch of segments disappear. I am stumped. Yonik Seeley wrote: On Wed, Oct 7, 2009 at 1:50 PM, Phillip Farber pfar...@umich.edu wrote: So this implies that for a normal optimize, in every case, due to the Searcher holding open the existing segment prior to optimize, we'd always need 3x even in the normal case. This seems wrong since it is repeatedly stated that in the normal case only 2x is needed and I have successfully optimized a similar sized 192G index on identical hardware with a 400G capacity. 2x for the IndexWriter only. Having an open index reader can increase that somewhat... 3x is the absolute worst case, I think, and that can currently be avoided by first calling commit and then calling optimize. This way the open reader will only be holding references to segments that wouldn't be deleted until the optimize is complete anyway. -Yonik http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
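A hedged sketch of Lance's merge-into-empty-index trick at the raw Lucene 2.9 level (the paths are hypothetical, and this assumes the index is offline — don't run it against a live Solr core):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MergeIntoEmpty {
  public static void main(String[] args) throws Exception {
    Directory source = FSDirectory.open(new File("/disk1/solr/data/index")); // the full volume
    Directory target = FSDirectory.open(new File("/disk2/new-index"));       // the empty volume
    IndexWriter w = new IndexWriter(target,
        new StandardAnalyzer(Version.LUCENE_29), true, IndexWriter.MaxFieldLength.UNLIMITED);
    w.addIndexesNoOptimize(new Directory[] { source }); // merge segments into the new directory
    w.close();
  }
}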
Re: How much disk space does optimize really take
On Wed, Oct 7, 2009 at 3:16 PM, Phillip Farber pfar...@umich.edu wrote: Wow, this is weird. I commit before I optimize. In fact, I bounce tomcat before I optimize just in case. It makes sense, as you say, that then the open reader can only be holding references to segments that wouldn't be deleted until the optimize is complete anyway. But we're still exceeding 2x. How much over 2x? It is possible (though relatively rare) for an optimized index to be larger than a non-optimized index. -Yonik http://www.lucidimagination.com
Re: How much disk space does optimize really take
I can't tell why calling a commit or restarting is going to help anything - or why you need more than 2x in any case. The only reason I can see this being an issue is if you have turned on auto-commit. Otherwise the Reader is *always* only referencing what would have to be around anyway. You're likely just too close to the edge. There are fragmentation issues and whatnot when you're dealing with such large files and so little space above what you need. Phillip Farber wrote: Wow, this is weird. I commit before I optimize. In fact, I bounce tomcat before I optimize just in case. It makes sense, as you say, that then the open reader can only be holding references to segments that wouldn't be deleted until the optimize is complete anyway. But we're still exceeding 2x. And after the optimize fails, if we then do a commit or bounce tomcat, a bunch of segments disappear. I am stumped. Yonik Seeley wrote: On Wed, Oct 7, 2009 at 1:50 PM, Phillip Farber pfar...@umich.edu wrote: So this implies that for a normal optimize, in every case, due to the Searcher holding open the existing segment prior to optimize, we'd always need 3x even in the normal case. This seems wrong since it is repeatedly stated that in the normal case only 2x is needed and I have successfully optimized a similar sized 192G index on identical hardware with a 400G capacity. 2x for the IndexWriter only. Having an open index reader can increase that somewhat... 3x is the absolute worst case, I think, and that can currently be avoided by first calling commit and then calling optimize. This way the open reader will only be holding references to segments that wouldn't be deleted until the optimize is complete anyway. -Yonik http://www.lucidimagination.com -- - Mark http://www.lucidimagination.com
Re: How much disk space does optimize really take
Okay - I think I've got you - you're talking about the case of adding a bunch of docs, not calling commit, and then trying to optimize. I keep coming at it from a cold optimize. Making sense to me now. Mark Miller wrote: I can't tell why calling a commit or restarting is going to help anything - or why you need more than 2x in any case. The only reason I can see this being an issue is if you have turned on auto-commit. Otherwise the Reader is *always* only referencing what would have to be around anyway. You're likely just too close to the edge. There are fragmentation issues and whatnot when you're dealing with such large files and so little space above what you need. Phillip Farber wrote: Wow, this is weird. I commit before I optimize. In fact, I bounce tomcat before I optimize just in case. It makes sense, as you say, that then the open reader can only be holding references to segments that wouldn't be deleted until the optimize is complete anyway. But we're still exceeding 2x. And after the optimize fails, if we then do a commit or bounce tomcat, a bunch of segments disappear. I am stumped. Yonik Seeley wrote: On Wed, Oct 7, 2009 at 1:50 PM, Phillip Farber pfar...@umich.edu wrote: So this implies that for a normal optimize, in every case, due to the Searcher holding open the existing segment prior to optimize, we'd always need 3x even in the normal case. This seems wrong since it is repeatedly stated that in the normal case only 2x is needed and I have successfully optimized a similar sized 192G index on identical hardware with a 400G capacity. 2x for the IndexWriter only. Having an open index reader can increase that somewhat... 3x is the absolute worst case, I think, and that can currently be avoided by first calling commit and then calling optimize. This way the open reader will only be holding references to segments that wouldn't be deleted until the optimize is complete anyway. -Yonik http://www.lucidimagination.com -- - Mark http://www.lucidimagination.com
Re: How much disk space does optimize really take
On Wed, Oct 7, 2009 at 3:31 PM, Mark Miller markrmil...@gmail.com wrote: I can't tell why calling a commit or restarting is going to help anything Depends on what scenarios you consider, and what you are taking 2x of.
1) Open reader on index
2) Open writer and add two documents... the first causes a large merge, and the second is just to make it a non-optimized index. At this point you're already at 2x of your original index size.
3) call optimize()... this will make a 3rd copy before deleting the 2nd.
-Yonik http://www.lucidimagination.com
Solr Demo at SF New Tech Meetup
Hello all, For those of you in the Bay Area, we will be demoing our Bodukai Boutique product at the SF New Tech Meetup on Wednesday, Oct. 14: http://sfnewtech.com/2009/10/05/1014-sf-new-tech-bodukai-yourversion-meehive-and-more/ Bodukai Boutique is the fastest ecommerce search and navigation solution: http://bodukai.com/boutique/ We will be demoing our Solr integration and all are welcome to come. Thank you, Nasseam Elkarra http://bodukai.com/boutique/ The fastest possible shopping experience
Re: manage rights
There are no security features in Solr 1.4. You cannot do this. It would be really simple to implement a hack where all management must be done via POST, and then allow the configuration to ban POST requests. On 10/7/09, clico cl...@mairie-marseille.fr wrote: Hi everybody As I'm ready to deploy my solr server (after many tests and use cases) I'd like to configure my server so that some requests cannot be posted As an example: My CMS or data app can use - dataimport - and other indexing commands My website can only perform a search on the server could someone explain to me where this configuration has to be done? Thanks -- View this message in context: http://www.nabble.com/manage-rights-tp25784152p25784152.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: solr reporting tool adapter
The BIRT project can do what you want. It has a nice form creator and you can configure http XML input formats. It includes very complete Eclipse plugins and there is a book about it. On 10/7/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Oct 7, 2009 at 2:51 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: we basically wanna generate PDF reports which contain, tag clouds, bar charts, pie charts etc. Faceting on a field will give you top terms and frequency information which can be used to create tag clouds. What do you want to plot on a bar chart? I don't know of a reporting tool which can hook into Solr for creating such things. -- Regards, Shalin Shekhar Mangar. -- Lance Norskog goks...@gmail.com
Re: How much disk space does optimize really take
Yonik Seeley wrote: On Wed, Oct 7, 2009 at 3:31 PM, Mark Miller markrmil...@gmail.com wrote: I can't tell why calling a commit or restarting is going to help anything Depends on what scenarios you consider, and what you are taking 2x of.
1) Open reader on index
2) Open writer and add two documents... the first causes a large merge, and the second is just to make it a non-optimized index. At this point you're already at 2x of your original index size.
3) call optimize()... this will make a 3rd copy before deleting the 2nd.
-Yonik http://www.lucidimagination.com
Yup - finally hit me what you were talking about. Wasn't considering the case of adding docs to an existing index, not committing, and then trying to optimize. I like trying to take an opposing side from you anyway - it means I know where I will end up - but you're usually so darn terse, I never know how long till I end up there. Anyway, so all you generally *need* is 2x, you just have to make sure you're not adding docs first without committing them - which I was taking for granted. But that means your comment about calling commit makes perfect sense. I guess you can't guarantee 2x though, as if you have queries coming in that take a while, a commit opening a new Reader will not guarantee the old Reader is quite ready to go away. Might want to wait a short bit after the commit. -- - Mark http://www.lucidimagination.com
Re: solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?
2 things I noticed that are different from 1.3 to 1.4 for DataImport: 1. there are now 2 datetime values (per my specific schema I'm sure) in the dataimport.properties vs. only 1 in 1.3 (using the exact same schema). One is 'last_index_time' same as 1.3, and a *new* one (in 1.4) named item.last_index_time, where 'item' is my main and only entity name specified in my data-import.xml. They both have the same value. 2. in 1.3, the datetime passed to SQL used to be, e.g., '2009-10-05 14:08:01', but with 1.4 the format becomes 'Mon Oct 05 14:08:01 PDT 2009', with the day of week, name of month, and timezone spelled out. I had an issue with the 1.4 format with MySQL only for the timezone part, but now I have a different solution without using this last index date altogether. I'm curious though if there's any config setting to pass to DataImportHandler to specify the desired date/time format to use. Michael Noble Paul നോബിള് नोब्ळ्-2 wrote: really? I don't remember that being changed. what difference do u notice? On Wed, Oct 7, 2009 at 2:30 AM, michael8 mich...@saracatech.com wrote: Just looking for confirmation from others, but it appears that the formatting of last_index_time from dataimport.properties (using DataImportHandler) is different in 1.4 vs. that in 1.3. I was troubleshooting why delta imports are no longer working for me after moving over to solr 1.4 (10/2 nightly) and noticed that the format is different. Michael -- View this message in context: http://www.nabble.com/solr-1.4-formats-last_index_time-for-SQL-differently-than-1.3--%21--tp25776496p25776496.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- View this message in context: http://www.nabble.com/solr-1.4-formats-last_index_time-for-SQL-differently-than-1.3--%21--tp25776496p25793468.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Help with denormalizing issues
Hi again, I'm gonna try this again with more focus this time :D
1) Ideally what we would like to do is plug in an additional mechanism to filter the initial result set, because we can't find a way to implement our filtering needs as filter queries against a single index. We would want to do this while maintaining support for paging. Looking through the codebase it looks as if this would not be possible without major surgery, due to the paging support being implemented deep inside private methods of SolrIndexSearcher. Does this sound accurate?
2) If we pursue the other option of indexing skus and collapsing the results based on product id using the field collapsing patch, is there any validity to my concerns about indexing the same content multiple times skewing the scoring?
3) Does anyone have experience using the field collapsing patch, and have any idea how much additional overhead it incurs?
Thanks, Eric
-Original Message- From: Eric Reeves Sent: Monday, October 05, 2009 6:19 PM To: solr-user@lucene.apache.org Subject: Help with denormalizing issues
Hi there, I'm evaluating Solr as a replacement for our current search server, and am trying to determine what the best strategy would be to implement our business needs. Our problem is that we have a catalog schema with products and skus, one to many. The most relevant content being indexed is at the product level, in the name and description fields. However we are interested in filtering by sku attributes, and in particular making multiple filters apply to a single sku. For example, find a product that contains a sku that is both blue and on sale. No approach I've tried at collapsing the sku data into the product document works for this. If we put the data in separate fields, there's no way to apply multiple filters to the same sku, and if we concatenate all of the relevant sku data into a single multivalued field then as I understand it, this is just indexed as one large field with extra whitespace between the individual entries, so there's still no way to enforce that an AND filter query applies to the same sku. One approach I was considering was to create separate indexes for products and skus, and store the product IDs in the sku documents. Then we could apply our own filters to the initially generated list, based on unique query parameters. I thought creating a component between query and facet would be a good place to add such a filter, but further research seems to indicate that this would break paging and sorting. The only other thing I can think of would be to subclass QueryComponent itself, which looks rather daunting - the process() method has no hooks for this sort of thing, it seems I would have to copy the entire existing implementation and add them myself, which looks to be a fair chunk of work and brittle to changes in the trunk code. Ideally it would be nice to be able to handle certain fq parameters in a completely different way, perhaps using a custom query parser, but I haven't wrapped my head around how those work. Does any of this sound remotely doable? Any advice? The other suggestion we are looking at was given to us by our current search provider, which is to index the skus themselves. It looks as if we may be able to make this work using the field collapsing patch from SOLR-236. I have some concerns about this approach though:
1) It will make for a much larger index and longer indexing times (products can have 10 or more skus in our catalog).
2) Because the indexing will be copying the description and name from the product it will be indexing the same content more than once, and the number of times per product will vary based on the number of skus. I'm concerned that this may skew the scoring algorithm, in particular the inverse frequency part.
3) I'm not sure about the performance of the field collapsing patch, I've read contradictory reports on the web.
I apologize if this is a bit rambling. If anyone has any advice for our situation it would be very helpful. Thanks, Eric
Re: How much disk space does optimize really take
On Wed, Oct 7, 2009 at 3:56 PM, Mark Miller markrmil...@gmail.com wrote: I guess you can't guarantee 2x though, as if you have queries coming in that take a while, a commit opening a new Reader will not guarantee the old Reader is quite ready to go away. Might want to wait a short bit after the commit. Right - and in a complete system, there are other things that can also hold commit points open longer, like index replication. -Yonik http://www.lucidimagination.com
Re: solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?
On Thu, Oct 8, 2009 at 1:38 AM, michael8 mich...@saracatech.com wrote: 2 things I noticed that are different from 1.3 to 1.4 for DataImport: 1. there are now 2 datetime values (per my specific schema I'm sure) in the dataimport.properties vs. only 1 in 1.3 (using the exact same schema). One is 'last_index_time' same as 1.3, and a *new* one (in 1.4) named item.last_index_time, where 'item' is my main and only entity name specified in my data-import.xml. they both have the same value. This was added with SOLR-783 to enable delta imports of entities individually. One can specify the entity name(s) which should be imported. Without this it was not possible to correctly figure out deltas on a per-entity basis. 2. in 1.3, the datetime passed to SQL used to be, e.g., '2009-10-05 14:08:01', but with 1.4 the format becomes 'Mon Oct 05 14:08:01 PDT 2009', with the day of week, name of month, and timezone spelled out. I had issue with the 1.4 format with MySQL only for the timezone part, but now I have a different solution without using this last index date altogether. I just committed SOLR-1496 so the different date format issue is fixed in trunk. I'm curious though if there's any config setting to pass to DataImportHandler to specify the desired date/time format to use. There is no configuration to change this. However, you can write your own Evaluator to output ${dih.last_index_time} in whatever format you prefer. -- Regards, Shalin Shekhar Mangar.
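A hedged sketch of such a custom Evaluator (the class name and format string are assumptions; whether last_index_time arrives as a Date or a String may vary, hence the instanceof check — verify the <function name=... class=.../> registration syntax on the DIH wiki for your version):

import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Evaluator;

// Hypothetical: renders dataimporter.last_index_time as 'yyyy-MM-dd HH:mm:ss' for SQL
public class SqlDateEvaluator extends Evaluator {
  @Override
  public String evaluate(String expression, Context context) {
    Object val = context.getVariableResolver().resolve("dataimporter.last_index_time");
    if (val instanceof Date) {
      return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format((Date) val);
    }
    return val == null ? null : val.toString(); // fall back to whatever DIH stored
  }
}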
Re: manage rights
You should also separate your indexer from your searcher and make the searcher request handlers allow search only (remove the handlers you don't need). You could also lock down the request parameters that they take, too, by using invariants, etc. Have a look in your solrconfig.xml. You could, of course, also have a ServletFilter in front of Solr or some other type of firewall that just throws away the requests you don't wish to support. And, of course, firewalls can be used, too. On Oct 7, 2009, at 4:50 PM, Lance Norskog wrote: There are no security features in Solr 1.4. You cannot do this. It would be really simple to implement a hack where all management must be done via POST, and then allow the configuration to ban POST requests. On 10/7/09, clico cl...@mairie-marseille.fr wrote: Hi everybody As I'm ready to deploy my solr server (after many tests and use cases) I'd like to configure my server so that some requests cannot be posted As an example: My CMS or data app can use - dataimport - and other indexing commands My website can only perform a search on the server could someone explain to me where this configuration has to be done? Thanks -- View this message in context: http://www.nabble.com/manage-rights-tp25784152p25784152.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
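A hedged sketch of the invariants idea in solrconfig.xml (the handler name and the pinned parameters are illustrative; invariants override whatever the client sends, which is what makes them useful for locking a handler down):

<requestHandler name="/public-search" class="solr.SearchHandler">
  <lst name="invariants">
    <!-- clients cannot override these, no matter what params they send -->
    <str name="fq">public:true</str>
    <str name="echoParams">none</str>
  </lst>
</requestHandler>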
Re: Help with denormalizing issues
The separate sku values do not become one long text string. They are separate values in the same field. The relevance calculation is completely separate per value. The performance problem with the field collapsing patch is that it does the same thing as a facet or sorting operation: it does a sweep through the index and builds a data structure whose size depends on the index. Faceting is not cached directly but still works very quickly the second time. Sorting has its own cache and is very slow (N log N) the first time and very fast afterwards. The field collapsing patch does not cache any of its work and is almost as slow the second time as the first time. On 10/7/09, Eric Reeves eree...@eline.com wrote: Hi again, I'm gonna try this again with more focus this time :D 1) Ideally what we would like to do is plug in an additional mechanism to filter the initial result set, because we can't find a way to implement our filtering needs as filter queries against a single index. We would want to do this while maintaining support for paging. Looking through the codebase it looks as if this would not be possible without major surgery, due to the paging support being implemented deep inside private methods of SolrIndexSearcher. Does this sound accurate? 2) If we pursue the other option of indexing skus and collapsing the results based on product id using the field collapsing patch, is there any validity to my concerns about indexing the same content multiple times skewing the scoring? 3) Does anyone have experience using the field collapsing patch, and have any idea how much additional overhead it incurs? Thanks, Eric -Original Message- From: Eric Reeves Sent: Monday, October 05, 2009 6:19 PM To: solr-user@lucene.apache.org Subject: Help with denormalizing issues Hi there, I'm evaluating Solr as a replacement for our current search server, and am trying to determine what the best strategy would be to implement our business needs. Our problem is that we have a catalog schema with products and skus, one to many. The most relevant content being indexed is at the product level, in the name and description fields. However we are interested in filtering by sku attributes, and in particular making multiple filters apply to a single sku. For example, find a product that contains a sku that is both blue and on sale. No approach I've tried at collapsing the sku data into the product document works for this. If we put the data in separate fields, there's no way to apply multiple filters to the same sku, and if we concatenate all of the relevant sku data into a single multivalued field then as I understand it, this is just indexed as one large field with extra whitespace between the individual entries, so there's still no way to enforce that an AND filter query applies to the same sku. One approach I was considering was to create separate indexes for products and skus, and store the product IDs in the sku documents. Then we could apply our own filters to the initially generated list, based on unique query parameters. I thought creating a component between query and facet would be a good place to add such a filter, but further research seems to indicate that this would break paging and sorting. The only other thing I can think of would be to subclass QueryComponent itself, which looks rather daunting - the process() method has no hooks for this sort of thing, it seems I would have to copy the entire existing implementation and add them myself, which looks to be a fair chunk of work and brittle to changes in the trunk code. Ideally it would be nice to be able to handle certain fq parameters in a completely different way, perhaps using a custom query parser, but I haven't wrapped my head around how those work. Does any of this sound remotely doable? Any advice? The other suggestion we are looking at was given to us by our current search provider, which is to index the skus themselves. It looks as if we may be able to make this work using the field collapsing patch from SOLR-236. I have some concerns about this approach though: 1) It will make for a much larger index and longer indexing times (products can have 10 or more skus in our catalog). 2) Because the indexing will be copying the description and name from the product it will be indexing the same content more than once, and the number of times per product will vary based on the number of skus. I'm concerned that this may skew the scoring algorithm, in particular the inverse frequency part. 3) I'm not sure about the performance of the field collapsing patch, I've read contradictory reports on the web. I apologize if this is a bit rambling. If anyone has any advice for our situation it would be very helpful. Thanks, Eric -- Lance Norskog goks...@gmail.com
Problems with WordDelimiterFilterFactory
We are having some issues with our solr parent application not retrieving records as expected. For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) does not get retrieved; if the input query does not include the colon, all is fine. Ditto if the user searches for a query containing hyphens, e.g. asia - civilization, although with the qualifier that something like asia-civilization (no spaces either side of the hyphen) works fine, whereas asia - civilization (spaces either side of the hyphen) doesn't work. Our schema.xml contains the following -

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Bernadette Houghton, Library Business Applications Developer Deakin University Geelong Victoria 3217 Australia. Phone: 03 5227 8230 International: +61 3 5227 8230 Fax: 03 5227 8000 International: +61 3 5227 8000 MSN: bern_hough...@hotmail.com Email: bernadette.hough...@deakin.edu.au Website: http://www.deakin.edu.au Deakin University CRICOS Provider Code 00113B (Vic)
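A likely factor, for readers hitting the same symptom (an observation, not from this thread): ':' and a bare '-' are Lucene query-parser syntax (field:value and NOT, respectively), so they are interpreted before the analyzer ever sees them and need quoting or escaping. Illustrative queries against a hypothetical fieldName:

fieldName:"hot and cold: temperatures"    (inside a phrase the colon is literal)
fieldName:(hot and cold\: temperatures)   (backslash-escaped colon)
fieldName:"asia - civilization"           (quoted, so '-' is not treated as NOT)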
Re: Problems with WordDelimiterFilterFactory
Could you please provide the exact URL of a query where you are experiencing this problem? e.g. (not URL encoded): q=fieldName:hot and cold: temperatures On 10/07/2009 05:32 PM, Bernadette Houghton wrote: We are having some issues with our solr parent application not retrieving records as expected. For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) does not get retrieved; if the input query does not include the colon, all is fine. Ditto if the user searches for a query containing hyphens, e.g. asia - civilization, although with the qualifier that something like asia-civilization (no spaces either side of the hyphen) works fine, whereas asia - civilization (spaces either side of the hyphen) doesn't work. Our schema.xml contains the following -

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Bernadette Houghton, Library Business Applications Developer Deakin University Geelong Victoria 3217 Australia. Phone: 03 5227 8230 International: +61 3 5227 8230 Fax: 03 5227 8000 International: +61 3 5227 8000 MSN: bern_hough...@hotmail.com Email: bernadette.hough...@deakin.edu.au Website: http://www.deakin.edu.au Deakin University CRICOS Provider Code 00113B (Vic)
Re: Indexing and searching of sharded/ partitioned databases and tables
Thanks guys. Now I can easily search thru 10TB of my personal photos, videos, music and other stuff :) At some point I had split them into multiple dbs and tables, as inserts to a single db/table were taking too much time once the index grew beyond 1 gig. I was storing all the possible metadata about the media. I used two hex characters for naming tables/dbs and ended up with 256 dbs, each with 256 tables :D Don't ask me why I did it this way. Let's just say I was exploring sharding some years ago, got too excited, and did that :D Alas, I never touched it again to finish the search portion until now, when I really wanted to find a particular photo :) The pk is unique across all the tables so no issues there. I think I should be able to run it off a single server at my home. Thanks and Best Regards, Jayant

On Wed, Oct 7, 2009 at 4:52 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Oct 7, 2009 at 5:09 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: You can write an automated program which will change the DB conf details in that xml and fire the full-import command. You can use the http://localhost:8983/solr/dataimport URL to check the status of the data import. Also note that full-import deletes all existing documents. So if you write such a program which changes DB conf details, make sure you invoke the import command (new in Solr 1.4) to avoid deleting the other documents. -- Regards, Shalin Shekhar Mangar. -- www.jkg.in | http://www.jkg.in/contact-me/ Jayant Kr. Gandhi
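[Editor's sketch] For anyone wanting to script the per-database import described above, here is a rough sketch of the loop. The /dataimport URL, the "db" request parameter, and the database naming are assumptions rather than details from this thread; DataImportHandler exposes request parameters to data-config.xml as ${dataimporter.request.db}, and clean=false keeps documents imported from earlier databases.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Runs a DataImportHandler import for each of the 256 hex-named databases,
// polling /dataimport until the status is no longer "busy" before starting the next.
public class ShardImporter {
    static String fetch(String url) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(new URL(url).openStream()));
        StringBuilder sb = new StringBuilder();
        for (String line; (line = in.readLine()) != null; ) sb.append(line).append('\n');
        in.close();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String dih = "http://localhost:8983/solr/dataimport"; // assumed URL
        for (int i = 0; i < 256; i++) {
            String db = String.format("media_%02x", i); // hypothetical db naming
            // clean=false so documents from earlier databases are not deleted
            fetch(dih + "?command=full-import&clean=false&db=" + db);
            while (fetch(dih).contains("busy")) { // DIH status reads "busy" while importing
                Thread.sleep(5000);
            }
        }
    }
}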
RE: Problems with WordDelimiterFilterFactory
Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601 Either scroll down and click one of the "television broadcasting -- asia" links, or type it in the Quick Search box. TIA bern

-----Original Message----- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 9:43 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory [snip - Christian's question and the original message, both quoted above]
Snapshot is not created when I added spellchecker with buildOnCommit
I've enabled the snapshooter to run after commit and it's working fine, until I added a spellchecker with buildOnCommit=true... Any idea why? Thanks

<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">solr/bin/snapshooter</str>
    <str name="dir">.</str>
    <bool name="wait">true</bool>
    <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
    <arr name="env"> <str>MYVAR=val1</str> </arr>
  </listener>
  <listener event="postOptimize" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">solr/bin</str>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>
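[Editor's note] For context, a spellchecker with buildOnCommit enabled would look something like the sketch below (component and field names are illustrative, not taken from the mail). With this setting the spellcheck index is rebuilt inside the same commit cycle that fires the postCommit listener, so one thing worth checking is whether that rebuild fails or blocks before snapshooter gets a chance to run.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>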
Re: ISOLatin1AccentFilter before or after Snowball?
Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory deprecated in favor of <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> in 1.4? -Jay http://www.lucidimagination.com

On Wed, Oct 7, 2009 at 1:44 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Oct 6, 2009 at 4:33 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Hi all, from reading through previous posts on that subject, it seems like the accent filter has to come before the snowball filter. I'd just like to make sure this is so. If it is the case, I'm wondering whether snowball filters for e.g. French process accented language correctly at all, or whether they remove accents anyway... Or whether accents should be removed whenever making use of snowball filters. I'd think so but I'm not sure. Perhaps someone else can weigh in. And also: it really is meant to take UTF-8 as input, even though it is named ISOLatin1AccentFilter, isn't it? See http://markmail.org/message/hi25u5iqusfu542b -- Regards, Shalin Shekhar Mangar.
Re: ISOLatin1AccentFilter before or after Snowball?
No, ISOLatin1AccentFilterFactory is not deprecated. You can use either MappingCharFilterFactory + mapping-ISOLatin1Accent.txt or ISOLatin1AccentFilterFactory, whichever you'd like. Koji

Jay Hill wrote: [snip - question quoted above]
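[Editor's sketch] For anyone wiring up the char-filter variant, a minimal fieldType is sketched below (the type name and the rest of the analysis chain are illustrative). Note that <charFilter> entries run before the tokenizer, which is the main structural difference from the token-filter version:

<fieldType name="textFolded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>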
Re: Problems with WordDelimiterFilterFactory
Bern, I am interested in the Solr query; in other words, the query that your system sends to Solr. Thanks, Christian

On Oct 7, 2009, at 5:56 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: [snip - Bern's reply and the original message, both quoted above]
Re: Problems with WordDelimiterFilterFactory
Use http://solr-url/solr/admin/analysis.jsp to see how your data is indexed/queried.
Re: solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?
Works like a charm!! Thanks Shalin. Regards, Mint

Shalin Shekhar Mangar wrote: On Thu, Oct 8, 2009 at 1:38 AM, michael8 mich...@saracatech.com wrote: 2 things I noticed that are different from 1.3 to 1.4 for DataImport: 1. There are now 2 datetime values (per my specific schema, I'm sure) in dataimport.properties vs. only 1 in 1.3 (using the exact same schema). One is 'last_index_time', same as 1.3, and a *new* one (in 1.4) named item.last_index_time, where 'item' is my main and only entity name specified in my data-import.xml. They both have the same value. This was added with SOLR-783 to enable delta imports of entities individually. One can specify the entity name(s) which should be imported. Without this it was not possible to correctly figure out deltas on a per-entity basis. 2. In 1.3, the datetime passed to SQL used to be, e.g., '2009-10-05 14:08:01', but with 1.4 the format becomes 'Mon Oct 05 14:08:01 PDT 2009', with the day of week, name of month, and timezone spelled out. I had an issue with the 1.4 format with MySQL, but only for the timezone part; now I have a different solution without using this last index date altogether. I just committed SOLR-1496, so the different date format issue is fixed in trunk. I'm curious though if there's any config setting to pass to DataImportHandler to specify the desired date/time format to use. There is no configuration to change this. However, you can write your own Evaluator to output ${dih.last_index_time} in whatever format you prefer. -- Regards, Shalin Shekhar Mangar.
Re: TermsComponent or auto-suggest with filter
Thanks Jay. What's a good way of extracting the original text from here? On Thu, Oct 8, 2009 at 1:03 AM, Jay Hill jayallenh...@gmail.com wrote: Something like this, building on each character typed: facet=on&facet.field=tc_query&facet.prefix=be&facet.mincount=1 -Jay http://www.lucidimagination.com On Tue, Oct 6, 2009 at 5:43 PM, R. Tan tanrihae...@gmail.com wrote: Nice. In comparison, how do you do it with faceting? Two other approaches are to use either the TermsComponent (new in Solr 1.4) or faceting. On Wed, Oct 7, 2009 at 1:51 AM, Jay Hill jayallenh...@gmail.com wrote: Have a look at a blog I posted on how to use EdgeNGrams to build an auto-suggest tool: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ You could easily add filter queries to this approach. For example, the query used in the blog could add filter queries like this: http://localhost:8983/solr/select/?q=user_query:"i"&wt=json&fl=user_query&indent=on&echoParams=none&rows=10&sort=count desc&fq=yourField:yourQuery&fq=anotherField:anotherQuery -Jay http://www.lucidimagination.com On Tue, Oct 6, 2009 at 4:40 AM, R. Tan tanrihae...@gmail.com wrote: Hello, What's the best way to get auto-suggested terms/keywords that are filtered by one or more fields? TermsComponent should have been the solution but filters are not supported. Thanks, Rihaed
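[Editor's sketch] On the "extracting the original text" question: with the faceting approach the facet values themselves are the original indexed terms, so there is nothing extra to extract. A minimal SolrJ sketch follows (the field name tc_query comes from the thread; the server URL and prefix are assumed). Facets typically come back ordered by count, which is convenient for ranking suggestions.

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AutoSuggest {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);               // only the facet counts are needed, not documents
        q.setFacet(true);
        q.addFacetField("tc_query");
        q.setFacetPrefix("be");     // the characters the user has typed so far
        q.setFacetMinCount(1);
        QueryResponse rsp = server.query(q);
        // each facet value is an original indexed term starting with the prefix
        List<FacetField.Count> counts = rsp.getFacetField("tc_query").getValues();
        for (FacetField.Count c : counts) {
            System.out.println(c.getName() + " (" + c.getCount() + ")");
        }
    }
}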
Scoring for specific field queries
Hi, How can I get wildcard search (e.g. cha*) to score documents based on the position of the keyword in a field? Closer (to the start) means a higher score. For example, I have multiple documents with titles containing the word champion. Some of the document titles start with the word champion, and some are entitled "we are the champions". The ones that start with the keyword need to rank first or score higher. Is there a way to do this? I'm using this query for an auto-suggest feature where the keyword doesn't necessarily need to be the first word. Rihaed
Re: How to determine the size of the index?
Are you referring to schema info? You can find it at http://192.168.5.25/solr/admin/file/?file=schema.xml and http://192.168.5.25/solr/admin/schema.jsp Fishman, Vladimir wrote: Is this info available via the admin page?
Re: Scoring for specific field queries
You would need to boost your starts-with matches artificially for the desired behavior. I would do it this way:
1. Create a KeywordTokenizer-based field with an n-gram filter.
2. Create a Whitespace-tokenized field with an n-gram filter.
3. Search on both fields, boosting matches for #1 over #2.
Hope this helps. Cheers, Avlesh

On Thu, Oct 8, 2009 at 10:30 AM, R. Tan tanrihae...@gmail.com wrote: [snip - question quoted above]
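[Editor's sketch] A schema sketch of the two-field setup Avlesh describes, assuming solr.EdgeNGramFilterFactory as the n-gram filter (the type names are illustrative; at query time the n-gram filter is deliberately omitted so the typed prefix matches the indexed grams):

<fieldType name="prefixWhole" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="prefixWords" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A dismax query over both fields, e.g. qf=title_whole^10 title_words (field names assumed), would then rank whole-title prefix matches ahead of word-level ones.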
Re: How to retrieve the index of a string within a field?
Elaine, the field type "text" contains <tokenizer class="solr.WhitespaceTokenizerFactory"/> in its definition, so all the sentences that are indexed/queried are split into words. When you search for 'get what you', you will get sentences containing get, what, you, get what, get you, what you, get what you. So when you try to find the indexOf of the keyword in a sentence (from the results), you may not get it every time. Solrj can give the results in one shot, but it uses an HTTP call; you can't avoid that. You don't need to query multiple times with Solrj, though: query once, get the results, store them in Java beans, process them, and display the results. Regards, Sandeep

Elaine Li wrote: Sandeep, I do get results when I search for "get what you", not 0 results. What in my schema makes this difference? I need to learn Solrj. I am currently using javascript as a client and invoke http calls to get results to display in the browser. Can Solrj get all the results in one shot w/o the http call? I need to do some postprocessing against all the results and then display the processed data. Submitting multiple http queries and post-processing after each query does not seem to be the right way.
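[Editor's sketch] A minimal SolrJ version of the query-once-then-post-process pattern Sandeep describes (the server URL and the "sentence" field name are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class OneShotQuery {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("\"get what you\""); // phrase query
        q.setRows(100);                                  // fetch a batch, not one doc per call
        QueryResponse rsp = server.query(q);
        SolrDocumentList docs = rsp.getResults();
        for (SolrDocument doc : docs) {
            String sentence = (String) doc.getFieldValue("sentence"); // field name assumed
            // post-process here, e.g. sentence.indexOf("get what you")
        }
    }
}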
Re: Scoring for specific field queries
Hi Rihaed, I guess we don't need to depend on scores all the time. You can use a custom sort to order the results: take a dynamicField, fill it with the indexOf(keyword) value, and sort the results by that field in ascending order. Then the records which contain the keyword at an earlier position will come first. Regards, Sandeep

R. Tan wrote: Hi, How can I get wildcard search (e.g. cha*) to score documents based on the position of the keyword in a field? Closer (to the start) means a higher score.
Re: Scoring for specific field queries
[Sandeep's custom-sort suggestion, quoted above] Warning: This is a bad idea for multiple reasons:
1. If the word computer occurs multiple times in a document, what would you do in that case? Is this dynamic field supposed to be multivalued? I can't even imagine what you would do if the word computer occurs in multiple documents multiple times.
2. Multivalued fields cannot be sorted upon.
3. One needs to know the unique number of such keywords before implementing this, because you'll potentially end up creating that many fields.
Cheers, Avlesh

On Thu, Oct 8, 2009 at 11:10 AM, Sandeep Tagore sandeep.tag...@gmail.com wrote: [snip - suggestion quoted above]
Re: delay while adding document to solr index
Thanks for your reply, and sorry for the delay. As you said, I removed the commit while adding a single document and set auto commit to <maxDocs>200</maxDocs> <maxTime>1</maxTime>. After setting this, when I ran optimize() manually the size decreased to 350MB (10 docs) from 638MB (10 docs); I think this happened because I ran optimize for the first time on index data that was configured 4 months back. This worked great, but after one week the index size again reached 504MB (10 docs). I don't understand why my solr index keeps increasing daily when I am adding and deleting the same number of documents daily. I run org.apache.solr.client.solrj.SolrServer.optimize() manually four times a day. Is that not the right way to run optimize? If not, what is the procedure to run optimize? Thanks in advance :)
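[Editor's note] One thing worth knowing here: space held by deleted documents is only reclaimed when their segments get merged or the index is optimized, so some growth between optimize runs is normal. For reference, a minimal SolrJ sketch of a scripted optimize (server URL assumed):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeJob {
    public static void main(String[] args) throws Exception {
        // assumes a Solr instance at this URL
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // waitFlush=true, waitSearcher=true: block until the optimized index is live
        server.optimize(true, true);
    }
}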
RE: Problems with WordDelimiterFilterFactory
Hi Bern, I indexed some records with - and : today using your configuration, and I searched with the following URLs:
http://localhost/solr/select?q=CONTENT:cold : temperature
http://localhost/solr/select?q=CONTENT:cold: temperature
http://localhost/solr/select?q=CONTENT:cold :temperature
http://localhost/solr/select?q=CONTENT:cold temperature
and
http://localhost/solr/select?q=CONTENT:asia - civilization
http://localhost/solr/select?q=CONTENT:asia- civilization
http://localhost/solr/select?q=CONTENT:asia -civilization
http://localhost/solr/select?q=CONTENT:asia civilization
The results didn't differ: it worked every time, and I saw the relevant records. Regards, Sandeep