Re: Data Import from a Queue
Brandon, I don't know how they are using it in detail, but part of Chef's architecture is this: Chef Server - RabbitMQ - Chef Solr Indexer - Solr (http://wiki.opscode.com/download/attachments/7274878/chef-server-arch.png). Perhaps not exactly what you're looking for, but it may give you an idea? Regards, Stefan. On 19.07.2011 19:04, Brandon Fish wrote: Let me provide some more details to the question: I was unable to find any example implementations where individual documents (a single document per message) are read from a message queue (like ActiveMQ or RabbitMQ) and then added to Solr via SolrJ, an HTTP POST or another method. Does anyone know of any available examples for this type of import? If no examples exist, what would be a recommended commit strategy for performance? My best guess for this would be to have a queue per core and commit once the queue is empty. Thanks. On Mon, Jul 18, 2011 at 6:52 PM, Erick Erickson <erickerick...@gmail.com> wrote: This is a really cryptic problem statement. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick. On Fri, Jul 15, 2011 at 1:52 PM, Brandon Fish <brandon.j.f...@gmail.com> wrote: Does anyone know of any existing examples of importing data from a queue into Solr? Thank you.
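For what it's worth, a minimal sketch of the kind of consumer described above, assuming a JMS queue (ActiveMQ in this example) carrying one document per message and SolrJ's CommonsHttpSolrServer; the queue name, field names and the commit-when-the-queue-drains strategy are illustrative, not a tested implementation:

    import javax.jms.*;
    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class QueueImporter {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
            ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection conn = cf.createConnection();
            conn.start();
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue("solr.docs"));

            boolean dirty = false;
            while (true) {
                // Block briefly; a null return means the queue is empty right now.
                Message msg = consumer.receive(1000);
                if (msg == null) {
                    if (dirty) {          // commit once the queue has drained
                        solr.commit();
                        dirty = false;
                    }
                    continue;
                }
                TextMessage text = (TextMessage) msg;
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", msg.getJMSMessageID());  // illustrative; use your real unique key
                doc.addField("body", text.getText());       // illustrative field
                solr.add(doc);
                dirty = true;
            }
        }
    }

A queue per core, as suggested in the thread, would simply mean running one such consumer per core, each pointed at its own core URL.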
Re: how to get solr core information using solrj
Jiang, what about http://wiki.apache.org/solr/CoreAdmin#STATUS ? Regards, Stefan. On 20.07.2011 05:40, Jiang mingyuan wrote: Hi all, our Solr server contains two cores, core0 and core1, and they both work well. Now I'm trying to find a way to get information about core0 and core1. Can SolrJ or another API do this? Thanks very much.
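If you are already using SolrJ, the same STATUS action can be called programmatically; a small sketch, assuming the stock example URL (passing null as the core name returns the status of all cores):

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.response.CoreAdminResponse;
    import org.apache.solr.common.util.NamedList;

    public class CoreStatus {
        public static void main(String[] args) throws Exception {
            // Point at the root Solr URL, not at a particular core.
            CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            CoreAdminResponse status = CoreAdminRequest.getStatus(null, server);
            NamedList<NamedList<Object>> cores = status.getCoreStatus();
            for (int i = 0; i < cores.size(); i++) {
                System.out.println(cores.getName(i) + " -> " + cores.getVal(i));
            }
        }
    }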
suggester component from trunk throwing error
Hi, I am trying to configure the suggester component. I downloaded Solr from trunk and did a build. Here is my config:

    <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
      <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.count">10</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>

    <searchComponent name="suggest" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">name_autocomplete</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>

When I build my index, the index gets created but I get the following exception:

    Jul 20, 2011 2:32:00 AM org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener buildSpellIndex
    INFO: Building spell index for spell checker: suggest
    Jul 20, 2011 2:32:00 AM org.apache.solr.spelling.suggest.Suggester build
    INFO: build()
    Jul 20, 2011 2:32:00 AM org.apache.solr.common.SolrException log
    SEVERE: java.lang.NoSuchMethodError: org.apache.lucene.index.IndexReader.fields()Lorg/apache/lucene/index/Fields;
        at org.apache.lucene.index.MultiFields.getFields(MultiFields.java:64)
        at org.apache.lucene.index.MultiFields.getFields(MultiFields.java:69)
        at org.apache.lucene.index.MultiFields.getTerms(MultiFields.java:142)
        at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.init(HighFrequencyDictionary.java:65)
        at org.apache.lucene.search.spell.HighFrequencyDictionary.getWordsIterator(HighFrequencyDictionary.java:54)
        at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:63)
        at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:136)
        at org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.buildSpellIndex(SpellCheckComponent.java:373)
        at org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.newSearcher(SpellCheckComponent.java:358)
        at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1163)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Any help?
Re: - character in search query
Here is my complete fieldType:

    <fieldType name="name" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="\s|,"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="-" replacement=""/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

In the Field Analysis page I see that the - is removed by the PatternReplaceFilter. When I escape the term ($q = SolrUtils::escapeQueryChars($q);) I see something like this in my debugQuery output (term = arsenal - london): +((DisjunctionMaxQuery((name:arsenal)~1.0) DisjunctionMaxQuery((name:\ london~1.0))~2) () When I don't escape the query I get something like this: +((DisjunctionMaxQuery((name:arsenal)~1.0) -DisjunctionMaxQuery((name:london)~1.0))~1) () The - in my term is turned into a -DisjunctionMaxQuery, i.e. a negated clause. How can I fix this problem? What is the easiest way?
Re: query time boosting in solr
Can anyone throw some light on this issue? My problem is: give a query-time boost to certain documents which have a field, say field1, in a range that the user chooses at query time. I think the link below indicates a range query: http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10 But, apart from that, how can I indicate a boost for the condition field1:[10%20TO%2030]? I tried using bq=field1:[20 TO 25] and also bq=field1:[20 TO 25]^10, but I am not able to figure out from the results what these two mean, because I get a document where field1 is 40 as the top result in this case, after using the bq clause. I increased the boost to 10, 20, 50, 100, but the results don't change at all. S. On Tue, Jul 19, 2011 at 4:28 PM, Sowmya V.B. <vbsow...@gmail.com> wrote: Hi, is query-time boosting possible in Solr? Here is what I want to do: I want to boost the ranking of certain documents which have their relevant field values in a particular range (selected by the user at query time). When I do something like http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper&fq=field1:[10%20TO%2030]&start=0&rows=10 I guess it is just a filter over the normal results and not exactly a query. I tried giving this: http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10 This still worked and gave me different results, but I did not quite understand what this second query meant. Does it mean: rank those documents with a field1 value in 10-30 better than those without? S -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
Re: Solr UI
On Tue, Jul 19, 2011 at 7:51 PM, Erik Hatcher erik.hatc...@gmail.com wrote: There's several starting points for Solr UI out there, but really the best choice is whatever fits your environment and the skills/resources you have handy. Here's a few off the top of my head - [...] Besides these excellent examples, if you are looking at Python/Django, Haystack works well as a starting point, though: * One does have to build a template/view architecture around it, that is fairly easy to do. * Haystack allows multiple search back-ends, and while that is convenient for starting out, it does not implement some Solr features. E.g., one big missing item is support for multi-core Solr. Regards, Gora
Re: any detailed tutorials on plugin development?
On Wed, Jul 20, 2011 at 6:29 AM, deniz denizdurmu...@gmail.com wrote: gosh sorry for my typo in msg first... i just realized it now... well anyway... i would like to find a detailed tutorial about how to implement an analyzer or a request handler plugin... but all i have got is nothing from the documentation of solr wiki... This does not help: http://wiki.apache.org/solr/SolrPlugins ? Google also turns up multiple examples, e.g., http://e-mats.org/2008/06/writing-a-solr-analysis-filter-plugin/ I remember using that blog as a starting point for writing a custom plugin. Regards, Gora
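In case a bare-bones skeleton helps alongside those links: an analysis filter plugin is essentially a Lucene TokenFilter plus a Solr factory that schema.xml refers to. A minimal sketch against the 3.x APIs; the package, class names and the upper-casing behaviour are made up purely for illustration:

    package org.example.analysis;

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.solr.analysis.BaseTokenFilterFactory;

    // The factory is what schema.xml refers to:
    //   <filter class="org.example.analysis.MarkerFilterFactory"/>
    public class MarkerFilterFactory extends BaseTokenFilterFactory {
        @Override
        public TokenStream create(TokenStream input) {
            return new MarkerFilter(input);
        }
    }

    // The filter itself: here it just upper-cases every token, as a placeholder
    // for whatever per-token transformation is actually needed.
    class MarkerFilter extends TokenFilter {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

        protected MarkerFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) return false;
            String upper = termAtt.toString().toUpperCase();
            termAtt.setEmpty().append(upper);
            return true;
        }
    }

The compiled jar then goes on Solr's classpath (for example the core's lib directory) and the factory is referenced from an analyzer chain in schema.xml.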
Re: - character in search query
When I use the edismax handler the escaping works great (before, I used the dismax handler). The debugQuery shows me this: +((DisjunctionMaxQuery((name:arsenal)~1.0) DisjunctionMaxQuery((name:london)~1.0))~2 The \ is not in the parsed query, so I get the results I wanted. I don't know why the dismax handler works this way. Can someone tell me the difference between the dismax and edismax handlers?
Re: any detailed tutorials on plugin development?
Actually, I'm rewriting that wiki page (http://wiki.apache.org/solr/UpdateRequestProcessor) with a more detailed how-to; it will be ready and online after I get back from work!
term positions performance
Hi, I am developing a new query for term proximity and I am using the term positions to get the positions of each term. I want to know if there are any tips for increasing the performance of using term positions, at index time or at query time; all the fields I am applying the term positions to are indexed. Thanks in advance, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42
Re: term positions performance
Also, I implemented this query via a function query; I wonder whether doing it via a normal query would increase the performance. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2011/7/20 Marco Martinez <mmarti...@paradigmatecnologico.com>: Hi, I am developing a new query for term proximity and I am using the term positions to get the positions of each term. I want to know if there are any tips for increasing the performance of using term positions, at index time or at query time; all the fields I am applying the term positions to are indexed. Thanks in advance, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42
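For reference, this is roughly what reading positions looks like with the raw Lucene 3.x TermPositions API (the field and term are examples); nothing here is specific to the function query above, just the underlying calls whose cost dominates:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermPositions;

    public class PositionsDump {
        public static void dump(IndexReader reader) throws Exception {
            TermPositions tp = reader.termPositions(new Term("body", "solr"));
            while (tp.next()) {
                int doc = tp.doc();
                int freq = tp.freq();
                StringBuilder positions = new StringBuilder();
                for (int i = 0; i < freq; i++) {
                    positions.append(tp.nextPosition()).append(' ');
                }
                System.out.println("doc=" + doc + " freq=" + freq + " positions=" + positions);
            }
            tp.close();
        }
    }

Beyond that, it is worth double-checking that omitTermFreqAndPositions is not enabled on the fields involved, since positions simply are not stored in the index in that case.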
Re: POST VS GET and NON English Characters
Paul, I added the following line to catalina.sh and restarted the server, but this does not seem to help: JAVA_OPTS="-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8" Regards, Sujatha. On Sun, Jul 17, 2011 at 3:51 AM, Paul Libbrecht <p...@hoplahup.net> wrote: If you have the option, try setting the default charset of the servlet container to UTF-8. Typically this is done by setting a system property on startup. My experience has been that the default used to be UTF-8, but that is less and less the case, and sometimes in a surprising way! Paul. On 16 July 2011 at 05:34, Sujatha Arun wrote: It works fine with the GET method, but I am wondering why it does not with the POST method. 2011/7/15 pankaj bhatt <panbh...@gmail.com>: Hi Arun, this looks like an encoding issue to me. Can you change your browser settings to UTF-8 and hit the search URL via the GET method? We faced a similar problem with Chinese and Korean languages; this solved the problem. / Pankaj Bhatt. 2011/7/15 Sujatha Arun <suja.a...@gmail.com>: Hello, we have implemented Solr search in several languages. Initially we used the GET method for querying, but later moved to the POST method to accommodate lengthy queries. When we moved from GET to POST, the German characters could no longer be searched and I had to use the function utf8_decode in my application for the search to work for German characters. Currently I am doing this while querying with the POST method (we are using the standard request handler): $this->_queryterm = iconv("UTF-8", "ISO-8859-1//TRANSLIT//IGNORE", $this->_queryterm); This makes the query work for German characters and other languages, but does not work for certain characters in Lithuanian and Spanish. Example: Not working: Iš - Estremadūros - sNaująjį - MEDŽIAGOTYRA - MEDŽIAGOS - taškuose. Working: garbę - ieškoti - ispanų. Any ideas / input? Regards, Sujatha
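For reference, the two knobs most often involved on Tomcat are the connector's URIEncoding (which only affects GET query strings) and the character encoding of the POST body, which normally has to be set per request, for example with a small servlet filter in front of Solr; the client also has to actually send the body as UTF-8. Sketches of both, assuming Tomcat and that nothing else already sets the encoding:

    <!-- server.xml: affects GET/query-string decoding only -->
    <Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" />

    import java.io.IOException;
    import javax.servlet.*;

    public class Utf8Filter implements Filter {
        public void init(FilterConfig cfg) {}

        public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
                throws IOException, ServletException {
            if (req.getCharacterEncoding() == null) {
                req.setCharacterEncoding("UTF-8");  // controls how POST body parameters are decoded
            }
            chain.doFilter(req, resp);
        }

        public void destroy() {}
    }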
Re: embeded solrj doesn't refresh index
You should send a commit to your embedded Solr. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2011/7/20 Jianbin Dai <j...@huawei.com>: Hi, I am using embedded SolrJ. After I add a new doc to the index, I can see the changes through the Solr web interface, but not from embedded SolrJ. But after I restart the embedded SolrJ application, I do see the changes. It works as if there was a cache. Does anyone know the problem? Thanks. Jianbin
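In SolrJ terms that is just a call to commit() on the embedded server; a sketch, assuming the embedded instance is set up roughly like this (the core name is an example):

    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedCommit {
        public static void main(String[] args) throws Exception {
            // Reads solr home from the solr.solr.home system property.
            CoreContainer container = new CoreContainer.Initializer().initialize();
            EmbeddedSolrServer server = new EmbeddedSolrServer(container, "core0");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "42");
            server.add(doc);
            server.commit();  // without this, the embedded server's searcher never sees the new doc
        }
    }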
Re: query time boosting in solr
Hi Tomasso Thanks for a quick response. So, if I say: http://localhost:8085/apache-solr-3.3.0/select?indent=onversion=2.2* defType=dismax*q=scientificbq=Field1:[20%20TO%2025]^10start=0rows=30 -will it be right? The above query: boosts the documents which suit the given query (scientific), which has Field1 values between 20-25, by a factor of 10 : Is that right?? S 2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Sowmya, bq is a great way of boosting, but you have to be using the Dismax Query Parser or the Extended Dismax (edismax) query parser, it doesn't work with the Lucene Query Parser. If you can use any of those, then that's the solution. If you need to use the Lucene Query Parser, for a user query like: scientific temper you could create a query like: (scientific temper) OR (scientific temper AND (field1:[10 TO 2030]))^X being X the boost you want for those documents. with your query: scientific temper field1:[10 TO 2030] you are either adding the condition of the range value for the field (if your default operator is AND) or adding another way of matching the query (if your default operator ir OR, you can have documents in your result set that only matched the range query, and this is not what the user wanted). Hope this helps, Tomás On Wed, Jul 20, 2011 at 5:15 AM, Sowmya V.B. vbsow...@gmail.com wrote: Can anyone throw some light on this issue? My problem is to: give a query time boost to certain documents, which have a field, say field1, in the range that the user chooses during query time. I think the below link indicates a range query: http://localhost:8085/solr/select?indent=onversion=2.2q=scientific+temper+field1:[10%20TO%2030]start=0rows=10 But, apart from that, how can I indicate a boost for the condition field1:[10%20TO%2030]? I tried using a bq=field1:[20 TO 25] and also bq=field1:[20 TO 25]^10 -But I am not able to figure out what these two mean, from the results. Because, i get top1 result as a document where field1 is 40..in this case..after using bq clause. I increased the boost to 10,20,50 100..but the results dont change at all. S. On Tue, Jul 19, 2011 at 4:28 PM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Is query time boosting possible in Solr? Here is what I want to do: I want to boost the ranking of certain documents, which have their relevant field values, in a particular range (selected by user at query time)... when I do something like: http://localhost:8085/solr/select?indent=onversion=2.2q=scientific+temperfq=field1:[10%20TO%2030]start=0rows=10 -I guess, it is just a filter over the normal results and not exactly a query. I tried giving this: http://localhost:8085/solr/select?indent=onversion=2.2q=scientific+temper+field1:[10%20TO%2030]start=0rows=10 -This still worked and gave me different results. But, I did not quite understand what this second query meant. Does it mean: Rank those documents with field1 value in 10-30 better than those without ? S -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
Re: query time boosting in solr
Yes, it should, but make sure you specify at least the qf parameter for dismax. You can activate debugQuery and you'll see which documents get boosted and which aren't. On Wed, Jul 20, 2011 at 9:21 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Tomasso Thanks for a quick response. So, if I say: http://localhost:8085/apache-solr-3.3.0/select?indent=onversion=2.2* defType=dismax*q=scientificbq=Field1:[20%20TO%2025]^10start=0rows=30 -will it be right? The above query: boosts the documents which suit the given query (scientific), which has Field1 values between 20-25, by a factor of 10 : Is that right?? S 2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Sowmya, bq is a great way of boosting, but you have to be using the Dismax Query Parser or the Extended Dismax (edismax) query parser, it doesn't work with the Lucene Query Parser. If you can use any of those, then that's the solution. If you need to use the Lucene Query Parser, for a user query like: scientific temper you could create a query like: (scientific temper) OR (scientific temper AND (field1:[10 TO 2030]))^X being X the boost you want for those documents. with your query: scientific temper field1:[10 TO 2030] you are either adding the condition of the range value for the field (if your default operator is AND) or adding another way of matching the query (if your default operator ir OR, you can have documents in your result set that only matched the range query, and this is not what the user wanted). Hope this helps, Tomás On Wed, Jul 20, 2011 at 5:15 AM, Sowmya V.B. vbsow...@gmail.com wrote: Can anyone throw some light on this issue? My problem is to: give a query time boost to certain documents, which have a field, say field1, in the range that the user chooses during query time. I think the below link indicates a range query: http://localhost:8085/solr/select?indent=onversion=2.2q=scientific+temper+field1:[10%20TO%2030]start=0rows=10 But, apart from that, how can I indicate a boost for the condition field1:[10%20TO%2030]? I tried using a bq=field1:[20 TO 25] and also bq=field1:[20 TO 25]^10 -But I am not able to figure out what these two mean, from the results. Because, i get top1 result as a document where field1 is 40..in this case..after using bq clause. I increased the boost to 10,20,50 100..but the results dont change at all. S. On Tue, Jul 19, 2011 at 4:28 PM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Is query time boosting possible in Solr? Here is what I want to do: I want to boost the ranking of certain documents, which have their relevant field values, in a particular range (selected by user at query time)... when I do something like: http://localhost:8085/solr/select?indent=onversion=2.2q=scientific+temperfq=field1:[10%20TO%2030]start=0rows=10 -I guess, it is just a filter over the normal results and not exactly a query. I tried giving this: http://localhost:8085/solr/select?indent=onversion=2.2q=scientific+temper+field1:[10%20TO%2030]start=0rows=10 -This still worked and gave me different results. But, I did not quite understand what this second query meant. Does it mean: Rank those documents with field1 value in 10-30 better than those without ? S -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
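Written out in full, the kind of request being discussed here would look roughly like this (core name, field names and boost factor are the ones used elsewhere in the thread):

    http://localhost:8085/apache-solr-3.3.0/select?defType=dismax&qf=text&q=scientific&bq=Field1:[20%20TO%2025]^10&debugQuery=on&start=0&rows=30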
Re: Solr 3.3: Exception in thread Lucene Merge Thread #1
Update. After adding 1626 documents without doing a commit or optimize: /Exception in thread Lucene Merge Thread #1 org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Map failed at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782) at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirectory.java:264) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216) at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:129) at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4192) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3859) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456) Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map0(Native Method) at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779) ... 10 more / Any ideas, any suggestions? Greetz thank you, Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-3-Exception-in-thread-Lucene-Merge-Thread-1-tp3185248p3185344.html Sent from the Solr - User mailing list archive at Nabble.com.
Curl/Tika not working with blanks in literal.field
Hi. I'm trying to index binary documents with curl and Tika for extracting text. The problem is that when I set the value of a field containing blank spaces using the input parameter literal.fieldname=value, the document is not indexed. The command I send is the following: curl http://localhost:8983/solr/update/extract?literal.id=doc1\&literal.url=/mnt/windows/Ofertas/2006 Portal Intranet/DOCUMENTACION/datos.doc\&uprefix=attr_\&fmap.content=text\&commit=true -F myfile=@/mnt/windows/Ofertas/DOCUMENTACION/datos.doc That is, literal.url=value with blanks apparently is not working.
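One likely culprit is that the unencoded spaces in the literal.url value break the URL before it ever reaches Solr; percent-encoding them (and quoting the whole URL for the shell) is the usual fix. A sketch of the same command with the spaces encoded as %20; the paths are taken from the message above and may need adjusting:

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.url=/mnt/windows/Ofertas/2006%20Portal%20Intranet/DOCUMENTACION/datos.doc&uprefix=attr_&fmap.content=text&commit=true" -F "myfile=@/mnt/windows/Ofertas/DOCUMENTACION/datos.doc"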
Re: defType argument weirdness
On Tue, Jul 19, 2011 at 11:41 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote: Is it generally recognized that this terminology is confusing, or is it just me? I do understand what they do (at least well enough to use them), but I find it confusing that it's called defType as a main param, but type in a LocalParam. When used as the main param, it is still just the default (i.e. it may be overridden). For example: defType=lucene&q={!func}1 (and then there's 'qt', often confused with defType/type by newbies, since they guess it stands for 'query type', but which should probably actually have been called 'requestHandler'/'rh' instead, since that's what it actually chooses, no? It gets very confusing). Yeah, qt is very historical... before the QParserPlugin framework, and before request handlers were used for many other things (including updates). -Yonik http://www.lucidimagination.com If it's generally recognized it's confusing and perhaps a somewhat inconsistent mental model being implied, I wonder if there'd be any interest in renaming these to be more clear, leaving the old ones as aliases/synonyms for backwards compatibility (perhaps with a long deprecation period, or perhaps existing forever). I know it was very confusing to me to keep track of these parameters and what they did for quite a while, and it still trips me up from time to time. Jonathan From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley [yo...@lucidimagination.com] Sent: Tuesday, July 19, 2011 9:40 PM To: solr-user@lucene.apache.org Subject: Re: defType argument weirdness On Tue, Jul 19, 2011 at 1:25 PM, Naomi Dushay <ndus...@stanford.edu> wrote: Regardless, I thought that defType=dismax&q=*:* is supposed to be equivalent to q={!defType=dismax}*:* and also equivalent to q={!dismax}*:* Not quite - there is a very subtle distinction. {!dismax} is short for {!type=dismax}, the type of the actual query, and this may not be overridden. The defType local param is only the default type for sub-queries (as opposed to the current query). It's useful in conjunction with the query or nested query qparser: http://lucene.apache.org/solr/api/org/apache/solr/search/NestedQParserPlugin.html -Yonik http://www.lucidimagination.com
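For the defType-as-local-param case mentioned above, the usual pattern with the nested query parser looks something like this (the uq parameter name and the field boosts are arbitrary examples):

    q={!query defType=dismax v=$uq}&uq=solr rocks&qf=title^2 text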
Re: query time boosting in solr
Hi Tomas Here is what I was trying to give. http://localhost:8085/apache-solr-3.3.0/select?indent=onversion=2.2defType=dismaxq=scientificbq=Field1:[20%20TO%2030] ^10start=0rows=30qf=textfl=Field1,dociddebugQuery=on Over here, I was trying to change the range of Field1, keeping everything else intact. Here are my observations: 1) The number of results found remain intact. Only that the order of the results varies. 2) The boost factor (10) does not seem to throw any influence at all. Here is what the debugQuery says: str name=parsedquery+DisjunctionMaxQuery((text:scientif)) () Field1:[20.0 TO 30.0]^10.0/str str name=parsedquery_toString+(text:scientif) () Field1:[20.0 TO 30.0]^10.0/str From these, it seems like its just filtering the results based on the Field1 values, rather than performing a Boost Query. S. 2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com Yes, it should, but make sure you specify at least the qf parameter for dismax. You can activate debugQuery and you'll see which documents get boosted and which aren't. On Wed, Jul 20, 2011 at 9:21 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Tomasso Thanks for a quick response. So, if I say: http://localhost:8085/apache-solr-3.3.0/select?indent=onversion=2.2* defType=dismax*q=scientificbq=Field1:[20%20TO%2025]^10start=0rows=30 -will it be right? The above query: boosts the documents which suit the given query (scientific), which has Field1 values between 20-25, by a factor of 10 : Is that right?? S 2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Sowmya, bq is a great way of boosting, but you have to be using the Dismax Query Parser or the Extended Dismax (edismax) query parser, it doesn't work with the Lucene Query Parser. If you can use any of those, then that's the solution. If you need to use the Lucene Query Parser, for a user query like: scientific temper you could create a query like: (scientific temper) OR (scientific temper AND (field1:[10 TO 2030]))^X being X the boost you want for those documents. with your query: scientific temper field1:[10 TO 2030] you are either adding the condition of the range value for the field (if your default operator is AND) or adding another way of matching the query (if your default operator ir OR, you can have documents in your result set that only matched the range query, and this is not what the user wanted). Hope this helps, Tomás On Wed, Jul 20, 2011 at 5:15 AM, Sowmya V.B. vbsow...@gmail.com wrote: Can anyone throw some light on this issue? My problem is to: give a query time boost to certain documents, which have a field, say field1, in the range that the user chooses during query time. I think the below link indicates a range query: http://localhost:8085/solr/select?indent=onversion=2.2q=scientific+temper+field1:[10%20TO%2030]start=0rows=10 But, apart from that, how can I indicate a boost for the condition field1:[10%20TO%2030]? I tried using a bq=field1:[20 TO 25] and also bq=field1:[20 TO 25]^10 -But I am not able to figure out what these two mean, from the results. Because, i get top1 result as a document where field1 is 40..in this case..after using bq clause. I increased the boost to 10,20,50 100..but the results dont change at all. S. On Tue, Jul 19, 2011 at 4:28 PM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Is query time boosting possible in Solr? Here is what I want to do: I want to boost the ranking of certain documents, which have their relevant field values, in a particular range (selected by user at query time)... 
when I do something like: http://localhost:8085/solr/select?indent=onversion=2.2q=scientific+temperfq=field1:[10%20TO%2030]start=0rows=10 -I guess, it is just a filter over the normal results and not exactly a query. I tried giving this: http://localhost:8085/solr/select?indent=onversion=2.2q=scientific+temper+field1:[10%20TO%2030]start=0rows=10 -This still worked and gave me different results. But, I did not quite understand what this second query meant. Does it mean: Rank those documents with field1 value in 10-30 better than those without ? S -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Sowmya V.B. Losing optimism
Re: Geospatial queries in Solr
Thanks for responding so quickly, I don't mind waiting a bit. I'll hang out until the updates have been made. Thanks again. On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W. dsmi...@mitre.org wrote: Hi Jamie. I work on LSP; it can index polygons and query for them. Although the capability is there, we have more testing benchmarking to do, and then we need to put together a tutorial to explain how to use it at the Solr layer. I recently cleaned up the READMEs a bit. Try downloading the trunk codebase, and follow the README. It points to another README which shows off a demo webapp. At the conclusion of this, you'll need to examine the tests and webapp a bit to figure out how to apply it in your app. We don't yet have a tutorial as the framework has been in flux although it has stabilized a good deal. Oh... by the way, this works off of Lucene/Solr trunk. Within the past week there was a major change to trunk and LSP won't compile until we make updates. Either Ryan McKinley or I will get to that by the end of the week. So unless you have access to 2-week old maven artifacts of Lucene/Solr, you're stuck right now. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote: I have looked at the code being shared on the lucene-spatial-playground and was wondering if anyone could provide some details as to its state. Specifically I'm looking to add geospatial support to my application based on a user provided polygon, is this currently possible using this extension?
Re: Geospatial queries in Solr
Ryan just updated LSP for Lucene/Solr trunk compatibility so you should do a mvn clean install and you'll be back in business. On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote: Thanks for responding so quickly, I don't mind waiting a bit. I'll hang out until the updates have been made. Thanks again. On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W. dsmi...@mitre.org wrote: Hi Jamie. I work on LSP; it can index polygons and query for them. Although the capability is there, we have more testing benchmarking to do, and then we need to put together a tutorial to explain how to use it at the Solr layer. I recently cleaned up the READMEs a bit. Try downloading the trunk codebase, and follow the README. It points to another README which shows off a demo webapp. At the conclusion of this, you'll need to examine the tests and webapp a bit to figure out how to apply it in your app. We don't yet have a tutorial as the framework has been in flux although it has stabilized a good deal. Oh... by the way, this works off of Lucene/Solr trunk. Within the past week there was a major change to trunk and LSP won't compile until we make updates. Either Ryan McKinley or I will get to that by the end of the week. So unless you have access to 2-week old maven artifacts of Lucene/Solr, you're stuck right now. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote: I have looked at the code being shared on the lucene-spatial-playground and was wondering if anyone could provide some details as to its state. Specifically I'm looking to add geospatial support to my application based on a user provided polygon, is this currently possible using this extension?
Reading Solr's JSON
Hi all, what is the best way to read Solr's JSON output from Java code? There seems to be a JSONParser in one of the jar files in the Solr lib directory (org.apache.noggit...), but I don't understand how to read the parsed output with it. Are there any better JSON parsers for Java? S -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
Re: Solr suggester and spell checker
Hi, I am having the same issue; did you find a solution to this problem?
Re: Reading Solr's JSON
On Wed, Jul 20, 2011 at 10:58 AM, Sowmya V.B. vbsow...@gmail.com wrote: Which is the best way to read Solr's JSON output, from a Java code? You could use SolrJ - it handles parsing for you (and uses the most efficient binary format by default). There seems to be a JSONParser in one of the jar files in SolrLib (org.apache.noggit..)...but I dont understand how to read the parsed output in this. If you just want to deserialize into objects (Maps, Lists, etc) then it's easy: ObjectBuilder.fromJSON(my_json_string) -Yonik http://www.lucidimagination.com
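A tiny example of the second option (the JSON string here is made up for illustration):

    import java.util.List;
    import java.util.Map;
    import org.apache.noggit.ObjectBuilder;

    public class ParseSolrJson {
        @SuppressWarnings("unchecked")
        public static void main(String[] args) throws Exception {
            String json = "{\"response\":{\"numFound\":1,\"docs\":[{\"id\":\"42\"}]}}";
            // JSON objects come back as Maps, arrays as Lists.
            Map<String, Object> top = (Map<String, Object>) ObjectBuilder.fromJSON(json);
            Map<String, Object> response = (Map<String, Object>) top.get("response");
            List<Map<String, Object>> docs = (List<Map<String, Object>>) response.get("docs");
            System.out.println(response.get("numFound") + " -> " + docs.get(0).get("id"));
        }
    }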
Manipulating a Fuzzy Query's Prefix Length
We're performing fuzzy searches on a field possessing a large number of unique terms. Specifying a required minimum similarity of 0.7 results in a query execution time of 13-15 seconds, which stands in stark contrast to our average query time of 40ms. We suspect that the performance problem most likely emanates from the enumeration over all the unique terms in the index. The Lucene documentation for FuzzyQuery supports this theory with the following warning: *Warning:* this query is not very scalable with its default prefix length of 0 - in this case, *every* term will be enumerated and cause an edit score calculation. We would therefore like to set the prefix length to one or two, mandating that the first couple of characters match and thereby substantially reduce the number of terms enumerated. Is this possible with Solr? I haven't yet discovered a method, if so. Any help would be greatly appreciated.
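For reference, at the Lucene level the prefix length is simply the third argument of the FuzzyQuery constructor, so if no query-parser parameter turns out to be exposed for it, a small custom query parser (or a patch to the one in use) could build the query along these lines; the field, term and prefix length of 2 are examples:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FuzzyQuery;

    public class FuzzyWithPrefix {
        // Require the first two characters to match exactly, so only terms sharing
        // that prefix are enumerated for the edit-distance calculation.
        public static FuzzyQuery build(String field, String text) {
            return new FuzzyQuery(new Term(field, text), 0.7f, 2);
        }
    }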
Tokenizer Question
I have a query which starts out with something like name:john, and I need to expand this to something like name:(john johnny). I've implemented a custom tokenizer which gets close, but isn't quite right: it outputs name:john johnny. Is there a simple example of doing what I'm attempting?
How can i find a document by a special id?
Hi, I'm new to Solr. I built an application using the standard Solr 3.3 examples as defaults. My id field is a string and is copied to a solr.TextField (searchtext) for search queries. All works fine, except when I try to get documents by a special id. Let me explain the details. Assume id = 1234567. I would like to query this document by using q=searchtext:AB1234567. The prefix (AB) acts as a pseudo-id in our system. Users know it and search for it. But it's not findable, because the Solr index only knows the short id. Adding a new document with the prefixed id as the id is not an option; then I would have to add many documents. To my understanding, stemming and n-gram tokenizing are not possible because they act on tokens longer than the search token. How can I do this? Thanks, Per
Re: How can i find a document by a special id?
Perhaps I'm missing something, but if your fields are indexed as 1234567 but users are searching for AB1234567, is it not possible simply to strip the prefix from the user's input before sending the request? On Wed, Jul 20, 2011 at 10:57 AM, Per Newgro per.new...@gmx.ch wrote: Hi, i'm new to solr. I built an application using the standard solr 3.3 examples as default. My id field is a string and is copied to a solr.TextField (searchtext) for search queries. All works fine except i try to get documents by a special id. Let me explain the detail's. Assume id = 1234567. I would like to query this document by using q=searchtext:AB1234567. The prefix (AB) is acting as a pseudo-id in our system. Users know and search for it. But it's not findable because solr-index only knows the short id. Adding a new document with the prefixed-id as id is not an option. Then i have to add many documents. For my understanding stemming and ngram tokenizing is not possible because they act on tokens longer then the search token. How can i do this? Thanks Per
Re: Tokenizer Question
I'm not sure how to accomplish what you're asking, but have you considered using a synonyms file? This would also allow you to catch ostensibly unrelated name substitutes such as Robert - Bob and Richard - Dick. On Wed, Jul 20, 2011 at 10:57 AM, Jamie Johnson jej2...@gmail.com wrote: I have a query which starts out with something like name:john, I need to expand this to something like name:(john johnny). I've implemented a custom tokenzier which gets close, but isn't quite right it outputs name:john johnny. Is there a simple example of doing what I'm attempting?
Re: Tokenizer Question
My use case really isn't names, I just used that as a simplification. I did look at the Synonym filter to see if I could implement a similar filter (if that was a more appropriate place to do so) but even after doing that I ended up with the same result. On Wed, Jul 20, 2011 at 12:07 PM, Kyle Lee randall.kyle@gmail.com wrote: I'm not sure how to accomplish what you're asking, but have you considered using a synonyms file? This would also allow you to catch ostensibly unrelated name substitutes such as Robert - Bob and Richard - Dick. On Wed, Jul 20, 2011 at 10:57 AM, Jamie Johnson jej2...@gmail.com wrote: I have a query which starts out with something like name:john, I need to expand this to something like name:(john johnny). I've implemented a custom tokenzier which gets close, but isn't quite right it outputs name:john johnny. Is there a simple example of doing what I'm attempting?
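If the goal is for the extra term to land at the same position as the original (which is what makes the query parser produce a name:(john johnny)-style query instead of name:john johnny), the usual trick is to emit the injected token with a position increment of 0, the way SynonymFilter does. A rough sketch of a filter that stacks one alternative per matching token; the expand() lookup is a stub standing in for whatever expansion logic applies:

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
    import org.apache.lucene.util.AttributeSource;

    public class StackedExpansionFilter extends TokenFilter {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
        private AttributeSource.State pending;   // captured state of the original token
        private String pendingExpansion;

        public StackedExpansionFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (pendingExpansion != null) {
                // Restore the original token's attributes, overwrite the term text,
                // and set posIncr=0 so the expansion occupies the same position.
                restoreState(pending);
                termAtt.setEmpty().append(pendingExpansion);
                posIncAtt.setPositionIncrement(0);
                pendingExpansion = null;
                return true;
            }
            if (!input.incrementToken()) return false;
            String expansion = expand(termAtt.toString());
            if (expansion != null) {
                pending = captureState();
                pendingExpansion = expansion;
            }
            return true;
        }

        // Stub: plug in whatever lookup produces the alternative form.
        private String expand(String term) {
            return "john".equals(term) ? "johnny" : null;
        }
    }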
Re: Geospatial queries in Solr
Thanks for the update David, I'll give that a try now. On Wed, Jul 20, 2011 at 10:58 AM, Smiley, David W. dsmi...@mitre.org wrote: Ryan just updated LSP for Lucene/Solr trunk compatibility so you should do a mvn clean install and you'll be back in business. On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote: Thanks for responding so quickly, I don't mind waiting a bit. I'll hang out until the updates have been made. Thanks again. On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W. dsmi...@mitre.org wrote: Hi Jamie. I work on LSP; it can index polygons and query for them. Although the capability is there, we have more testing benchmarking to do, and then we need to put together a tutorial to explain how to use it at the Solr layer. I recently cleaned up the READMEs a bit. Try downloading the trunk codebase, and follow the README. It points to another README which shows off a demo webapp. At the conclusion of this, you'll need to examine the tests and webapp a bit to figure out how to apply it in your app. We don't yet have a tutorial as the framework has been in flux although it has stabilized a good deal. Oh... by the way, this works off of Lucene/Solr trunk. Within the past week there was a major change to trunk and LSP won't compile until we make updates. Either Ryan McKinley or I will get to that by the end of the week. So unless you have access to 2-week old maven artifacts of Lucene/Solr, you're stuck right now. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote: I have looked at the code being shared on the lucene-spatial-playground and was wondering if anyone could provide some details as to its state. Specifically I'm looking to add geospatial support to my application based on a user provided polygon, is this currently possible using this extension?
Re: How can i find a document by a special id?
On 20.07.2011 18:03, Kyle Lee wrote: Perhaps I'm missing something, but if your fields are indexed as 1234567 but users are searching for AB1234567, is it not possible simply to strip the prefix from the user's input before sending the request? On Wed, Jul 20, 2011 at 10:57 AM, Per Newgro <per.new...@gmx.ch> wrote: Hi, I'm new to Solr. I built an application using the standard Solr 3.3 examples as defaults. My id field is a string and is copied to a solr.TextField (searchtext) for search queries. All works fine, except when I try to get documents by a special id. Let me explain the details. Assume id = 1234567. I would like to query this document by using q=searchtext:AB1234567. The prefix (AB) acts as a pseudo-id in our system. Users know it and search for it. But it's not findable, because the Solr index only knows the short id. Adding a new document with the prefixed id as the id is not an option; then I would have to add many documents. To my understanding, stemming and n-gram tokenizing are not possible because they act on tokens longer than the search token. How can I do this? Thanks, Per. Sorry for not being clear here. I only use a single search field. It can contain multiple search words, one of them being the id, so I don't really know whether a given search word is an id. The use case is: we have a product database with some items. A product has an id, name, features, etc. They all go into the described searchtext field. We promote our products in different media, so every product can have a media id (AB is the media code, 1234567 is the id). And users should be able to find the product by id and by media id. I hope I have explained myself better. Thanks for helping me, Per
Wiki Error JSON syntax
Hi, I was writing a Solr client API for Node and I found an error on this page: http://wiki.apache.org/solr/UpdateJSON. In the section "Update Commands" the JSON is not valid, because there are duplicate keys: "add" and "delete" each appear twice. I tried with an array and that doesn't work either; I got error 400, and I think that's because the syntax is bad. I don't really know if this is the right place to talk about that, but it's the only place I found. Sorry if it's not. Thanks, and I love Solr :)
Re: Solr 3.3: Exception in thread Lucene Merge Thread #1
Here we go ... This time we tried to use the old LogByteSizeMergePolicy and SerialMergeScheduler: mergePolicy class=org.apache.lucene.index.LogByteSizeMergePolicy/ mergeScheduler class=org.apache.lucene.index.SerialMergeScheduler/ We did this before, just to be sure ... ~300 Documents: / SEVERE: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782) at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirectory.java:264) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216) at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:129) at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4192) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3859) at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2714) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2709) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2705) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3509) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1850) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1814) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1778) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:143) at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:183) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:416) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:98) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:403) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:301) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:162) at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:140) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919) at java.lang.Thread.run(Thread.java:736) Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map0(Native Method) at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779) ... 44 more 20.07.2011 18:07:30 org.apache.solr.core.SolrCore execute INFO: [core.digi20] webapp=/solr path=/update params={} status=500 QTime=12302 20.07.2011 18:07:30 org.apache.solr.common.SolrException log SEVERE: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782) at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirectory.java:264) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216) at
Re: Wiki Error JSON syntax
On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou remyloubra...@gmail.com wrote: Hi, I was writing a Solr Client API for Node and I found an error on this page http://wiki.apache.org/solr/UpdateJSON ,on the section Update Commands the JSON is not valid because there are duplicate keys and two times with add and delete. It's a common misconception that it's invalid JSON. Duplicate keys are in fact legal. -Yonik http://www.lucidimagination.com I tried with an array and it doesn't work as well, I got error 400, I think that's because the syntax is bad. I don't really know if I am at the good place to talk about that but ... that the only place I found. Sorry if it's not. Thanks, And I love Solr :)
Re: query time boosting in solr
So, what you want is to have the same exact results set as if the query was scientific, but the documents that also match Field1:[20 TO 30] to have more score, right? On Wed, Jul 20, 2011 at 10:53 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Tomas Here is what I was trying to give. http://localhost:8085/apache-solr-3.3.0/select?indent=onversion=2.2defType=dismaxq=scientificbq=Field1:[20%20TO%2030] ^10start=0rows=30qf=textfl=Field1,dociddebugQuery=on This query seems OK for that purpose. Over here, I was trying to change the range of Field1, keeping everything else intact. Here are my observations: 1) The number of results found remain intact. Only that the order of the results varies. Isn't this what was expected? 2) The boost factor (10) does not seem to throw any influence at all. It's on the parsed query, Why do you think it doesn't have an influence? Can you send the debug query output for a document that match the bt? I tried it with the Solr example and this is what I see: http://localhost:8983/solr/select?defType=dismaxq=displaybq=weight:[0%20TO%2010] ^10start=0rows=30debugQuery=onqf=features%20name This is the debug output for a document that match the query and the boost query: str name=MA147LL/A 1.137027 = (MATCH) sum of: 0.1994111 = (MATCH) max of: 0.1994111 = (MATCH) weight(features:display in 0), product of: 0.34767273 = queryWeight(features:display), product of: 3.7080503 = idf(docFreq=1, maxDocs=30) 0.0937616 = queryNorm 0.57355976 = (MATCH) fieldWeight(features:display in 0), product of: 1.4142135 = tf(termFreq(features:display)=2) 3.7080503 = idf(docFreq=1, maxDocs=30) 0.109375 = fieldNorm(field=features, doc=0) 0.937616 = (MATCH) ConstantScore(weight:[0.0 TO 10.0]^10.0)^10.0, product of: 10.0 = boost 0.0937616 = queryNorm /str and this is the debug output for a document that only match the main query: str name=VA902B 0.4834455 = (MATCH) sum of: 0.4834455 = (MATCH) max of: 0.4834455 = (MATCH) weight(name:display in 12), product of: 0.34767273 = queryWeight(name:display), product of: 3.7080503 = idf(docFreq=1, maxDocs=30) 0.0937616 = queryNorm 1.3905189 = (MATCH) fieldWeight(name:display in 12), product of: 1.0 = tf(termFreq(name:display)=1) 3.7080503 = idf(docFreq=1, maxDocs=30) 0.375 = fieldNorm(field=name, doc=12) /str Do you have something similar?? Here is what the debugQuery says: str name=parsedquery+DisjunctionMaxQuery((text:scientif)) () Field1:[20.0 TO 30.0]^10.0/str str name=parsedquery_toString+(text:scientif) () Field1:[20.0 TO 30.0]^10.0/str From these, it seems like its just filtering the results based on the Field1 values, rather than performing a Boost Query. S. 2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com Yes, it should, but make sure you specify at least the qf parameter for dismax. You can activate debugQuery and you'll see which documents get boosted and which aren't. On Wed, Jul 20, 2011 at 9:21 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Tomasso Thanks for a quick response. So, if I say: http://localhost:8085/apache-solr-3.3.0/select?indent=onversion=2.2* defType=dismax*q=scientificbq=Field1:[20%20TO%2025]^10start=0rows=30 -will it be right? The above query: boosts the documents which suit the given query (scientific), which has Field1 values between 20-25, by a factor of 10 : Is that right?? S 2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Sowmya, bq is a great way of boosting, but you have to be using the Dismax Query Parser or the Extended Dismax (edismax) query parser, it doesn't work with the Lucene Query Parser. 
If you can use any of those, then that's the solution. If you need to use the Lucene Query Parser, for a user query like: scientific temper you could create a query like: (scientific temper) OR (scientific temper AND (field1:[10 TO 2030]))^X being X the boost you want for those documents. with your query: scientific temper field1:[10 TO 2030] you are either adding the condition of the range value for the field (if your default operator is AND) or adding another way of matching the query (if your default operator ir OR, you can have documents in your result set that only matched the range query, and this is not what the user wanted). Hope this helps, Tomás On Wed, Jul 20, 2011 at 5:15 AM, Sowmya V.B. vbsow...@gmail.com wrote: Can anyone throw some light on this issue? My problem is to: give a query time boost to certain documents, which have a field, say field1, in the range that the user chooses during query time. I think the below link indicates a range query:
Re: How can i find a document by a special id?
Is the mediacode always alphabetic, and is the ID always numeric?
Schema design/data import
Greetings. I am struggling to design a schema and a data import/update strategy for some semi-complicated data. I would appreciate any input. What we have is a bunch of database records that may or may not have files attached. Sometimes no files, sometimes 50. The requirement is to index the database records AND the documents, and the search results would be just links to the database records. I'd love to crawl the site with Nutch and be done with it, but we have a complicated search form with various codes and attributes for the database records, so we need a detailed schema that will loosely correspond to boxes on the search form. I don't think we could easily do that if we just crawl the site. But with a detailed schema, I'm having trouble understanding how we could import and index from the database, and also index the related files, and have the same schema being populated, especially with the number of related documents being variable (maybe index them all to one field?). We have a lot of flexibility on how we can build this, so I'm open to any suggestions or pointers for further reading. I've spent a fair amount of time on the wiki but I didn't see anything that seemed directly relevant. An additional difficulty, that I am willing to overlook for the first cut, is that some of these files are zipped, and some of the zip files may contain other zip files, to maybe 3 or 4 levels deep. Help, please? cheers, Travis -- ** *Travis Low, Director of Development* ** t...@4centurion.com* * *Centurion Research Solutions, LLC* *14048 ParkEast Circle *•* Suite 100 *•* Chantilly, VA 20151* *703-956-6276 *•* 703-378-4474 (fax)* *http://www.centurionresearch.com* http://www.centurionresearch.com **The information contained in this email message is confidential and protected from disclosure. If you are not the intended recipient, any use or dissemination of this communication, including attachments, is strictly prohibited. If you received this email message in error, please delete it and immediately notify the sender. This email message and any attachments have been scanned and are believed to be free of malicious software and defects that might affect any computer system in which they are received and opened. No responsibility is accepted by Centurion Research Solutions, LLC for any loss or damage arising from the content of this email.
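On the "maybe index them all to one field" idea: a common shape for this is a multivalued, indexed-but-not-stored text field that every extracted attachment's text is appended to, next to the per-column fields that back the search form. A sketch of what that could look like in schema.xml; the field names and types are invented for illustration:

    <field name="record_id"       type="string" indexed="true" stored="true"/>
    <field name="title"           type="text"   indexed="true" stored="true"/>
    <!-- one value per attached file; searched but not returned -->
    <field name="attachment_text" type="text"   indexed="true" stored="false" multiValued="true"/>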
Re: How can i find a document by a special id?
On 20.07.2011 19:23, Kyle Lee wrote: Is the mediacode always alphabetic, and is the ID always numeric? No, sadly not. We expose our products in too many media :-). Per
Re: Geospatial queries in Solr
So I've pulled the latest and can run the example, I've tried to move my config over and am having a bit of an issue when executing queries, specifically I get this: Unable to read: POLYGON((... looking at the code it's usign the simple spatial context, how do I specify JtsSpatialContext? On Wed, Jul 20, 2011 at 12:13 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks for the update David, I'll give that a try now. On Wed, Jul 20, 2011 at 10:58 AM, Smiley, David W. dsmi...@mitre.org wrote: Ryan just updated LSP for Lucene/Solr trunk compatibility so you should do a mvn clean install and you'll be back in business. On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote: Thanks for responding so quickly, I don't mind waiting a bit. I'll hang out until the updates have been made. Thanks again. On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W. dsmi...@mitre.org wrote: Hi Jamie. I work on LSP; it can index polygons and query for them. Although the capability is there, we have more testing benchmarking to do, and then we need to put together a tutorial to explain how to use it at the Solr layer. I recently cleaned up the READMEs a bit. Try downloading the trunk codebase, and follow the README. It points to another README which shows off a demo webapp. At the conclusion of this, you'll need to examine the tests and webapp a bit to figure out how to apply it in your app. We don't yet have a tutorial as the framework has been in flux although it has stabilized a good deal. Oh... by the way, this works off of Lucene/Solr trunk. Within the past week there was a major change to trunk and LSP won't compile until we make updates. Either Ryan McKinley or I will get to that by the end of the week. So unless you have access to 2-week old maven artifacts of Lucene/Solr, you're stuck right now. ~ David Smiley Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/ On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote: I have looked at the code being shared on the lucene-spatial-playground and was wondering if anyone could provide some details as to its state. Specifically I'm looking to add geospatial support to my application based on a user provided polygon, is this currently possible using this extension?
Re: Wiki Error JSON syntax
I think I can trust you, but this is weird. Funny thing: if you try to validate this JSON on http://jsonlint.com/, duplicate keys are automatically removed. But the thing is, how can you possibly generate this JSON with a JavaScript object? It would be really nice to combine both ways that you show on the page. Something like: { add: [ { doc: { id: DOC1, my_boosted_field: { boost: 2.3, value: test }, my_multivalued_field: [ aaa, bbb ] } }, { commitWithin: 5000, overwrite: false, boost: 3.45, doc: { f1: v2 } } ], commit: {}, optimize: { waitFlush: false, waitSearcher: false }, delete: [ { id: ID }, { query: QUERY } ] } Thank you for your previous response, Yonik. 2011/7/20 Yonik Seeley yo...@lucidimagination.com On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou remyloubra...@gmail.com wrote: Hi, I was writing a Solr client API for Node and I found an error on this page http://wiki.apache.org/solr/UpdateJSON , on the section Update Commands the JSON is not valid because there are duplicate keys (add and delete each appear twice). It's a common misconception that it's invalid JSON. Duplicate keys are in fact legal. -Yonik http://www.lucidimagination.com I tried with an array and it doesn't work either, I got error 400, I think that's because the syntax is bad. I don't really know if this is the right place to talk about that but ... it's the only place I found. Sorry if it's not. Thanks, And I love Solr :)
Re: Geospatial queries in Solr
You can set the system property SpatialContextProvider to com.googlecode.lucene.spatial.base.context.JtsSpatialContext ~ David On Jul 20, 2011, at 2:02 PM, Jamie Johnson wrote: So I've pulled the latest and can run the example, I've tried to move my config over and am having a bit of an issue when executing queries, specifically I get this: Unable to read: POLYGON((... looking at the code it's using the simple spatial context, how do I specify JtsSpatialContext?
Re: Geospatial queries in Solr
Where do you set that? On Wed, Jul 20, 2011 at 2:37 PM, Smiley, David W. dsmi...@mitre.org wrote: You can set the system property SpatialContextProvider to com.googlecode.lucene.spatial.base.context.JtsSpatialContext ~ David
Curl Tika not working with blanks in literal.field
Hi. I'm trying to index binary documents with curl and Tika for extracting text. The problem is that when I set the value of a field that contains blank spaces using the input parameter literal.fieldname=value, the document is not indexed. The command I send is the following: curl http://localhost:8983/solr/update/extract?literal.id=doc1\&literal.url=/mnt/windows/Ofertas/2006 Portal Intranet/DOCUMENTACION/datos.doc\&uprefix=attr_\&fmap.content=text\&commit=true -F myfile=\@/mnt/windows/Ofertas/DOCUMENTACION/datos.doc That is, literal.url=value with blanks apparently is not working.
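A note for anyone hitting the same thing: spaces in a URL have to be percent-encoded (%20), which is the most likely reason the request above is rejected, so either quote the URL and encode the spaces, or let SolrJ do the encoding for you. The following is only a minimal sketch under assumed names; the Solr URL and the field values are taken from the message above or invented for illustration:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractWithSpaces {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("/mnt/windows/Ofertas/DOCUMENTACION/datos.doc"));
    req.setParam("literal.id", "doc1");
    // Sent as a normal request parameter, so the embedded spaces are encoded for us.
    req.setParam("literal.url",
        "/mnt/windows/Ofertas/2006 Portal Intranet/DOCUMENTACION/datos.doc");
    req.setParam("uprefix", "attr_");
    req.setParam("fmap.content", "text");
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    server.request(req);
  }
}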
Re: Geospatial queries in Solr
The notion of a system property is a java concept; google it and you'll learn more. BTW, despite my responsiveness in helping right now; I'm pretty busy this week so this won't necessarily last long. ~ David On Jul 20, 2011, at 2:43 PM, Jamie Johnson wrote: Where do you set that?
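For readers landing here later: a Java system property is normally passed on the JVM command line, e.g. -DSpatialContextProvider=com.googlecode.lucene.spatial.base.context.JtsSpatialContext in the servlet container's JAVA_OPTS, or set programmatically before the spatial code creates its context. A minimal sketch of the programmatic form, using the property name and class from David's message; where you call this in your own startup code is up to you:

public class SpatialSetup {
  public static void main(String[] args) {
    // Equivalent to -DSpatialContextProvider=... on the JVM command line;
    // must run before the spatial module initializes its context.
    System.setProperty("SpatialContextProvider",
        "com.googlecode.lucene.spatial.base.context.JtsSpatialContext");
  }
}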
RE: embeded solrj doesn't refresh index
Hi Thanks for response. Here is the whole picture: I use DIH to import and index data. And use embedded solrj connecting to the index file for search and other operations. Here is what I found: Once data are indexed (and committed), I can see the changes through solr web server, but not from embedded solrj. If I restart the embedded solr server, I do see the changes. Hope it helps. Thanks. -Original Message- From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com] Sent: Wednesday, July 20, 2011 5:09 AM To: solr-user@lucene.apache.org Subject: Re: embeded solrj doesn't refresh index You should send a commit to you embedded solr Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2011/7/20 Jianbin Dai j...@huawei.com Hi, I am using embedded solrj. After I add new doc to the index, I can see the changes through solr web, but not from embedded solrj. But after I restart the embedded solrj, I do see the changes. It works as if there was a cache. Anyone knows the problem? Thanks. Jianbin
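One way to apply Marco's suggestion in this setup is to issue a commit through the embedded instance itself, which forces it to open a new searcher and pick up the segments written by the other Solr. This is only a sketch under the assumption that the two instances are not writing at the same time; the solr home path and core name are placeholders:

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedRefresh {
  public static void main(String[] args) throws Exception {
    System.setProperty("solr.solr.home", "/path/to/solr/home");  // placeholder
    CoreContainer container = new CoreContainer.Initializer().initialize();
    EmbeddedSolrServer embedded = new EmbeddedSolrServer(container, "core0");
    // An (effectively empty) commit makes this core open a new searcher,
    // so documents committed by the other Solr instance become visible here.
    embedded.commit();
    container.shutdown();
  }
}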
set queryNorm to 1?
Hi Folks, My boost function is bf=div(product(num_clicks,0.3),sum(num_clicks,25)). I would like to directly add its score to the final score instead of letting it be normalized by the queryNorm value. Is there any way to do it? Thanks. Elaine
Re: query time boosting in solr
Hi Tomas Yeah, I now understand it. I was confused about interpreting the output. Thanks for the comments. Sowmya. 2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com So, what you want is to have the same exact results set as if the query was scientific, but the documents that also match Field1:[20 TO 30] to have more score, right? On Wed, Jul 20, 2011 at 10:53 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Tomas Here is what I was trying to give. http://localhost:8085/apache-solr-3.3.0/select?indent=onversion=2.2defType=dismaxq=scientificbq=Field1:[20%20TO%2030] ^10start=0rows=30qf=textfl=Field1,dociddebugQuery=on This query seems OK for that purpose. Over here, I was trying to change the range of Field1, keeping everything else intact. Here are my observations: 1) The number of results found remain intact. Only that the order of the results varies. Isn't this what was expected? 2) The boost factor (10) does not seem to throw any influence at all. It's on the parsed query, Why do you think it doesn't have an influence? Can you send the debug query output for a document that match the bt? I tried it with the Solr example and this is what I see: http://localhost:8983/solr/select?defType=dismaxq=displaybq=weight:[0%20TO%2010] ^10start=0rows=30debugQuery=onqf=features%20name This is the debug output for a document that match the query and the boost query: str name=MA147LL/A 1.137027 = (MATCH) sum of: 0.1994111 = (MATCH) max of: 0.1994111 = (MATCH) weight(features:display in 0), product of: 0.34767273 = queryWeight(features:display), product of: 3.7080503 = idf(docFreq=1, maxDocs=30) 0.0937616 = queryNorm 0.57355976 = (MATCH) fieldWeight(features:display in 0), product of: 1.4142135 = tf(termFreq(features:display)=2) 3.7080503 = idf(docFreq=1, maxDocs=30) 0.109375 = fieldNorm(field=features, doc=0) 0.937616 = (MATCH) ConstantScore(weight:[0.0 TO 10.0]^10.0)^10.0, product of: 10.0 = boost 0.0937616 = queryNorm /str and this is the debug output for a document that only match the main query: str name=VA902B 0.4834455 = (MATCH) sum of: 0.4834455 = (MATCH) max of: 0.4834455 = (MATCH) weight(name:display in 12), product of: 0.34767273 = queryWeight(name:display), product of: 3.7080503 = idf(docFreq=1, maxDocs=30) 0.0937616 = queryNorm 1.3905189 = (MATCH) fieldWeight(name:display in 12), product of: 1.0 = tf(termFreq(name:display)=1) 3.7080503 = idf(docFreq=1, maxDocs=30) 0.375 = fieldNorm(field=name, doc=12) /str Do you have something similar?? Here is what the debugQuery says: str name=parsedquery+DisjunctionMaxQuery((text:scientif)) () Field1:[20.0 TO 30.0]^10.0/str str name=parsedquery_toString+(text:scientif) () Field1:[20.0 TO 30.0]^10.0/str From these, it seems like its just filtering the results based on the Field1 values, rather than performing a Boost Query. S. 2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com Yes, it should, but make sure you specify at least the qf parameter for dismax. You can activate debugQuery and you'll see which documents get boosted and which aren't. On Wed, Jul 20, 2011 at 9:21 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Tomasso Thanks for a quick response. So, if I say: http://localhost:8085/apache-solr-3.3.0/select?indent=onversion=2.2* defType=dismax*q=scientificbq=Field1:[20%20TO%2025]^10start=0rows=30 -will it be right? The above query: boosts the documents which suit the given query (scientific), which has Field1 values between 20-25, by a factor of 10 : Is that right?? 
S 2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Sowmya, bq is a great way of boosting, but you have to be using the Dismax Query Parser or the Extended Dismax (edismax) query parser, it doesn't work with the Lucene Query Parser. If you can use any of those, then that's the solution. If you need to use the Lucene Query Parser, for a user query like: scientific temper you could create a query like: (scientific temper) OR (scientific temper AND (field1:[10 TO 2030]))^X being X the boost you want for those documents. with your query: scientific temper field1:[10 TO 2030] you are either adding the condition of the range value for the field (if your default operator is AND) or adding another way of matching the query (if your default operator ir OR, you can have documents in your result set that only matched the range query, and this is not what the user wanted). Hope this helps, Tomás On Wed, Jul 20, 2011 at 5:15 AM, Sowmya V.B. vbsow...@gmail.com wrote: Can anyone throw some light on this
Schema Design/Data Import
[Apologies if this is a duplicate -- I have sent several messages from my work email and they just vanish, so I subscribed with my personal email] Greetings. I am struggling to design a schema and a data import/update strategy for some semi-complicated data. I would appreciate any input. What we have is a bunch of database records that may or may not have files attached. Sometimes no files, sometimes 50. The requirement is to index the database records AND the documents, and the search results would be just links to the database records. I'd love to crawl the site with Nutch and be done with it, but we have a complicated search form with various codes and attributes for the database records, so we need a detailed schema that will loosely correspond to boxes on the search form. I don't think we could easily do that if we just crawl the site. But with a detailed schema, I'm having trouble understanding how we could import and index from the database, and also index the related files, and have the same schema being populated, especially with the number of related documents being variable (maybe index them all to one field?). We have a lot of flexibility on how we can build this, so I'm open to any suggestions or pointers for further reading. I've spent a fair amount of time on the wiki but I didn't see anything that seemed directly relevant. An additional difficulty, that I am willing to overlook for the first cut, is that some of these files are zipped, and some of the zip files may contain other zip files, to maybe 3 or 4 levels deep. Help, please? cheers, Travis
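One way to read the "maybe index them all to one field?" idea is to build one Solr document per database record with SolrJ, run each attached file through Tika, and append the extracted text to a single multi-valued field, so a record with 0 or 50 attachments fits the same schema. The sketch below is rough and every field name in it is invented for illustration; it also ignores the nested-zip problem and uses plain SolrJ plus the Tika facade rather than DIH:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.Tika;

public class RecordIndexer {
  private final SolrServer solr;
  private final Tika tika = new Tika();

  public RecordIndexer(SolrServer solr) {
    this.solr = solr;
  }

  // One Solr document per database record; each attachment's text is appended
  // to a single multi-valued field, so the attachment count never changes the schema.
  public void index(String recordId, String title, String statusCode, File[] attachments)
      throws Exception {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", recordId);
    doc.addField("title", title);
    doc.addField("status_code", statusCode);   // one field per search-form box
    for (File f : attachments) {
      doc.addField("attachment_text", tika.parseToString(f));
    }
    solr.add(doc);
  }
}

Searching then means including attachment_text among the queried fields, while the record's own columns keep their one-box-per-field mapping on the search form.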
Re: How can i find a document by a special id?
: Am 20.07.2011 19:23, schrieb Kyle Lee: : Is the mediacode always alphabetic, and is the ID always numeric? : : No sadly not. We expose our products on too many medias :-). If I'm understanding you correctly, you're saying even the prefix AB is not special, that there could be any number of prefixes identifying different mediacodes? and the product ids aren't all numeric? your question seems absurd. I can only assume that I am horribly misunderstanding your situation. (which is very easy to do when you only have a single contrived piece of example data to go on) As a general rule, it's not a good idea to think about Solr in the same way as a relational database, but perhaps if you imagine for a moment that your Solr index *was* a (read only) relational database, with each solr field corresponding to a column in your DB, and then you described in pseudo-code/SQL how you would go about doing the types of id lookups you want to do, it might give us a better idea of your situation so we can suggest an approach for dealing with it. -Hoss
Re: Tokenizer Question
When the QueryParser gives hunks of text to an analyzer, and that analyzer produces multiple terms, the query parser has to decide how to build a query out of it. If the terms have identical position information, then it always builds an OR query (this is the typical synonym situation). If the terms have differing positions, then the behavior is driven by the autoGeneratePhraseQueries attribute of the FieldType -- the default value of this depends on the version attribute of your top-level <schema/> tag. : I have a query which starts out with something like name:john, I : need to expand this to something like name:(john johnny). I've : implemented a custom tokenizer which gets close, but isn't quite right : it outputs name:john johnny. Is there a simple example of doing : what I'm attempting? : -Hoss
RE: defType argument weirdness
: I do understand what they do (at least well enough to use them), but I : find it confusing that it's called defType as a main param, but type : in a LocalParam, when to me they both seem to do the same thing -- which type as a localparam in a query string defines the type of query string it is -- picking the parser. defType determines the default value for type in the primary query string. : (and then there's 'qt', often confused with defType/type by newbies, : since they guess it stands for 'query type', but which should probably : actually have been called 'requestHandler'/'rh' instead, since that's : what it actually chooses, no? It gets very confusing). : : If it's generally recognized it's confusing and perhaps a somewhat : inconsistent mental model being implied, I wonder if there'd be any : interest in renaming these to be more clear, leaving the old ones as : aliases/synonyms for backwards compatibility (perhaps with a long qt is historic and already being de-emphasized in favor of using path based names (ie: http://solr/handlername instead of http://solr/select?qt=/handlername) so adding yet another alias for that would be moving in the wrong direction. type and defType probably make more sense when you think of them in that order. I don't see a strong need to confuse/complicate the issue by adding more aliases for them. -Hoss
Re: defType argument weirdness
Huh, I'm still not completely following. I'm sure it makes sense if you understand the underlying implementation, but I don't understand how 'type' and 'defType' don't mean exactly the same thing, just expressed differently in different locations. Sorry for beating a dead horse, but maybe it would help if you could tell me what I'm getting wrong here: defType can only go in a top-level param, and determines the query parser for the overall q top level param. type can only go in a LocalParam, and determines the query parser that applies to whatever query (top-level or nested) that the LocalParam syntax lives in. (Just as any other LocalParams apply only to the query that the LocalParam block lives in -- and nested queries inherit their query parser from the query they are nested in unless over-ridden, just as they inherit every other param from the query they are nested in unless over-ridden, nothing special here). Therefore for instance: defType=dismax&q=foo is equivalent to defType=lucene&q={!type=dismax}foo Where am I straying in my mental model here? Because if all that is true, I don't understand how 'type' and 'defType' mean anything different -- they both choose the query parser, do they not? (which to me means I wish they were both called 'parser' instead of 'type' -- a 'type' here is the name of a query parser, is it not?) It's just that if it's in the top-level param you have to use 'defType', and if it's in a LocalParam you have to use 'type'. That's been my mental model, which has served me well so far, but if it's wrong and it's going to trip me up on some as yet unencountered use cases, it would probably be good for me to know it! (And probably good for some documentation to be written somewhere explaining it too). (And if they really are different, prefixing def to type is not making it very clear what the difference is! What's def supposed to stand for anyway?) Jonathan
Re: Tokenizer Question
Thanks, I'll try that now, I'm assuming I need to add the position increment and offset attributes?
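To make the position point concrete: if the filter emits the extra term with a position increment of 0, it is stacked on top of the original term and the query parser builds the OR form name:(john johnny) instead of a phrase. Below is a minimal, hypothetical sketch against the Lucene 3.x analysis API; the NicknameFilter name and the john/johnny pair are illustrations, not anything from the actual code discussed here:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

public final class NicknameFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);
  private AttributeSource.State savedState;  // state of the token we are expanding
  private String pendingTerm;                // synonym still to be emitted

  public NicknameFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pendingTerm != null) {
      restoreState(savedState);              // same offsets as the original token
      termAtt.setEmpty().append(pendingTerm);
      posIncrAtt.setPositionIncrement(0);    // stacked at the same position
      pendingTerm = null;
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    if ("john".equals(termAtt.toString())) {
      savedState = captureState();
      pendingTerm = "johnny";
    }
    return true;
  }
}

restoreState copies the original token's offsets, so only the term text and the position increment differ for the injected term.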
solrj and XML result sets
Does anyone have advice as to how to produce an XML result set using SolrJ?? My Java coder says he can *only* produce result sets in javabin - which is fine in most cases - but we have a need for an XML output stream as well. Thanks...
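SolrJ can be switched from javabin to the XML wire format by swapping the response parser, though the result is still parsed into a QueryResponse object; if a raw XML stream is really what is needed, the simplest route is to hit /select?q=...&wt=xml over plain HTTP. A minimal sketch, assuming SolrJ 1.4/3.x and a Solr instance at the usual local URL:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class XmlWireFormat {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // Requests now go out with wt=xml and the XML response is parsed client-side.
    server.setParser(new XMLResponseParser());
    System.out.println(server.query(new SolrQuery("*:*")).getResults().getNumFound());
  }
}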
RE: Solr 3.3: Exception in thread Lucene Merge Thread #1
Says it is caused by a Java out of memory error, no? -Original Message- From: mdz-munich [mailto:sebastian.lu...@bsb-muenchen.de] Sent: Wednesday, July 20, 2011 9:18 AM To: solr-user@lucene.apache.org Subject: Re: Solr 3.3: Exception in thread Lucene Merge Thread #1 Here we go ... This time we tried to use the old LogByteSizeMergePolicy and SerialMergeScheduler: mergePolicy class=org.apache.lucene.index.LogByteSizeMergePolicy/ mergeScheduler class=org.apache.lucene.index.SerialMergeScheduler/ We did this before, just to be sure ... ~300 Documents: / SEVERE: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782) at org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirector y.java:264) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216) at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:129) at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreRead ers.java:244) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4192) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3859) at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler. java:37) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2714) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2709) at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2705) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3509) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1850) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1814) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1778) at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:143) at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHand ler2.java:183) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2. java:416) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpd ateProcessorFactory.java:85) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:98) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Conte ntStreamHandlerBase.java:67) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB ase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja va:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j ava:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applica tionFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilt erChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValv e.java:240) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValv e.java:164) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(Authenticator Base.java:462) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java :164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java :100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563 ) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve. 
java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:4 03) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:30 1) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process( Http11Protocol.java:162) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process( Http11Protocol.java:140) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.j ava:309) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecuto r.java:897) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja va:919) at java.lang.Thread.run(Thread.java:736) Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map0(Native Method) at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779) ... 44 more 20.07.2011 18:07:30 org.apache.solr.core.SolrCore execute INFO: [core.digi20] webapp=/solr path=/update params={} status=500 QTime=12302 20.07.2011 18:07:30 org.apache.solr.common.SolrException log SEVERE:
Re: How can i find a document by a special id?
Why not just search the 2 fields? q=*:*&fq=mediacode:AB OR id:123456 You could take the user input and replace it: q=*:*&fq=mediacode:$input OR id:$input Of course you can also use dismax and wrap with an OR. Bill Bell Sent from mobile
Re: Data Import from a Queue
Yes this is a good reason for using a queue. I have used Amazon SQS this way and it was simple to set up. Bill Bell Sent from mobile
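For anyone wiring this up, a bare-bones consumer loop might look like the sketch below: one document added per message, and a commit only once the queue has gone quiet rather than after every message. It is deliberately broker-agnostic; java.util.concurrent.BlockingQueue stands in for whatever client your broker (SQS, RabbitMQ, ActiveMQ) provides, and all names are illustrative:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class QueueIndexer {
  // Runs forever: add one document per message, commit when the queue drains.
  public static void drain(BlockingQueue<SolrInputDocument> queue, SolrServer solr)
      throws Exception {
    boolean uncommitted = false;
    while (true) {
      SolrInputDocument doc = queue.poll(5, TimeUnit.SECONDS);
      if (doc == null) {            // queue looks empty: flush what we have
        if (uncommitted) {
          solr.commit();
          uncommitted = false;
        }
        continue;
      }
      solr.add(doc);
      uncommitted = true;
    }
  }
}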
Re: Geospatial queries in Solr
Thanks David. When trying to execute queries on a complex irregular polygon (say the shape of NJ) I'm getting results which are actually outside of that polygon. Is there a setting which controls this resolution?
Updating fields in an existing document
We find ourselves in the following quandary: At initial index time, we store a value in a field, and we use it for faceting. So it, seemingly, has to be there as a field. However, from time to time, something happens that causes us to want to change this value. As far as we know, this requires us to completely re-index the document, which is slow. It struck me that we can't be the only people to go down this road, so I write to inquire if we are missing something.
Re: Question on the appropriate software
Solr would work fine for this, your PDF files would have to be interpreted by Tika, but see Data Import handler, FileListEntityProcessor and TikaEntityProcessor. I don't quite think Nutch is the tool here. You'll be wanting to do highlighting and a couple of other things. You'll spend some time tweaking results to be what you want, but this is certainly do-able. Best Erick On Tue, Jul 19, 2011 at 1:29 PM, Matthew Twomey mtwo...@beakstar.com wrote: Greetings, I'm interested in having a server based personal document library with a few specific features and I'm trying to determine what the most appropriate tools are to build it. I have the following content which I wish to include in the archive: 1. A smallish collection of technical books in PDF format (around 100) 2. Many years of several different magazine subscriptions in PDF format (probably another 100 - 200 PDFs) 3. Several years of personal documents which were scanned in and converted to searchable PDF format (300 - 500 documents) 4. I also have local mirrors of several HTML based reference sites I'd like to have the ability to index all of this content and search it from a web form (so that I and a few others can reach it from multiple locations). Here are two examples of the functionality I'm looking for: Scenario 1. What was that software that has all the nutritional data and hooks up to some USDA database? I know I read about it in one of my Linux Journals last year. Now I'd like to be able to pull up the webform and search for nutrition USDA. I'd like to restrict the search to the Linux Journal magazine PDFs (or refine the results). I'd like results to contain context snippets with each search result. Finally, and most importantly, I'd like multiple results per PDF (or all occurrences). The last one is important so that I can actually quickly find the right issue (in case there is some advertisement in every issue for the last year that contains those terms). When I click on the desired result, the PDF is downloaded by my browser. Scenario 2. How much have I been paying for property taxes for the last five years again? (the bills are all scanned in) In this case I'd like to search for my property identification number (which is on the bills) and the results should show all the documents that have it, with context. Clicking on results downloads the documents. I assume this example is simple to achieve if example 1 can be done. So in general, my question is - can this be done in a fairly straightforward manner with Solr? Is there a more appropriate tool to be using (e.g. Nutch?). Also, I have looked high and low for a free, already baked solution which can do scenario 1 but haven't been able to find something - so if someone knows of such a thing, please let me know. Thanks! -Matt
RE: Updating fields in an existing document
Nope, you're not missing anything, there's no way to alter a document in an index other than reindexing the whole document. Solr's architecture would make it difficult (although never say impossible) to do otherwise. But you're right, it would be convenient, and not just for you. Reindexing a single document ought not to be slow, although if you have many of them at once it could be, or if you end up needing to commit to an index very frequently it can indeed cause problems.
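So in practice the "update" looks like the sketch below: read the current document back, change the one field, and re-add the whole thing (re-adding with the same uniqueKey replaces the old version). It only works if every field that must be preserved is stored; the URL, id and field names are made up for illustration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ReindexOneDoc {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // Fetch the current version; requires the fields you copy to be stored.
    SolrDocument old = solr.query(new SolrQuery("id:123")).getResults().get(0);
    SolrInputDocument fresh = new SolrInputDocument();
    for (String name : old.getFieldNames()) {
      for (Object value : old.getFieldValues(name)) {
        fresh.addField(name, value);
      }
    }
    fresh.setField("facet_field", "new-value");  // the value that changed
    solr.add(fresh);  // same uniqueKey, so this replaces the old document
    solr.commit();
  }
}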
RE: Solr 3.3: Exception in thread Lucene Merge Thread #1
Yeah, indeed. But since the VM is equipped with plenty of RAM (22GB) and it works so far (Solr 3.2) very well with this setup, I AM slightly confused, am I? Maybe we should LOWER the dedicated Physical Memory? The remaining 10GB are used for a second tomcat (8GB) and the OS (Suse). As far as I understand NIO (mostly un-far), this package can directly use the most efficient operations of the underlying platform. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-3-Exception-in-thread-Lucene-Merge-Thread-1-tp3185248p3186986.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr not returning results for some key words
Greetings, I'm having trouble getting Solr to return results for key words that I know for sure are in the index. As a test, I've indexed a PDF of a book on Java. I'm trying to search the index for UnsupportedOperationException but I get no results. I can see it in the index though: # [root@myhost apache-solr-1.4.1]# strings example/solr/data/index/_0.fdt|grep UnsupportedOperationException UnsupportedOperationException if the iterator returned by this collec- throw new UnsupportedOperationException(); UnsupportedOperationException Object does not support methodCHAPTER 9 EXCEPTIONS UnsupportedOperationException, 87, [root@myhost apache-solr-1.4.1]# # On the other hand, if I search the index for the word support (which is also contained in the grep above), I get a hit on this document. Furthermore, if I search on support and include highlighted snippets, I can see the word UnsupportedOperationException right in there in the highlight results! # of an object has been detected where it is prohibited UnsupportedOperationException Object does not emsupport/em # So why do I get no hits when I search for it? This happens with many different key words. Any thoughts on how I can trouble shoot this or ideas on why it's not working properly? Thanks, -Matt
Re: Manipulating a Fuzzy Query's Prefix Length
Update: Solr/Lucene 4.0 will incorporate a new fuzzy search algorithm with substantial performance improvements. To tide us over until this release, we've simply rebuilt from source with a default prefix length of 2, which will suit our needs until then. On Wed, Jul 20, 2011 at 10:09 AM, Kyle Lee randall.kyle@gmail.comwrote: We're performing fuzzy searches on a field possessing a large number of unique terms. Specifying a required minimum similarity of 0.7 results in a query execution time of 13-15 seconds, which stands in stark contrast to our average query time of 40ms. We suspect that the performance problem most likely emanates from the enumeration over all the unique terms in the index. The Lucene documentation for FuzzyQuery supports this theory with the following warning: *Warning:* this query is not very scalable with its default prefix length of 0 - in this case, *every* term will be enumerated and cause an edit score calculation. We would therefore like to set the prefix length to one or two, mandating that the first couple of characters match and thereby substantially reduce the number of terms enumerated. Is this possible with Solr? I haven't yet discovered a method, if so. Any help would be greatly appreciated.
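For reference, at the Lucene level the prefix length is just a constructor argument on FuzzyQuery; the pain point above is that Solr's query parser does not expose it, hence the rebuild-from-source workaround. A tiny illustration with a made-up field and term:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

public class FuzzyPrefixExample {
  public static void main(String[] args) {
    // minimumSimilarity 0.7, prefixLength 2: the first two characters must match
    // exactly, which sharply limits how many terms get enumerated.
    FuzzyQuery q = new FuzzyQuery(new Term("title", "lucene"), 0.7f, 2);
    System.out.println(q);
  }
}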
Announcement/Invitation: Melbourne Solr/Lucene Users Group
Hi all, I hope you won't mind me informing the list, but I thought some Melbourne-based members would find this relevant. We have noticed that there is a blossoming of Apache Solr/Lucene usage and development in Melbourne in addition to a lack of an unofficial, relaxed gathering to allow some fruitful information and experience exchange. We're trying to put together a laid back meet up for developers (and other interested people) who are currently using Apache Solr (and/or Lucene) or would like to learn more about it. Aiming for it to be a high signal/noise ratio group, with meet ups probably once every two months. The first meet up is still TBD, but please join the group if you're keen to join us for pizza, beer, and a discussion about Solr once we figure out the date of the first meeting. Also, please feel free to suggest quick (15 minute) presentations - whether it be a problem you've solved, a problem you need help solving or a general interesting experience of using Solr. We're keeping registrations here: http://www.meetup.com/melbourne-solr/ Feel free to pass this on to co-workers and colleagues who would be interested. Cheers, Tal
Re: Announcement/Invitation: Melbourne Solr/Lucene Users Group
Hi Tal, On 21/07/11 14:04, Tal Rotbart wrote: We have noticed that there is a blossoming of Apache Solr/Lucene usage and development in Melbourne in addition to a lack of an unofficial, relaxed gathering to allow some fruitful information and experience exchange. We're trying to put together a laid back meet up for developers (and other interested people) who are currently using Apache Solr (and/or Lucene) or would like to learn more about it. Aiming for it to be a high signal/noise ratio group, with meet ups probably once every two months. This sounds great! I'm not sure I'll be a regular, but if I'm around town when it is on I will try to drop in. The first meet up is still TBD, but please join the group if you're keen to join us for pizza, beer, and a discussion about Solr once we figure out the date of the first meeting. Once a date is decided please update the Melbourne *UG wiki page so others can find out about it. The wiki has meeting times for various user groups around town, which might help you find a time which doesn't clash with other groups. Check it out at http://perl.net.au/wiki/Melbourne Cheers Dave
Re: Solr not returning results for some key words
Ok, apparently I'm not the first to have fallen prey to the maxFieldLength gotcha: http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html All fixed now. -Matt
Re: Question on the appropriate software
Excellent, thanks for the confirmation Erik. I've started working with Solr (just getting my feet wet at this point). -Matt
Re: Announcement/Invitation: Melbourne Solr/Lucene Users Group
Sounds great :) I'll sign up as well. Look forward to a meeting! Mark -- E: mark.man...@gmail.com T: http://www.twitter.com/neurotic W: www.compoundtheory.com
Re: Announcement/Invitation: Melbourne Solr/Lucene Users Group
Hi, I'm interested in attending, but I'm not in Aus. :-( Regards