Advanced search with results matrix
Hi,

First off, we're happy users of Apache Solr v3.1 Enterprise search server, integrated and running successfully on our live production server. We're now enhancing the existing search feature in our web application as explained below; it genuinely helps application users make an informed decision before viewing their search results.

There will be 3 textboxes, and users can enter keyword phrases with OR/AND combinations within each textbox, for example:

Textbox 1: SQL Server OR SQL
Textbox 2: Visual Basic OR VB.NET
Textbox 3: Java AND JavaScript

When the user clicks Search, we want to present an intermediate "results matrix" page that lists every possible combination of the 3 textboxes along with how many records were found for each combination (between combinations it is an AND operation). This, as I said before, truly helps application users make an informed decision/choice before viewing their search results:

Matches | Textbox 1         | Textbox 2              | Textbox 3
--------+-------------------+------------------------+--------------------
200     | SQL Server OR SQL |                        |
300     |                   | Visual Basic OR VB.NET |
400     |                   |                        | Java AND JavaScript
250     | SQL Server OR SQL | Visual Basic OR VB.NET |
350     |                   | Visual Basic OR VB.NET | Java AND JavaScript
300     | SQL Server OR SQL |                        | Java AND JavaScript
100     | SQL Server OR SQL | Visual Basic OR VB.NET | Java AND JavaScript

Only on clicking one of these Matches counts will the actual results of that particular search be displayed.

My questions are:

1) Do I need to run a separate search for each combination, or is it possible to obtain the whole results matrix page with a single call to Apache Solr? Or are there any plug-ins available that provide functionality close to my use case?
2) How do I instruct Solr to return only the count (not the results) for the search performed?
3) Any ideas/suggestions/approaches/resources are appreciated and welcome.

Regards,
Gnanam
Re: Advanced search with results matrix
Hey Gnanam,

1. If I understand correctly you just need to perform one query, like so (translated to proper syntax, of course):

(SQL Server OR SQL) OR (Visual Basic OR VB.NET) OR (Java AND JavaScript)

2. Every query you perform with Solr returns the results count. If you ONLY want the count, simply set rows to 0 (but I'm guessing you will want both the results and the count, to avoid two trips). The results count is the numFound attribute here:

<result name="response" numFound="0" start="0"/>

David

On 4/05/2012 4:46 PM, Gnanakumar wrote:
> [snip]
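David's rows=0 point can be sketched as a tiny client-side helper. This is illustrative only — the function name is mine, and real code would send the encoded parameters to your Solr endpoint with an HTTP client:

```python
from urllib.parse import urlencode

def count_only_params(query):
    """Build Solr query parameters for a count-only request.

    rows=0 asks Solr to return no documents; the response's numFound
    attribute still carries the total match count.
    """
    return urlencode({"q": query, "rows": 0})

params = count_only_params("(SQL Server OR SQL)")
print(params)  # q=%28SQL+Server+OR+SQL%29&rows=0
```

Appending this string to `/solr/select?` gives a request whose response is small no matter how many documents match.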
Re: SOLR 3.5 Index Optimization not producing single .cfs file
Thanks Mike,

> If you really must have a CFS (how come?) then you can call
> TieredMergePolicy.setNoCFSRatio(1.0) -- not sure how/where this is
> exposed in Solr though.

BTW, would this impact search performance? I was just trying a few random keyword searches (without sort and filters) on both systems (1.4.1 vs 3.5) and found that 3.5 searches take longer than 1.4.1 (around 10-20% slower). I haven't done any load test yet.

Regards
Pravesh

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-3-5-Index-Optimization-not-producing-single-cfs-file-tp3958619p3961441.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Parent-Child relationship
Hi,

As per my understanding the join is confined to a single core only, and it is not possible to have joins between docs of different cores. Am I correct here? If yes, is there a possibility of having joins across cores anytime soon?

--
View this message in context: http://lucene.472066.n3.nabble.com/Parent-Child-relationship-tp3958259p3961509.html
search case: Elision and truncate in french
Hi all,

I have a little problem and can't find an easy configuration solution, but maybe my Google search is wrong :)

- ElisionFilterFactory is enabled for both the search and index analyzers.
- The index contains "l'aventure"; when I search for "l'avent*", Solr finds nothing.

I do have a solution, but it doesn't look sexy: a second index with a PatternReplaceCharFilterFactory which removes all ' characters from strings.

Some tips would be useful.

Thanks,
Claire
RE: Advanced search with results matrix
> 1. If I understand correctly you just need to perform one query. Like so
> (translated to proper syntax of course):
> (SQL Server OR SQL) OR (Visual Basic OR VB.NET) OR (Java AND JavaScript)

No, it's not just one single query; rather, as I mentioned before, it's a combination of searches with a result count for each combination. Explained in detail below:

1) (SQL Server OR SQL)
2) (Visual Basic OR VB.NET)
3) (Java AND JavaScript)
4) (SQL Server OR SQL) AND (Visual Basic OR VB.NET)
5) (Visual Basic OR VB.NET) AND (Java AND JavaScript)
6) (SQL Server OR SQL) AND (Java AND JavaScript)
7) (SQL Server OR SQL) AND (Visual Basic OR VB.NET) AND (Java AND JavaScript)

Hope I made it clear.
Re: Advanced search with results matrix
Hi,

Have you considered combining your subqueries into a disjunction (BooleanClause.Occur.SHOULD) and requesting each combination via arbitrary query faceting? See http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting

On Fri, May 4, 2012 at 1:32 PM, Gnanakumar gna...@zoniac.com wrote:
> [snip]

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
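The facet.query approach above can deliver the whole matrix in one request: send a match-all query with rows=0 and one facet.query per combination; each facet count is the numFound for that cell. A sketch (the helper name and the use of a plain match-all `q` are my own choices, not from the thread):

```python
from itertools import combinations
from urllib.parse import urlencode

def matrix_params(subqueries):
    """Build one count-only Solr request whose facet.query parameters
    cover every non-empty AND-combination of the given subqueries."""
    facet_queries = []
    for r in range(1, len(subqueries) + 1):
        for combo in combinations(subqueries, r):
            # Between combinations it is an AND operation, per the question.
            facet_queries.append(" AND ".join("(%s)" % q for q in combo))
    return [("q", "*:*"), ("rows", "0"), ("facet", "true")] + [
        ("facet.query", fq) for fq in facet_queries]

boxes = ["SQL Server OR SQL", "Visual Basic OR VB.NET", "Java AND JavaScript"]
params = matrix_params(boxes)
query_string = urlencode(params)  # 3 textboxes -> 7 facet.query entries
```

For 3 textboxes this yields exactly the 7 rows of the matrix; the counts come back under facet_counts/facet_queries in a single Solr round trip.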
problem with date searching.
Hi,

I'm having a slight problem with date searching. If I give the same date for both ends of the range in the search query, it seems to work fine, but when I try a different date range I get no results. Example:

select/?defType=dismax&q=[2012-02-02T01:30:52Z TO 2012-02-02T01:30:52Z]&qf=scanneddate

For this I get results:

<result name="response" numFound="20" start="0">

If I try a different date range, [2012-02-02T01:30:52Z TO 2011-09-22T22:40:30Z], there are no records at all. Please help me with this.

--
View this message in context: http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761.html
Re: problem with date searching.
Thanks for the quick response. I tried your advice, [2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z], but even so I am not getting any results.

--
View this message in context: http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761p3961833.html
Re: problem with date searching.
Unless something else is wrong, my question would be: do you have documents in Solr stamped with these dates? You could also try, as a test, specifying the field name directly:

q=scanneddate:[2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z]

Also, in your first e-mail you said you used [2012-02-02T01:30:52Z TO 2012-02-02T01:30:52Z]; what scanneddate values did you get then?

On Fri, May 4, 2012 at 1:37 PM, ayyappan ayyaba...@gmail.com wrote:
> [snip]

--
Regards,
Dmitry Kan
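One thing the first message in this thread shows is an inverted range ([2012-02-02... TO 2011-09-22...]), which matches nothing. A small client-side guard can normalize the bounds before building the query; this is an illustrative sketch (the function name is mine), not a fix for whatever else may be wrong with the index:

```python
from datetime import datetime

def range_query(field, start, end):
    """Build a Solr date-range query, swapping the bounds if the
    caller passed them in the wrong order (an inverted range such as
    [2012-... TO 2011-...] matches no documents)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    if datetime.strptime(start, fmt) > datetime.strptime(end, fmt):
        start, end = end, start
    return "%s:[%s TO %s]" % (field, start, end)

print(range_query("scanneddate", "2012-02-02T01:30:52Z", "2011-09-22T22:40:30Z"))
# scanneddate:[2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z]
```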
Re: Faceting on a date field multiple times
http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html

--
View this message in context: http://lucene.472066.n3.nabble.com/Faceting-on-a-date-field-multiple-times-tp3961282p3961865.html
Word recognised in a search
Hi,

I'm running some searches using Apache Solr 1.4, but I will upgrade to 3.6. When Solr uses stemming, it is very difficult to know which words were actually matched (for example, if I search for "ups", Solr finds "up" too). I need to know this because I have to highlight the found words in the text, and I need to extract some strings from the source around those words. I hope I managed to explain my problem well :-) Could you help me, please?

Thank you very much! Bye.
Re: Faceting on a date field multiple times
Thanks Marc.

On May 4, 2012, at 8:52 PM, Marc Sturlese wrote:
> http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html
Re: Parent-Child relationship
See: https://issues.apache.org/jira/browse/LUCENE-3759

No time-frame mentioned though.

Best
Erick

On Fri, May 4, 2012 at 4:20 AM, tamanjit.bin...@yahoo.co.in tamanjit.bin...@yahoo.co.in wrote:
> [snip]
Re: Word recognised in a search
Have you tried the highlighting component? hl=true&hl.fl=orig_text_field

- Dmitry

On Fri, May 4, 2012 at 1:52 PM, mattia.martine...@gmail.com mattia.martine...@gmail.com wrote:
> [snip]
Re: get latest 50 documents the fastest way
You can do this with Solr 4.0 with RankingAlgorithm 1.4.2. Please pass the parameters age=latest&docs=50 to your search, for example:

http://localhost:8983/solr/select/?q=*:*&age=latest&docs=50

This inspects the latest 50 documents in real time and returns results accordingly. Using *:* will not affect performance, and you will not need any additional ranking or sort, etc.

Regards,
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 5/1/2012 7:38 AM, Yuval Dotan wrote:
> Hi Guys
> We have a use case where we need to get the 50 *latest* documents that
> match my query - without additional ranking, sorting, etc. on the results.
> My index contains 1,000,000,000 documents and I noticed that if the number
> of found documents is very big (larger than 50% of the index size -
> 500,000,000 docs) then it takes more than 5 seconds to get the results,
> even with the rows=50 parameter. Is there a way to get the results faster?
> Thanks
> Yuval
Re: search case: Elision and truncate in french
Unfortunately, use of a wildcard causes the normal token analysis processing to be completely bypassed, including the elision filter. So, when using a wildcard you have to simulate all of the analysis features in your head, such as manually performing the elision.

-- Jack Krupansky

-----Original Message-----
From: Claire Hernandez
Sent: Friday, May 04, 2012 5:08 AM
To: solr-user@lucene.apache.org
Cc: Jonathan Druart
Subject: search case: Elision and truncate in french

> [snip]
Re: search case: Elision and truncate in french
Jack - that was true until Solr 3.6+: http://wiki.apache.org/solr/MultitermQueryAnalysis

So, Claire, it's possible with the latest Solr release to do this using bits and pieces of your existing analysis chain. As Jack said, though, this is a manual chore in pre-Solr-3.6 releases.

Erik

On May 4, 2012, at 08:54, Jack Krupansky wrote:
> [snip]
Why would solr norms come up different from Lucene norms?
So, I've got some code that stores the same documents in a Lucene 3.5.0 index and a Solr 3.5.0 instance. It's only five documents. For a particular field, the Solr norm is always 0.625, while the Lucene norm is 0.5. I've watched the code in NormsWriterPerField in both cases: in Solr we get 0.577, in naked Lucene it's 0.5. I tried to check for boosts, and I don't see any non-1.0 document or field boosts. The Solr field is:

<field name="bt_rni_NameHRK_encodedName" type="text_ws" indexed="true" stored="true" multiValued="false" />
Single Index to Shards
If I have a single Solr index running on a Core, can I split it or migrate it into 2 shards?

--
View this message in context: http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.html
Re: search case: Elision and truncate in french
Okay, the issue is that only *some* of the filters are multi-term aware, and the elision filter is one that is NOT multi-term aware.

-- Jack Krupansky

-----Original Message-----
From: Jack Krupansky
Sent: Friday, May 04, 2012 9:42 AM
To: solr-user@lucene.apache.org
Subject: Re: search case: Elision and truncate in french

Well, if it was fixed, then it is now broken again - in the 3.6 release! Here's a snippet from debugQuery showing that the generated query has the elision intact in the analyzed term:

<str name="rawquerystring">text_fr:l'avion*</str>
<str name="querystring">text_fr:l'avion*</str>
<str name="parsedquery">+text_fr:l'avion*</str>
<str name="parsedquery_toString">+text_fr:l'avion*</str>

And for the same term without wildcard:

<str name="rawquerystring">text_fr:l'avion</str>
<str name="querystring">text_fr:l'avion</str>
<str name="parsedquery">+text_fr:avion</str>
<str name="parsedquery_toString">+text_fr:avion</str>

-- Jack Krupansky

-----Original Message-----
From: Erik Hatcher
Sent: Friday, May 04, 2012 9:06 AM
To: solr-user@lucene.apache.org
Subject: Re: search case: Elision and truncate in french

> [snip]
RE: Single Index to Shards
Yes, you can split your index into multiple shards. More info on shards can be found here:

http://lucidworks.lucidimagination.com/display/solr/Distributed+Search+with+Index+Sharding

Thanks.

Regards,
Nitin Keswani

-----Original Message-----
From: michaelsever [mailto:sever_mich...@bah.com]
Sent: Friday, May 04, 2012 9:44 AM
To: solr-user@lucene.apache.org
Subject: Single Index to Shards

> [snip]
Documents With large number of fields
Hi,

My data model consists of different types of data, and each data type has its own characteristics. If I include the unique characteristics of each type of data, a single Solr document could end up containing 300-400 fields. In order to drill down into this data set I would have to provide faceting on most of these fields, so that users can narrow down to a very small set of documents.

Here are some of the questions:

1) What's the best approach when dealing with documents with a large number of fields? Should I keep a single document with a large number of fields, or split my document into a number of smaller documents, each consisting of some of the fields?

2) From an operational point of view, what's the drawback of having a single document with a very large number of fields? Can Solr support documents with a large number of fields (say 300 to 400)?

Thanks.

Regards,
Nitin Keswani
Re: problem with date searching.
Right, you need to do the explicit qualification of the date field. dismax parsing is intended to work with text-type fields, not numeric or date fields. If you attach debugQuery=on, you'll see that your scanneddate field is just dropped. Furthermore, dismax was never intended to work with range queries. Note this from the DisMaxQParserPlugin page: an "extremely simplified subset of the Lucene QueryParser syntax". I'll expand on this a bit on the Wiki page.

Best
Erick

On Fri, May 4, 2012 at 6:45 AM, Dmitry Kan dmitry@gmail.com wrote:
> [snip]
Re: Single Index to Shards
There's no way to split an _existing_ index into multiple shards, although some of the work on SolrCloud is considering being able to do this. You have a couple of choices here:

1) Just reindex everything from scratch into two shards.
2) Delete all the docs from your index that will go into shard 2, and just index the docs for shard 2 in your new shard.

But I want to be sure you're on the right track here. You only need to shard if your index contains too many documents for your hardware to produce decent query rates. If you are getting (and I'm picking this number out of thin air) 50 QPS on your hardware (i.e. you're not stressing memory etc.) and just want to get to 150 QPS, use replication rather than sharding. See: http://wiki.apache.org/solr/SolrReplication

Best
Erick

On Fri, May 4, 2012 at 9:44 AM, michaelsever sever_mich...@bah.com wrote:
> [snip]
query keyword-tokenized fields with solrj
Hi :)

In schema.xml I added a custom fieldType called "keyword":

<fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

and a field called "article":

<field name="article" type="keyword" indexed="true" stored="true"/>

Now I would like to query this field using SolrJ. I'm using the following code:

SolrQuery query = new SolrQuery("article:L. 111-5-2");
QueryResponse rsp = server.query(query);
list = rsp.getResults();

Even though there is only one entry in my index with the value "L. 111-5-2" in the field "article", I get a lot of results, because the article value is not kept as a single token. I could change my string to "article:\"L. 111-5-2\"", but I was wondering if there is any prettier way to do that (programmatically with the SolrJ API)?

Gary
Re: Documents With large number of fields
I'm also interested in this. Same situation.

On Fri, 2012-05-04 at 10:27 -0400, Keswani, Nitin - BLS CTR wrote:
> [snip]
Re: how to present html content in browse
Hello,

I'm having a hard time understanding this, and I had this same question. When using DIH, should the HTML field be stored in the raw HTML string field or in the stripped field? Also, which source field(s) need to be copied, and to what destination?

Thanks

On Thu, May 3, 2012 at 10:15 PM, Lance Norskog goks...@gmail.com wrote:
> Make two fields: one which stores the stripped HTML and another that
> stores the parsed HTML. You can use copyField so that you do not have
> to submit the html page twice. You would mark the stripped field
> 'indexed=true stored=false' and the full text field the other way
> around. The full text field should be a String type.
>
> On Thu, May 3, 2012 at 1:04 PM, srini softtec...@gmail.com wrote:
>> I am indexing records from a database using DIH. The content of my
>> record is in html format. When I use /browse I would like to show the
>> content in html format, not in text format. Any ideas?
>>
>> --
>> View this message in context: http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html
>
> --
> Lance Norskog
> goks...@gmail.com
Re: 1MB file to Zookeeper
On Fri, May 4, 2012 at 12:50 PM, Mark Miller markrmil...@gmail.com wrote:
> And how should we detect if data is compressed when reading from
> ZooKeeper? I was thinking we could somehow use file extensions? eg
> synonyms.txt.gzip - then you can use different compression algs
> depending on the ext, etc. We would want to try and make it as
> transparent as possible though...

At first I thought about adding a marker to the beginning of a file, but file extensions could work too, as long as the resource loader made it transparent (i.e. code would just need to ask for synonyms.txt, but the resource loader would search for synonyms.txt.gzip, etc., if the original name was not found).

Hmmm, but this breaks down for things like watches - I guess that's where putting the encoding inside the file would be a better option.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
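The transparent lookup Yonik describes can be sketched in a few lines. This is purely illustrative — `znodes` is a plain dict standing in for the ZooKeeper tree, and real code would issue ZooKeeper reads and handle watches, which is exactly where the scheme gets awkward:

```python
import gzip

def load_resource(znodes, name):
    """Resolve a resource by its plain name first; if absent, fall back
    to a '.gzip' variant and decompress it, so callers can always ask
    for e.g. 'synonyms.txt' regardless of how it was stored."""
    if name in znodes:
        return znodes[name]
    compressed = name + ".gzip"
    if compressed in znodes:
        return gzip.decompress(znodes[compressed])
    raise FileNotFoundError(name)
```

The caller never sees the extension, which is the transparency goal; the drawback is that a watch set on the plain name misses updates to the compressed variant.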
Re: Faceting on a date field multiple times
Hi Ian,

I believe you may be able to use a bunch of facet.query parameters, something like this:

facet.query=yourfield:[NOW-1DAY TO NOW]
facet.query=yourfield:[NOW-2DAY TO NOW-1DAY]

... and so on.

-sujit

On May 3, 2012, at 10:41 PM, Ian Holsman wrote:
> Hi.
> I would like to be able to do a facet on a date field, but with different
> ranges (in a single query). For example, I would like to show:
> - #documents by day for the last week
> - #documents by week for the last couple of months
> - #documents by year for the last several years
> Is there a way to do this without hitting solr 3 times?
> thanks
> Ian
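Generating the facet.query strings for mixed-granularity buckets is mechanical enough to script. A small sketch (the field name and bucket boundaries are illustrative; NOW-based date math is evaluated by Solr, not the client):

```python
def date_facet_queries(field, buckets):
    """Build facet.query strings for a list of (start, end) pairs
    expressed in Solr date math."""
    return ["%s:[%s TO %s]" % (field, start, end) for start, end in buckets]

queries = date_facet_queries("yourfield", [
    ("NOW-1DAY", "NOW"),         # last day
    ("NOW-7DAY", "NOW-1DAY"),    # rest of the last week
    ("NOW-1MONTH", "NOW-7DAY"),  # rest of the last month
])
```

Each string goes into its own facet.query parameter on a single request, so day, week, and month buckets come back in one round trip.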
Re: query keyword-tokenized fields with solrj
You have an embedded space in your keyword value, which must be escaped somehow. So, the actual query can be written as:

article:"L. 111-5-2"
or
article:L.\ 111-5-2

The latter is slightly prettier, I suppose. I suppose you could use a wildcard:

article:L.*111-5-2
article:L.?111-5-2

If you want to make it uglier, that would be easy:

article:L.\u0020111-5-2

-- Jack Krupansky

-----Original Message-----
From: G.Long
Sent: Friday, May 04, 2012 11:48 AM
To: solr-user@lucene.apache.org
Subject: query keyword-tokenized fields with solrj

> [snip]
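The escaping can be done programmatically rather than by hand. A sketch in Python (the character set below is my own approximation, similar to what SolrJ's ClientUtils.escapeQueryChars handles on the Java side):

```python
# Query-parser metacharacters plus the space; each gets a backslash prefix.
SPECIAL = set('\\+-!():^[]"{}~*?|& ')

def escape_term(value):
    """Backslash-escape metacharacters so a literal value survives as a
    single token against a KeywordTokenizer field."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)

print("article:" + escape_term("L. 111-5-2"))  # article:L.\ 111\-5\-2
```

On the SolrJ side, ClientUtils.escapeQueryChars does the equivalent job, which avoids maintaining your own character list.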
Template in a database field does not work. Please Help
I specified a template in a field:

<field column="incident_id" name="object_id" template="inc-${incident.incident_id}" />

When doing a full import, for each row retrieved from Oracle there is this output in the console:

May 03, 2012 3:47:08 PM org.apache.solr.handler.dataimport.TemplateTransformer transformRow
WARNING: Unable to resolve variable: incident.incident_id while parsing expression: inc-${incident.incident_id}

Below is the data-config.xml file where the template is defined:

<dataConfig>
  <dataSource name="jdbc" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//dbtest:1521/ORCL" user="user" password="xxx"/>
  <document>
    <entity name="incident" transformer="TemplateTransformer"
            query="select incident_id, ('inc-' || incident_id) unique_id, long_desc from incident"
            deltaQuery="select incident_id from incident where last_update &gt; TO_DATE('${dataimporter.last_index_time}','YYYY-MM-DD HH24:MI:SS')">
      <field column="incident_id" name="incident_id"/>
      <field column="incident_id" name="object_id" template="inc-${incident.incident_id}" />
      <field column="unique_id" name="unique_id" />
      <field column="long_desc" name="long_desc" />
    </entity>
  </document>
</dataConfig>

I have tried changing the template to template="inc-${incident_id}" - still no luck, similar error. I don't know what the TemplateTransformer is looking for to match the variable.

Thanks,
RTI QA
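One thing that may be worth checking with Oracle is column-name case: the JDBC driver typically reports columns in upper case (INCIDENT_ID), which can prevent a lower-case ${incident.incident_id} reference from resolving. A toy model of the substitution — this is illustrative only, not DIH's actual implementation:

```python
import re

def resolve_template(template, entity, row):
    """Replace ${entity.column} placeholders from the current row,
    raising (like the WARNING above) when the key doesn't match."""
    def repl(match):
        ent, col = match.group(1), match.group(2)
        if ent != entity or col not in row:
            raise KeyError("Unable to resolve variable: %s.%s" % (ent, col))
        return str(row[col])
    return re.sub(r"\$\{(\w+)\.(\w+)\}", repl, template)

# A row keyed INCIDENT_ID satisfies ${incident.INCIDENT_ID} but not
# ${incident.incident_id} under a case-sensitive lookup.
print(resolve_template("inc-${incident.INCIDENT_ID}", "incident", {"INCIDENT_ID": 42}))  # inc-42
```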
elevate vs. select numFound results
I need help understanding the difference in the numFound number when I execute two queries against my Solr instance, one with elevation and one without. I have a simple elevate.xml file created and working, and I am searching for terms that are not meant to be elevated.

Elevate query:
example.com:8080/solr/elevate?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1
For this, numFound is 125 in the result element of the XML.

Select query:
example.com:8080/solr/select?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1
For this, numFound is 154 in the result element of the XML.

For many (almost all) of my queries the numFound results are the same (both for elevated query strings and for strings not in elevate.xml), but this one is very different. Should they be the same? Any idea what could make them different? Thank you.

--
View this message in context: http://lucene.472066.n3.nabble.com/elevate-vs-select-numFound-results-tp3963200.html
Re: Facet and totaltermfreq
I have tried (as a test) combining facets and term vectors (http://wiki.apache.org/solr/TermVectorComponent) in one query and was able to get a list of facets, and for each facet there was a term freq under the termVectors section. Not sure if that's what you are trying to achieve.

-Dmitry

On Fri, May 4, 2012 at 8:37 PM, Jamie Johnson jej2...@gmail.com wrote:
> Is it possible when faceting to return not only the strings but also the
> total term frequency for those facets? I am trying to avoid building a
> customized faceting component and making multiple queries. In our scenario
> we have multivalued fields which may have duplicates, and I would like to
> be able to get a count of how many documents a term appears in (currently
> what faceting does) but also how many times that term appears in general.

--
Regards,
Dmitry Kan
Re: Invalid version expected 2, but 60 on CentOS
On May 4, 2012, at 4:09 PM, Ravi Solr wrote: Thanking you in anticipation, Generally this happens because the webapp server is returning an HTML error response of some kind. Often it's a 404. I think in trunk this might have been addressed - that is, it's easier to see the true error. Not positive, though. Some non-success HTML response is likely coming back, though. - Mark Miller lucidimagination.com
Re: how to present html content in browse
Evidently there was a problem with highlighting of HTML that is supposedly fixed in Solr 3.6 and trunk: https://issues.apache.org/jira/browse/SOLR-42 -- Jack Krupansky -Original Message- From: okayndc Sent: Friday, May 04, 2012 4:35 PM To: solr-user@lucene.apache.org Subject: Re: how to present html content in browse Is it possible to return the HTML field highlighted? On Fri, May 4, 2012 at 1:27 PM, Jack Krupansky j...@basetechnology.com wrote:

1. The raw html field (call it, text_html) would be a string type field that is stored but not indexed. This is the field you direct DIH to output to. This is the field you would return in your search results with the HTML to be displayed.

2. The stripped field (call it, text_stripped) would be a text type field (where text is a field type you add that uses the HTML strip char filter as shown below) that is not stored but is indexed. Add a copyField to your schema that copies from the raw html field to the stripped field (say, text_html to text_stripped).

For reference on HTML strip (HTMLStripCharFilterFactory), see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Which has:

  <fieldtype name="text" class="solr.TextField">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldtype>

Although, you might want to call that field type text_stripped to avoid confusion with a simple text field. You can add HTMLStripCharFilterFactory to some other field type that you might want to use, but this charFilter needs to be before the tokenizer. The text field type above is just an example.
-- Jack Krupansky -Original Message- From: okayndc Sent: Friday, May 04, 2012 1:01 PM To: solr-user@lucene.apache.org Subject: Re: how to present html content in browse Hello, I'm having a hard time understanding this, and I had this same question. When using DIH, should the HTML field be stored in the raw HTML string field or the stripped field? Also, what source field(s) need to be copied, and to what destination? Thanks On Thu, May 3, 2012 at 10:15 PM, Lance Norskog goks...@gmail.com wrote: Make two fields, one which stores the stripped HTML and another that stores the parsed HTML. You can use copyField so that you do not have to submit the html page twice. You would mark the stripped field 'indexed=true stored=false' and the full text field the other way around. The full text field should be a String type. On Thu, May 3, 2012 at 1:04 PM, srini softtec...@gmail.com wrote: I am indexing records from database using DIH. The content of my record is in html format. When I use browse I would like to show the content in html format, not in text format. Any ideas? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: elevate vs. select numFound results
Some ways that fewer docs might be returned by query elevation:
1. The exclude option: exclude="true" in the xml file.
2. The exclusive request parameter: exclusive=true in the URL. (Certainly not your case.)
3. The exclusive request parameter default set to true in the defaults for the /elevate request handler in solrconfig.
4. Some other query-related parameters (e.g., qf) are different between your /select and /elevate request handlers.

Try adding enableElevation=false to your URL for /elevate, which should show you whether query elevation itself is affecting the number of docs, or if it must be some other parameters that are different between the two request handlers.

-- Jack Krupansky -Original Message- From: roxy.noord...@wwecorp.com Sent: Friday, May 04, 2012 3:21 PM To: solr-user@lucene.apache.org Subject: elevate vs. select numFound results I need help understanding the difference in the numFound number in the result when I execute two queries against my Solr instance, one with elevation and one without. I have a simple elevate.xml file created and working, and I am searching for terms that are not meant to be elevated. Elevate query: example.com:8080/solr/elevate?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1 - for this the numFound is 125 in the result element of the XML. Select query: example.com:8080/solr/select?q=dwayne+rock+johnson&wt=xml&sort=score+desc&rows=1 - for this the numFound is 154 in the result element of the XML. For many (almost all) of my queries the numFound results are the same, but this one is very different. Should they be the same? Any idea what could make them different? Thank you.
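Jack's diagnostic above - re-run the /elevate handler with elevation switched off and compare numFound - can be sketched as a URL builder. The host and parameter values below are taken from the thread's examples; the helper name is hypothetical.

```python
from urllib.parse import urlencode

def elevate_debug_url(host, query):
    """Build the diagnostic URL: same /elevate handler, but with
    enableElevation=false, so any remaining numFound difference must
    come from other handler defaults, not elevation itself."""
    params = {
        "q": query,
        "wt": "xml",
        "sort": "score desc",
        "rows": 1,
        "enableElevation": "false",
    }
    return "http://%s/solr/elevate?%s" % (host, urlencode(params))

print(elevate_debug_url("example.com:8080", "dwayne rock johnson"))
```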
Re: Single Index to Shards
If you are not using SolrCloud, splitting an index is simple:
1) Copy the index.
2) Remove what you do not want via delete-by-query.
3) Optimize!

#2 brings up a basic design question: you have to decide which documents go to which shards. Mostly people use a value generated by a hash on the actual id - this allows you to assign docs evenly. http://wiki.apache.org/solr/UniqueKey

On Fri, May 4, 2012 at 4:28 PM, Young, Cody cody.yo...@move.com wrote: You can also make a copy of your existing index, bring it up as a second instance/core and then send delete queries to both indexes. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, May 04, 2012 8:37 AM To: solr-user@lucene.apache.org Subject: Re: Single Index to Shards There's no way to split an _existing_ index into multiple shards, although some of the work on SolrCloud is considering being able to do this. You have a couple of choices here:
1) Just reindex everything from scratch into two shards.
2) Delete all the docs from your index that will go into shard 2, then index the docs for shard 2 in your new shard.

But I want to be sure you're on the right track here. You only need to shard if your index contains too many documents for your hardware to produce decent query rates. If you are getting (and I'm picking this number out of thin air) 50 QPS on your hardware (i.e. you're not stressing memory etc.) and just want to get to 150 QPS, use replication rather than sharding. See: http://wiki.apache.org/solr/SolrReplication Best Erick On Fri, May 4, 2012 at 9:44 AM, michaelsever sever_mich...@bah.com wrote: If I have a single Solr index running on a Core, can I split it or migrate it into 2 shards? -- View this message in context: http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
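The hash-on-id assignment mentioned above can be sketched as follows. The hash-mod-N scheme is an illustrative assumption (it is not SolrCloud's actual routing), and the comment notes the practical wrinkle with delete-by-query.

```python
import hashlib

NUM_SHARDS = 2

def shard_for(doc_id):
    """Assign a document to a shard by hashing its unique key.
    Illustrative hash-mod-N scheme, not Solr's built-in routing."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# After copying the full index to each shard, delete the docs that do
# NOT belong there, then optimize. A delete-by-query cannot express
# "hash mod N" directly, so in practice you would store the computed
# shard number in a field at index time and delete on that field.
docs = ["doc-1", "doc-2", "doc-3", "doc-4"]
print({d: shard_for(d) for d in docs})
```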
Minor typo in example solrconfig: process of provided docuemnts
I noticed this minor typo in the example solrconfig.xml for both 3.6 and trunk (as of 5/1): An analysis handler that provides a breakdown of the analysis process of provided docuemnts. This handler expects a (single) “docuemnts” should be “documents”. -- Jack Krupansky
Re: SOLRJ: Is there a way to obtain a quick count of total results for a query
Not scoring by relevance (sorting by document id instead) may speed it up a little. I haven't done any tests of this; maybe you can give it a try. Scoring will consume some CPU time, and you just want to match and get the total count. On Wed, May 2, 2012 at 11:58 PM, vybe3142 vybe3...@gmail.com wrote: I can achieve this by building a query with start and rows = 0, and using queryResponse.getResults().getNumFound(). Are there any more efficient approaches to this? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-Is-there-a-way-to-obtain-a-quick-count-of-total-results-for-a-query-tp3955322.html Sent from the Solr - User mailing list archive at Nabble.com.
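The count-only approach from this thread - rows=0 plus reading numFound - looks like this outside SolrJ. The helper names are hypothetical, and the sample response is a canned string in the shape Solr's JSON writer returns (values made up).

```python
import json

def count_only_params(query):
    """Parameters for a count-only query: rows=0 skips document
    retrieval, but the response still carries numFound."""
    return {"q": query, "rows": 0, "wt": "json"}

def num_found(response_json):
    """Extract the total hit count from a Solr JSON response body."""
    return json.loads(response_json)["response"]["numFound"]

# Canned response of the shape Solr returns (numbers are made up):
sample = '{"response": {"numFound": 1234, "start": 0, "docs": []}}'
print(count_only_params("solr"), num_found(sample))
```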
RE: elevate vs. select numFound results
I modified my solrconfig.xml to:

  <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <bool name="omitHeader">true</bool>
      <float name="tie">0.01</float>
      <str name="pf">content^2.0</str>
      <int name="ps">15</int>
      <!-- Abort any searches longer than 4 seconds -->
      <!-- <int name="timeAllowed">4000</int> -->
      <str name="mm">1</str>
      <str name="q.alt">*:*</str>
    </lst>
    <arr name="last-components">
      <str>elevator</str>
    </arr>
  </requestHandler>

Then added the enableElevation=true parameter to my elevate URL:

http://mydomain:8181/solr/elevate?q=dwayne+rock+johnson&wt=xml&sort=score+desc&fl=id,bundle_name&exclusive=true&debugQuery=on&enableElevation=true

This made my /elevate parsed query match my /select query, and I got back the same numFound. My parsedquery:

  <str name="parsedquery">+((DisjunctionMaxQuery((content:dwayn)~0.01) DisjunctionMaxQuery((content:rock)~0.01) DisjunctionMaxQuery((content:johnson)~0.01))~1) DisjunctionMaxQuery((content:"dwayn rock johnson"~15^2.0)~0.01)</str>

But it would be nice to make exclusive=true work, and get an empty result set back when there is no matching elevation query. Are there any solrconfig settings to do so?

-Original Message- From: Noordeen, Roxy [mailto:roxy.noord...@wwecorp.com] Sent: Friday, May 04, 2012 8:11 PM To: solr-user@lucene.apache.org Subject: RE: elevate vs. select numFound results My actual problem is with elevate not working with exclusive=true. I have a special pinned widget that has to display only the nodes defined in my elevate.xml, kind of like sponsored results. If I define "game" in my elevate.xml and send exclusive=true, I get only the elevated entries:

http://mydomain:8181/solr/elevate?q=game&wt=xml&sort=score+desc&fl=id,bundle_name&exclusive=true

But when I pass a word not defined in my elevate.xml and send exclusive=true, I get almost the same results as from the /select query:
http://mydomain:8181/solr/elevate?q=gamenotdefined&wt=xml&sort=score+desc&fl=id,bundle_name&exclusive=true

So I ended up using both /elevate and /select: if the numbers [numFound] MATCH in both requests, I assume the word does not exist in my elevate.xml, and I have to hide my pinned widget. But in a few cases, my /elevate and /select are not returning the same numFound; there are some differences in the numbers. Is there a way to force exclusive=true to look only at the elevate.xml entries, and ignore the result from the default search?

Answers to your questions:
1. There is no exclude="true" parameter set in my elevate.xml.
2. There is no exclusive=true set in the URL.
3. My elevate entry in solrconfig.xml:

  <searchComponent name="elevator" class="solr.QueryElevationComponent">
    <!-- pick a fieldType to analyze queries -->
    <str name="queryFieldType">string</str>
    <str name="config-file">elevate.xml</str>
    <!-- <str name="refreshOnCommmit">true</str> -->
  </searchComponent>

  <!-- a request handler utilizing the elevator component -->
  <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
    <arr name="last-components">
      <str>elevator</str>
    </arr>
  </requestHandler>

4. I am not sure how to verify the qf difference. I am using the raw schema.xml and solrconfig.xml shipped with the Drupal Solr module. I manage most of the Solr configs via the Drupal module, except at query time I query Solr directly.

-Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Friday, May 04, 2012 5:44 PM To: solr-user@lucene.apache.org Subject: Re: elevate vs. select numFound results
Re: how to present html content in browse
You need positions and offsets to do highlighting. A CharFilter does not preserve positions. I think you have to analyze the raw HTML with a different Analyzer, as well as the stripper. I think this is how it works: use a new Analyzer stack that uses the StandardAnalyzer, the lower-case filter, and stemmer/synonym etc. Now, store the HTML field with that text type. You then search on the stripped field, but highlight from the raw field with 'hl.fl'. Here's the cool part: you do not actually need to index the raw HTML, only store it. If you do not index a field, the Highlighter analyzes the HTML when it needs the positions and offsets. On Fri, May 4, 2012 at 2:25 PM, okayndc bodymo...@gmail.com wrote: Okay, thanks for the info. On Fri, May 4, 2012 at 4:42 PM, Jack Krupansky j...@basetechnology.com wrote: Evidently there was a problem with highlighting of HTML that is supposedly fixed in Solr 3.6 and trunk: https://issues.apache.org/jira/browse/SOLR-42 -- Jack Krupansky -Original Message- From: okayndc Sent: Friday, May 04, 2012 4:35 PM To: solr-user@lucene.apache.org Subject: Re: how to present html content in browse Is it possible to return the HTML field highlighted? On Fri, May 4, 2012 at 1:27 PM, Jack Krupansky j...@basetechnology.com wrote: 1. The raw html field (call it, text_html) would be a string type field that is stored but not indexed. This is the field you direct DIH to output to. This is the field you would return in your search results with the HTML to be displayed. 2. The stripped field (call it, text_stripped) would be a text type field (where text is a field type you add that uses the HTML strip char filter as shown below) that is not stored but is indexed. Add a copyField to your schema that copies from the raw html field to the stripped field (say, text_html to text_stripped).
For reference on HTML strip (HTMLStripCharFilterFactory), see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Which has:

  <fieldtype name="text" class="solr.TextField">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldtype>

Although, you might want to call that field type text_stripped to avoid confusion with a simple text field. You can add HTMLStripCharFilterFactory to some other field type that you might want to use, but this charFilter needs to be before the tokenizer. The text field type above is just an example. -- Jack Krupansky -Original Message- From: okayndc Sent: Friday, May 04, 2012 1:01 PM To: solr-user@lucene.apache.org Subject: Re: how to present html content in browse Hello, I'm having a hard time understanding this, and I had this same question. When using DIH should the HTML field be stored in the raw HTML string field or the stripped field? Also what source field(s) need to be copied and to what destination? Thanks On Thu, May 3, 2012 at 10:15 PM, Lance Norskog goks...@gmail.com wrote: Make two fields, one which stores the stripped HTML and another that stores the parsed HTML. You can use copyField so that you do not have to submit the html page twice. You would mark the stripped field 'indexed=true stored=false' and the full text field the other way around. The full text field should be a String type. On Thu, May 3, 2012 at 1:04 PM, srini softtec...@gmail.com wrote: I am indexing records from database using DIH. The content of my record is in html format.
When I use browse I would like to show the content in html format, not in text format. Any ideas? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: Solr Merge during off peak times
Optimize takes a 'maxSegments' option. This tells it to stop when there are N segments instead of just one. If you use a very high mergeFactor and then call optimize with a sane number like 50, it only merges the little teeny segments. On Thu, May 3, 2012 at 8:28 PM, Shawn Heisey s...@elyograg.org wrote: On 5/2/2012 5:54 AM, Prakashganesh, Prabhu wrote: We have a fairly large scale system - about 200 million docs and fairly high indexing activity - about 300k docs per day with peak ingestion rates of about 20 docs per sec. I want to work out what a good mergeFactor setting would be by testing with different mergeFactor settings. I think the default of 10 might be high, I want to try with 5 and compare. Unless I know when a merge starts and finishes, it would be quite difficult to work out the impact of changing mergeFactor. I want to be able to measure how long merges take, run queries during the merge activity and see what the response times are etc.. With a lot of indexing activity, if you are attempting to avoid large merges, I would think you would want a higher mergeFactor, not a lower one, and do occasional optimizes during non-peak hours. With a small mergeFactor, you will be merging a lot more often, and you are more likely to encounter merges of already-merged segments, which can be very slow. My index is nearing 70 million documents. I've got seven shards - six large indexes with about 11.5 million docs each, and a small index that I try to keep below half a million documents. The small index contains the newest documents, between 3.5 and 7 days worth. With this setup and the way I manage it, large merges pretty much never happen. Once a minute, I do an update cycle. This looks for and applies deletions, reinserts, and new document inserts. New document inserts happen only on the small index, and there are usually a few dozen documents to insert on each update cycle. 
Deletions and reinserts can happen on any of the seven shards, but there are not usually deletions and reinserts on every update cycle, and the number of reinserts is usually very very small. Once an hour, I optimize the small index, which takes about 30 seconds. Once a day, I optimize one of the large indexes during non-peak hours, so every large index gets optimized once every six days. This takes about 15 minutes, during which deletes and reinserts are not applied, but new document inserts continue to happen. My mergeFactor is set to 35. I wanted a large value here, and this particular number has a side effect -- uniformity in segment filenames on the disk during full rebuilds. Lucene uses a base-36 segment numbering scheme. I usually end up with less than 10 segments in the larger indexes, which means they don't do merges. The small index does do merges, but I have never had a problem with those merges going slowly. Because I do occasionally optimize, I am fairly sure that even when I do have merges, they happen with 35 very small segment files, and leave the large initial segment alone. I have not tested this theory, but it seems the most sensible way to do things, and I've found that Lucene/Solr usually does things in a sensible manner. If I am wrong here (using 3.5 and its improved merging), I would appreciate knowing. Thanks, Shawn -- Lance Norskog goks...@gmail.com
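The `maxSegments` optimize option mentioned at the top of this thread is issued as an XML update message. A minimal sketch of building that message (the value 50 follows Lance's example; in a live setup you would POST this string to the core's /update handler):

```python
def optimize_message(max_segments):
    """The XML update message for a partial optimize: merge down to at
    most max_segments segments instead of a single segment."""
    return '<optimize maxSegments="%d"/>' % max_segments

# With a high mergeFactor, optimizing to a sane number like 50 merges
# only the many small segments, as described above.
print(optimize_message(50))
```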
Re: Searching by location – What do I send to Solr?
You could just download postalcodes every day. To be nice, you could pull the HEAD of each file and check if it is new. This is just a set of tables, which you denormalize and add to your other fields. There are other sources of polygonal shape data, but there is no official Solr toolkit for querying inside an irregular polygon. On Thu, May 3, 2012 at 6:19 PM, Erick Erickson erickerick...@gmail.com wrote: The fact that they're Python and Java is largely beside the point, I think. Solr just sees a URL; the fact that your Python app gets in there first and does stuff with the query wouldn't affect Solr at all. Also, I tend to like keeping Solr fairly lean, so any work I can offload to the application I usually do. YMMV Best Erick On Thu, May 3, 2012 at 6:43 PM, Spadez james_will...@hotmail.com wrote: I discounted geonames to start with but it actually looks pretty good. I may be stretching the limit of my question here, but say I did go with geonames, if I go back to my model and add a bit: Search for London -> Convert London to Long/Lat -> Send Query to Solr -> Return Query. Since my main website is coded in Python, but Solr works in Java, if I was to create or use an existing script to allow me to convert London to Long/Lat, would it make more sense for this operation to be done in Python or Java? In Python it would integrate better with my website, but in Java it would integrate better with Solr. Also, would one language be more suitable or faster for this kind of operation? Again, I might be pushing the boundaries of what I can ask on here, but if anyone can chime in with their opinion I would really appreciate it. ~ James -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-by-location-What-do-I-send-to-Solr-tp3959296p3960666.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: Phrase Slop problem
Maybe it could throw an exception because the user is clearly trying to do something impossible. On Wed, May 2, 2012 at 3:19 PM, Jack Krupansky j...@basetechnology.com wrote: You are missing the pf, pf2, and pf3 request parameters, which say which fields to do phrase proximity boosting on. pf boosts using the whole query as a phrase, pf2 boosts bigrams, and pf3 boosts trigrams. You can use any combination of them, but if you use none of them, ps appears to be ignored. Maybe it should default to doing some boost if none of the field lists is given, like boosting using bigrams in the qf fields, but it doesn't. -- Jack Krupansky -Original Message- From: André Maldonado Sent: Wednesday, May 02, 2012 3:29 PM To: solr-user@lucene.apache.org Subject: Phrase Slop problem

Hi all. In my index I have a multivalued field that contains a lot of information; all text searches are based on it. So, when I do:

http://xxx.xx.xxx.xxx:/Index/select/?start=0&rows=12&q=term1+term2+term3&qf=textoboost&fq=field1%3aanother_term&defType=edismax&mm=100%25

I get the same result as in:

http://xxx.xx.xxx.xxx:/Index/select/?start=0&rows=12&q=term1+term2+term3&ps=0&qf=textoboost&fq=field1%3aanother_term&defType=edismax&mm=100%25

And the same result in:

http://xxx.xx.xxx.xxx:/Index/select/?start=0&rows=12&q=term1+term2+term3&ps=10&qf=textoboost&fq=field1%3aanother_term&defType=edismax&mm=100%25

What am I doing wrong?
Thanks!

*And you shall know the truth, and the truth shall set you free. (John 8:32)*

andre.maldon...@gmail.com

-- Lance Norskog goks...@gmail.com
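Jack's explanation in this thread can be summarized as a parameter-set sketch: ps only takes effect when paired with pf (or pf2/pf3). The field name "textoboost" comes from the thread's URLs; the other values are illustrative.

```python
from urllib.parse import urlencode

# Without pf/pf2/pf3, the ps parameter appears to be ignored. This
# pairs the phrase slop with an explicit phrase-boost field list.
params = {
    "q": "term1 term2 term3",
    "defType": "edismax",
    "qf": "textoboost",
    "pf": "textoboost",  # boost docs where the whole query appears as a phrase
    "ps": 10,            # allow up to 10 positions of slop in that phrase match
    "mm": "100%",
}
print(urlencode(params))
```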
Re: correct XPATH syntax
The XPath implementation in DIH is very minimal- it is tuned for speed, not features. The XSL option lets you do everything you could want, with a slower engine. On Thu, May 3, 2012 at 7:30 AM, lboutros boutr...@gmail.com wrote: ok, not that easy :) I did not test it myself but it seems that you could use an XSL preprocessing with the 'xsl' option in your XPathEntityProcessor : http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1 You could transform the author part as you wish and then import the author field with your actual configuration. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959397.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: solr snapshots - old school and replication - new school ?
Yes. Replication is a lot easier to use and does a lot more. On Thu, May 3, 2012 at 6:00 AM, geeky2 gee...@hotmail.com wrote: hello all, enviornment: centOS and solr 3.5 i want to make sure i understand the difference between snapshots and solr replication. snapshots are old school and have been deprecated with solr replication new school. do i have this correct? btw: i have replication working (now), between my master and two slaves - i just want to make sure i am not missing a larger picture ;) i have been reading the Smiley Pugh book (pg 349) as well as material on the wiki at: http://wiki.apache.org/solr/SolrCollectionDistributionScripts http://wiki.apache.org/solr/SolrReplication thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/solr-snapshots-old-school-and-replication-new-school-tp3959152.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com