Deploy Solritas as a separate application?
Solritas is a nice search UI integrated with Solr with many features we could use. However we do not want to build our UI into our Solr instance. We will have a front-end web app interfacing with Solr. Is there an easy way to deploy Solritas as a separate application (e.g., Solritas with SolrJ to query a backend Solr instance)? -- View this message in context: http://lucene.472066.n3.nabble.com/Deploy-Solritas-as-a-separate-application-tp3392326p3392326.html Sent from the Solr - User mailing list archive at Nabble.com.
is there any attribute in schema.xml to avoid duplication in solr?
Hi everybody, I want to know whether there is any attribute in schema.xml to avoid duplications? Please reply, thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-any-attribute-in-schema-xml-to-avoid-duplication-in-solr-tp3392408p3392408.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: UniqueKey filed length exceeds
Thanks for your reply Erick. (My use case: a timestamp like "yyyy-MM-dd 13:54:11.414632" needs to be the unique key.) When I try to search for http://localhost:8080/solr/select/?q=2009-11-04:13:51:07.348184 it throws the following error, even after I changed my schema to a text field:

<fieldType name="string" class="solr.TextField" sortMissingLast="true" omitNorms="true"/>

Kindly check my stack trace:

SEVERE: org.apache.solr.common.SolrException: undefined field 2009-11-04
    at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1254)
    at org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getAnalyzer(IndexSchema.java:410)
    at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:385)
    at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:574)
    at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:158)
    at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1421)
    at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1309)
    at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1237)
    at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
    at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
    at org.apache.solr.search.QParser.getQuery(QParser.java:142)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:84)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)

-- View this message in context: http://lucene.472066.n3.nabble.com/UniqueKey-filed-length-exceeds-tp3389759p3392432.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Documents Indexed, SolrJ see nothing before long time
Thank you Christopher. I have found one issue in my code when building a query, but I do not know why it is not working. When I comment out this line, I get the right result count:

// solrQuery.setParam("fq", "+creation_date:[* TO NOW] +type:QUESTION");

where creation_date is a Date field and type a String field. I have also tried these 2 lines, which likewise make the query return the wrong count, whereas with curl it works:

solrQuery.addFilterQuery("creation_date:[* TO NOW]");
solrQuery.addFilterQuery("type:QUESTION");

When I remove the filter on the date, it works:

solrQuery.addFilterQuery("type:QUESTION");

Is there any problem with adding 2 filters, one of them on a date? Is there any problem with this syntax for filter queries and SolrJ? -- View this message in context: http://lucene.472066.n3.nabble.com/Documents-Indexed-SolrJ-see-nothing-before-long-time-tp3389721p3392473.html Sent from the Solr - User mailing list archive at Nabble.com.
how to avoid duplicates in search results?
Hi everybody, I got the following response:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="df">groups</str>
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">participate</str>
      <str name="version">2.2</str>
      <str name="rows">30</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="description">testing group</str>
      <str name="name">testing group</str>
      <str name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
    </doc>
    <doc>
      <str name="description">testing group</str>
      <str name="name">testing group</str>
      <str name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
    </doc>
  </result>
</response>

I need to remove the duplicate results. Can anyone give me suggestions? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-avoid-duplicates-in-search-results-tp3392524p3392524.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Documents Indexed, SolrJ see nothing before long time
Well, I guess, it is stupid to make +creation_date:[* TO NOW] filter -- View this message in context: http://lucene.472066.n3.nabble.com/Documents-Indexed-SolrJ-see-nothing-before-long-time-tp3389721p3392538.html Sent from the Solr - User mailing list archive at Nabble.com.
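A side note on why a bare NOW filter is awkward (this reasoning is mine, not stated in the thread): NOW resolves to the current time with millisecond precision, so every request produces a slightly different filter query, which defeats Solr's filterCache, and documents indexed in the same instant can land just outside the range. Rounding the boundary makes the filter stable and cache-friendly, for example:

```
creation_date:[* TO NOW/DAY+1DAY]
```

NOW/DAY rounds down to midnight, so the same filter string (and cache entry) is reused for the whole day.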
Re: how to avoid duplicates in search results?
You can probably use the Grouping feature: http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters There is also Document Duplicate Detection at index time: http://wiki.apache.org/solr/Deduplication

On Tue, Oct 4, 2011 at 9:55 AM, nagarjuna <nagarjuna.avul...@gmail.com> wrote:
> Hi everybody, I got the following response: [two identical documents snipped] I need to remove the duplicate results. Can anyone give me suggestions?
> -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-avoid-duplicates-in-search-results-tp3392524p3392524.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Edoardo Tosca Sourcesense - making sense of Open Source: http://www.sourcesense.com
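For the index-time route, the Deduplication wiki page boils down to an update processor chain along these lines (a sketch only; the field list and signature field name here are assumptions, adapt them to your schema):

```xml
<!-- solrconfig.xml: compute a signature over the listed fields at index
     time; documents with equal signatures overwrite each other. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,description,url</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The query-time grouping alternative is just request parameters, e.g. group=true&group.field=url, which collapses the two identical documents above into one group.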
Re: SolrJ Annotation for multiValued field
well, another mistake, it works...sorry ;) -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-Annotation-for-multiValued-field-tp3390255p3392652.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: email - DIH
Nobody to help? I tried telnet to get information about the emails. Via telnet with IMAP I can get all the required fields. Is this an implementation issue? -- View this message in context: http://lucene.472066.n3.nabble.com/email-DIH-tp2711416p3392846.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query failing because of omitTermFreqAndPositions
This is because, within one segment, only one value (omit positions or not) is possible for all the docs in that segment. This means that, on merging segments with different values for omitPositions, Lucene must reconcile the different values, and that reconciliation will favor omitting positions (if it went the other way, Lucene would have to make up fake positions, which seems very dangerous). Even if you delete all documents containing that field and optimize down to one segment, this omitPositions bit will still stick, because of how Lucene stores the metadata per field. omitNorms also behaves this way: once omitted, always omitted.

Mike McCandless
http://blog.mikemccandless.com

On Tue, Oct 4, 2011 at 1:41 AM, Isan Fulia <isan.fu...@germinait.com> wrote:
> Hi Mike, Thanks for the information. But why is it that, once positions have been omitted in the past, the field will always omit positions, even if omitTermFreqAndPositions is set back to false?
> Thanks, Isan Fulia.

On 29 September 2011 17:49, Michael McCandless <luc...@mikemccandless.com> wrote:
> Once a given field has omitted positions in the past, even for just one document, it sticks and that field will forever omit positions. Try creating a new index, never omitting positions from that field?
> Mike McCandless http://blog.mikemccandless.com

On Thu, Sep 29, 2011 at 1:14 AM, Isan Fulia <isan.fu...@germinait.com> wrote:
> Hi All, My schema contained a field textForQuery which was defined as
>
> <field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true"/>
>
> After indexing 10 lakh documents I changed the field to
>
> <field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true" omitTermFreqAndPositions="true"/>
>
> So documents indexed after that omitted the position information of the terms. As a result I was not able to run searches that rely on position information, e.g. "coke studio at mtv", even though it is present in some documents. So I again changed the field textForQuery back to
>
> <field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true"/>
>
> But now, even for newly added documents, queries requiring position information are still failing. For example, I reindexed certain documents containing "coke studio at mtv", but the query textForQuery:"coke studio at mtv" still returns no documents. Can anyone please help me out with why this is happening?
> -- Thanks & Regards, Isan Fulia.
Re: UniqueKey filed length exceeds
It looks like your query is getting parsed as a field plus a value: field "2009-11-04", value "13:51:07.348184". If you'd like to make a query like this you need to escape the colons, so something like 2009-11-04\:13\:51\:07.348184. See the following link for more information: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Escaping%20Special%20Characters

On Tue, Oct 4, 2011 at 3:59 AM, kiran.bodigam <kiran.bodi...@gmail.com> wrote:
> Thanks for your reply Erick. (My use case: a timestamp like "yyyy-MM-dd 13:54:11.414632" needs to be the unique key.) When I try to search for http://localhost:8080/solr/select/?q=2009-11-04:13:51:07.348184 it throws "SEVERE: org.apache.solr.common.SolrException: undefined field 2009-11-04", even after changing the field type in my schema to a text field. [stack trace snipped]
> -- View this message in context: http://lucene.472066.n3.nabble.com/UniqueKey-filed-length-exceeds-tp3389759p3392432.html Sent from the Solr - User mailing list archive at Nabble.com.
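From SolrJ you don't have to escape by hand: ClientUtils.escapeQueryChars (in org.apache.solr.client.solrj.util) does this. As a standalone illustration of the idea, here is a minimal hand-rolled equivalent; the class name QueryEscaper is made up for this sketch, and the exact character set may differ slightly from SolrJ's:

```java
// Sketch: backslash-escape characters the Lucene query parser treats
// specially, so a literal term like 2009-11-04:13:51:07.348184 is not
// parsed as field:value.
public class QueryEscaper {
    // Query-parser special characters (a superset is harmless to escape).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    public static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');   // prefix each special char
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Used on the query from this thread, escape("2009-11-04:13:51:07.348184") yields 2009\-11\-04\:13\:51\:07.348184, which the parser treats as a single term.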
Re: is there any attribute in schema.xml to avoid duplication in solr?
The uniqueKey field avoids entries with the same id: adding a document whose key matches an existing document's overwrites it instead of creating a duplicate. -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-any-attribute-in-schema-xml-to-avoid-duplication-in-solr-tp3392408p3393085.html Sent from the Solr - User mailing list archive at Nabble.com.
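Concretely, the relevant schema.xml pieces look like this (a sketch; "id" is just the conventional field name):

```xml
<!-- schema.xml: two adds with the same id leave only the latest document -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
...
<uniqueKey>id</uniqueKey>
```

Note this only deduplicates on the key itself; documents whose other fields are identical but whose ids differ are kept, which is where the Deduplication / grouping approaches from the other thread come in.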
How to achieve Indexing @ 270GiB/hr
Greetings, While going through the article "265% indexing speedup with Lucene's concurrent flushing" (http://java.dzone.com/news/265-indexing-speedup-lucenes?mz=33057-solr_lucene) I was stunned by the endless possibilities in which indexing speed could be increased. I'd like to take inputs from everyone over here as to how to achieve this speed. As far as I understand there are two broad ways of feeding data to Solr: 1. Using DataImportHandler, 2. Using HTTP to POST docs to Solr. The speeds at which the article describes indexing seem kinda too much to expect using the second approach. Or is it possible using multiple instances feeding docs to Solr? My current setup does the following: 1. Execute SQL queries to create a database of documents that need to be fed. 2. Go through the columns one by one, create XMLs for them and send them over to Solr in batches of max 500 docs. Even if using DataImportHandler, what are the ways this could be optimized? If I am able to solve the problem of indexing data in our current setup, my life would become a lot easier. *Pranav Prakash* temet nosce Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny
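Whichever transport is used, the batching step the poster describes (accumulate, flush every 500 docs) can be isolated from the sending code, which also makes it easy to run several senders in parallel. DocBatcher below is a hypothetical helper, not a SolrJ class; in real code the sender callback would wrap something like SolrJ's StreamingUpdateSolrServer, which itself buffers and streams adds over multiple threads:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects documents and hands them to a sender
// callback in fixed-size batches, flushing any remainder on demand.
public class DocBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sender;
    private final List<T> buffer = new ArrayList<>();

    public DocBatcher(int batchSize, Consumer<List<T>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    public void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();           // send a full batch as soon as it fills up
        }
    }

    public void flush() {
        if (!buffer.isEmpty()) {
            sender.accept(new ArrayList<>(buffer));  // hand off a copy
            buffer.clear();
        }
    }
}
```

With batchSize 500, feeding 1201 documents produces batches of 500, 500 and (after a final flush) 201. Running N such batchers on N threads, each with its own HTTP connection, is usually the cheapest way to approach the concurrent-flushing numbers from the article.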
RE: solr searching for special characters?
> how is it possible? also explain to me which tokenizer class can support finding the special characters.

Probably WhitespaceTokenizer will do the job for you. Plus you need to escape special characters (if you are using the defType=lucene query parser). Anyhow, you need to provide us more details: http://wiki.apache.org/solr/UsingMailingLists
Re: Deploy Solritas as a separate application?
On Oct 3, 2011, at 23:32 , jwang wrote: Solritas is a nice search UI integrated with Solr with many features we could use. However we do not want to build our UI into our Solr instance. We will have a front-end web app interfacing with Solr. Is there an easy way to deploy Solritas as a separate application (e.g., Solritas with SolrJ to query a backend Solr instance)? Well thanks! It currently can't be deployed into a front-end webapp as is. [though one could set it up as a non-doc-containing front-end to another shard, perhaps] The Velocity stuff is quite straightforward though, so it'd be easy to create a servlet that passed requests to Solr and fed the response to Velocity templates. I've thought about implementing this sort of thing a million times over. I've come up with what I think is a very lean/clean way to do this, still using Velocity templates, but using JRuby (via Sinatra) as the front-end web tier. I've implemented, a while ago, a proof-of-concept of this here: https://github.com/lucidimagination/Prism I'd like to make time to flesh this out even further. The templates could be made practically the same from inside Solr's wt=velocity infrastructure to an external system using Velocity to render Solr responses. In general - what you're asking for specifically doesn't exist that I know of, but the pieces are all kind of there to put this together. And Prism is my opinionated way that Solr-backed UI can be built leanly and cleanly. Erik
Re: SOLR HttpCache Qtime
We are using this QTime field and publishing it on our front-end web page. Even though the HTTP cache reduces the real response time, the cached response still carries the old QTime value. We can use our own internally measured time instead of Solr's, but I just wonder: is there any way to tell Solr to recalculate QTime when the response comes from the HTTP cache? On Tue, Oct 4, 2011 at 4:16 AM, Erick Erickson <erickerick...@gmail.com> wrote: Why do you want to? QTime is the time Solr spends searching. The cached value will, indeed, be from the query that filled in the HTTP cache. But what are you doing with that information that you want to correct it? That said, I have no clue how you'd attempt to do this. Best Erick On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han <khanuniver...@gmail.com> wrote: Hi, Is there any way to get the correct QTime when we use HTTP caching? I think Solr is also caching the QTime, so it returns the same QTime in the response no matter how long the request actually takes. How can I get QTime set correctly by Solr when I use HTTP caching? Thanks
Re: Shingle and Query Performance
We figured out that if we use only the shingle field, not combined with outputUnigrams, performance gets better. If we use outputUnigrams it is no better than the normal index field. So we decided to make a separate field of combined shingles only, and use this field for the main queries.

On Wed, Aug 31, 2011 at 1:01 PM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> Thanks Erick.. If I figure out something I will let you know also.. Nobody replied except you; I thought there might be more people involved here.. Thanks

On Wed, Aug 31, 2011 at 3:47 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> OK, I'll have to defer because this makes no sense. 4+ seconds in the debug component? Sorry I can't be more help here, but nothing really jumps out. Erick

On Tue, Aug 30, 2011 at 12:45 PM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> Below is the output of the debug. I am measuring pure Solr QTime, which shows in the QTime field in the Solr XML.
>
> <arr name="parsed_filter_queries"><str>mrank:[0 TO 100]</str></arr>
> <lst name="timing">
>   <double name="time">8584.0</double>
>   <lst name="prepare">
>     <double name="time">12.0</double>
>     <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">12.0</double></lst>
>     <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
>     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
>     <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
>     <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
>     <lst name="org.apache.solr.handler.component.SpellCheckComponent"><double name="time">0.0</double></lst>
>     <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
>   </lst>
>   <lst name="process">
>     <double name="time">8572.0</double>
>     <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">4480.0</double></lst>
>     <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
>     <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
>     <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">41.0</double></lst>
>     <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
>     <lst name="org.apache.solr.handler.component.SpellCheckComponent"><double name="time">0.0</double></lst>
>     <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">4051.0</double></lst>
>   </lst>
> </lst>

On Tue, Aug 30, 2011 at 5:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Can we see the output if you specify both debugQuery=on and debug=true? The debug=true will show the time taken up by the various components, which is sometimes surprising... Second, we never asked the most basic question: what are you measuring? Is this the QTime of the returned response (which is the time actually spent searching), or the time until the response gets back to the client, which may involve lots besides searching... Best Erick

On Tue, Aug 30, 2011 at 7:59 AM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> Hi Erick, Fields are lazy loading, content is stored in Solr, and the machine has 32 GB; Solr has a 20 GB heap. There is no swapping. As you see we have many phrases in the same query. I couldn't find a way to drop QTime to subseconds. Surprisingly, the non-shingled setup tests with better QTime!

On Mon, Aug 29, 2011 at 3:10 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Oh, one other thing: have you profiled your machine to see if you're swapping? How much memory are you giving your JVM? What is the underlying hardware setup? Best Erick

On Mon, Aug 29, 2011 at 8:09 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 200K docs and a 36G index? It sounds like you're storing your documents in the Solr index. In and of itself, that shouldn't hurt your query times, *unless* you have lazy field loading turned off. Have you checked that lazy field loading is enabled? Best Erick

On Sun, Aug 28, 2011 at 5:30 AM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> Another interesting thing is: all one-word or multi-word queries, including phrase queries such as "barack obama", are slower in the shingle configuration. What am I doing wrong? Without shingles "barack obama" query time is 300 ms, with shingles 780 ms..

On Sat, Aug 27, 2011 at 7:58 PM, Lord Khan Han <khanuniver...@gmail.com> wrote:
> Hi, What is the difference between Solr 3.3 and the trunk? I will try 3.3 and let you know the results. Here is the search handler:
>
> <requestHandler name="search" class="solr.SearchHandler" default="true">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <!--str
Analyzer Tokenizer for Exact and Contains search on single field
I am a Solr newbie. Let's say we have a field with 4 records as follows:

1. James
2. James Edward
3. James Edward Gray
4. JamesEdward

a. In Solr 3.4, I want an exact search on the given field for "James Edward". Record 2 should be returned.
b. Next, on the same field, I want to check whether "James" is contained in the field; then records 1, 2 and 3 should be returned.

Which standard analyzer/tokenizer can one apply on one single field to get these results? Satish
Re: Determining master/slave from ZK in SolrCloud
Ok, so I am pretty sure this information is not available. What is the most appropriate way to add information like this to ZK? I can obviously look for the system properties enable.master and enable.slave, but that won't be foolproof, since someone could put this in the config file instead and not as a system property. Is there a way to determine this quickly and programmatically without having to go through all of the request handlers in the SolrCore? On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson <jej2...@gmail.com> wrote: Is it possible to determine if a Solr instance is a master or a slave, in replication terms, based on the information that is placed in ZK in SolrCloud?
Re: SOLR HttpCache Qtime
But if the HTTP cache is what's returning the value, Solr never sees anything at all, right? So Solr doesn't have a chance to do anything here. Best Erick On Tue, Oct 4, 2011 at 9:24 AM, Lord Khan Han khanuniver...@gmail.com wrote: We are using this Qtime field and publishing in our front web. Even the httpCache decreasing the Qtime in reality, its still using the cached old Qtime value . We can use our internal qtime instead of Solr's but I just wonder is there any way to say Solr if its coming httpCache re-calculate the Qtime. On Tue, Oct 4, 2011 at 4:16 AM, Erick Erickson erickerick...@gmail.comwrote: Why do you want to? QTime is the time Solr spends searching. The cached value will, indeed, be from the query that filled in the HTTP cache. But what are you doing with that information that you want to correct it? That said, I have no clue how you'd attempt to do this. Best Erick On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han khanuniver...@gmail.com wrote: Hi, Is there anyway to get correct Qtime when we use http caching ? I think Solr caching also the Qtime so giving the the same Qtime in response what ever takes it to finish .. How I can set Qtime correcly from solr when I use http caching On. thanks
Indexing PDF
Hi all, I'm indexing PDF files with SolrJ, and most of them work. But with some files I've got problems because they store strange characters. I got this content stored: +++ Starting a Search Application Abstract Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page ii Do You Need Full-text Search? ∞ ∞ ∞ Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
Re: Indexing PDF
full of boxes for me. Héctor, you need another way to reference these! (e.g. a URL) paul Le 4 oct. 2011 à 16:49, Héctor Trujillo a écrit : Hi all, I'm indexing pdf's files with SolrJ, and most of them work. But with some files I’ve got problems because they stored estrange characters. I got stored this content: +++ Starting a Search Application Abstract Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page ii Do You Need Full-text Search? ∞ ∞ ∞ Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
Re: Determining master/slave from ZK in SolrCloud
I'm putting this out there for comment. Right now I'm in ZkController and changed register as follows:

public void register(SolrCore core, boolean forcePropsUpdate) throws IOException

and at line 479 I've added this:

SolrRequestHandler requestHandler = core.getRequestHandler("/replication");
boolean master = false;
if (requestHandler != null && requestHandler instanceof ReplicationHandler) {
    ReplicationHandler replicationHandler = (ReplicationHandler) requestHandler;
    master = replicationHandler.isMaster();
}
props.put("replication.master", master);

I also modified CoreContainer and ReplicationHandler to support what I have above. Does this seem a reasonable way to approach this? Also, to provide some history, I need this so I can programmatically determine which servers are masters and slaves, and subsequently which should be written to (for updates) and which should be queried. On Tue, Oct 4, 2011 at 10:26 AM, Jamie Johnson <jej2...@gmail.com> wrote: Ok, so I am pretty sure this information is not available. What is the most appropriate way to add information like this to ZK? I can obviously look for the system properties enable.master and enable.slave, but that won't be foolproof since someone could put this in the config file instead and not as a system property. Is there a way to determine this quickly programmatically without having to go through all of the request handlers in the SolrCore? On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson <jej2...@gmail.com> wrote: Is it possible to determine if a Solr instance is a master or a slave in replication terms based on the information that is placed in ZK in SolrCloud?
Re: Determining master/slave from ZK in SolrCloud
Because the distributed indexing phase of SolrCloud will not use replication, we have not really gone down this path at all. One thing we are considering is adding the ability to add various roles to each shard as hints - eg a shard might be designated a searcher and another an indexer. You might be able to piggy back on this to label things master/slave. A ZooKeeper aware replication handler could then use this information. There is nothing to stop you from adding this information to zookeeper yourself, using standard zookeeper tools - but just putting the information is only half the problem - something then needs to read it. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona On Oct 4, 2011, at 10:26 AM, Jamie Johnson wrote: Ok, so I am pretty sure this information is not available. What is the most appropriate way to add information like this to ZK? I can obviously look for the system properties enable.master and enable.slave, but that won't be fool proof since someone could put this in the config file instead and not as a system property. Is there a way to determine this quickly programatically without having to go through all of the request handlers in the solrCore? On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote: Is it possible to determine if a solr instance is a master or a slave in replication terms based on the information that is placed in ZK in SolrCloud?
Re: Determining master/slave from ZK in SolrCloud
Thanks for the reply Mark. So a couple of questions. When is distributed indexing going to be available on Trunk? Are there docs on it now? I think having Roles on the shard would scratch the itch here, because as you said I could then include a role which indicated what to do with this server. My use case is actually for something outside of Solr. As of right now we are not on the latest trunk (actually a few months back I think) but could push to upgrade to it if the distributed indexing code was available today, but management may still shoot that down because of a short timeline. Suffice it to say that I'll be reading this information by another application to handle distributed indexing externally. The version that I'm working on requires that the application be responsible for performing the distribution. On Tue, Oct 4, 2011 at 12:05 PM, Mark Miller markrmil...@gmail.com wrote: Because the distributed indexing phase of SolrCloud will not use replication, we have not really gone down this path at all. One thing we are considering is adding the ability to add various roles to each shard as hints - eg a shard might be designated a searcher and another an indexer. You might be able to piggy back on this to label things master/slave. A ZooKeeper aware replication handler could then use this information. There is nothing to stop you from adding this information to zookeeper yourself, using standard zookeeper tools - but just putting the information is only half the problem - something then needs to read it. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona On Oct 4, 2011, at 10:26 AM, Jamie Johnson wrote: Ok, so I am pretty sure this information is not available. What is the most appropriate way to add information like this to ZK? I can obviously look for the system properties enable.master and enable.slave, but that won't be fool proof since someone could put this in the config file instead and not as a system property. 
Is there a way to determine this quickly programatically without having to go through all of the request handlers in the solrCore? On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote: Is it possible to determine if a solr instance is a master or a slave in replication terms based on the information that is placed in ZK in SolrCloud?
Re: Determining master/slave from ZK in SolrCloud
also as an FYI I created this JIRA https://issues.apache.org/jira/browse/SOLR-2811 which perhaps should be removed if the roles option comes to life. Is there a JIRA on that now? On Tue, Oct 4, 2011 at 12:12 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks for the reply Mark. So a couple of questions. When is distributed indexing going to be available on Trunk? Are there docs on it now? I think having Roles on the shard would scratch the itch here, because as you said I could then include a role which indicated what to do with this server. My use case is actually for something outside of Solr. As of right now we are not on the latest trunk (actually a few months back I think) but could push to upgrade to it if the distributed indexing code was available today, but management may still shoot that down because of a short timeline. Suffice it to say that I'll be reading this information by another application to handle distributed indexing externally. The version that I'm working on requires that the application be responsible for performing the distribution. On Tue, Oct 4, 2011 at 12:05 PM, Mark Miller markrmil...@gmail.com wrote: Because the distributed indexing phase of SolrCloud will not use replication, we have not really gone down this path at all. One thing we are considering is adding the ability to add various roles to each shard as hints - eg a shard might be designated a searcher and another an indexer. You might be able to piggy back on this to label things master/slave. A ZooKeeper aware replication handler could then use this information. There is nothing to stop you from adding this information to zookeeper yourself, using standard zookeeper tools - but just putting the information is only half the problem - something then needs to read it. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona On Oct 4, 2011, at 10:26 AM, Jamie Johnson wrote: Ok, so I am pretty sure this information is not available. 
What is the most appropriate way to add information like this to ZK? I can obviously look for the system properties enable.master and enable.slave, but that won't be foolproof since someone could put this in the config file instead and not as a system property. Is there a way to determine this quickly programmatically without having to go through all of the request handlers in the SolrCore? On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote: Is it possible to determine if a Solr instance is a master or a slave in replication terms based on the information that is placed in ZK in SolrCloud?
Re: sorting using function query results are not in order
any help? -- View this message in context: http://lucene.472066.n3.nabble.com/sorting-using-function-query-results-are-notin-order-tp3390926p3393781.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Analyzer Tokenizer for Exact and Contains search on single field
Hi Satish, I don't think there is a single analyzer that does what you want. However, you could send the info to a second field with copyField, and use e.g. WhitespaceTokenizer on one field for contains-style queries, and KeywordTokenizer on the other field (or just use the string field type) for exact matches. Steve -Original Message- From: Satish Talim [mailto:satish.ta...@gmail.com] Sent: Tuesday, October 04, 2011 10:02 AM To: solr-user@lucene.apache.org Subject: Analyzer Tokenizer for Exact and Contains search on single field I am a Solr newbie. Let's say we have a field with 4 records as follows: James James Edward James Edward Gray JamesEdward a. In Solr 3.4, I want an exact search on the given field for James Edward. Record 2 should be returned. b. Next on the same field, I want to check whether James is contained in the field, then records 1, 2 and 3 should be returned. Which standard analyzer, tokenizer can one apply on one single field, to get these results? Satish
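Steve's two-field suggestion, as a schema.xml sketch (type and field names are illustrative, not from the thread):

```xml
<!-- Contains-style matching: one token per whitespace-separated word -->
<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<!-- Exact matching: the entire field value becomes a single token -->
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="name" type="text_ws" indexed="true" stored="true"/>
<field name="name_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
```

Against the four records above, name_exact:"James Edward" returns only record 2 (case a), while name:James matches records 1-3 but not record 4, since JamesEdward is indexed as a single token (case b).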
Re: Determining master/slave from ZK in SolrCloud
So I see this JIRA which references roles https://issues.apache.org/jira/browse/SOLR-2765 I'm looking at implementing what Yonik suggested, namely in solrconfig.xml I have something like core name=${coreName} instanceDir=. shard=${shard} collection=${collection} roles=searcher,indexer/ these will be pulled out and added to the cloudDescriptor so they can be saved in ZK. Seem reasonable? On Tue, Oct 4, 2011 at 12:17 PM, Jamie Johnson jej2...@gmail.com wrote: also as an FYI I created this JIRA https://issues.apache.org/jira/browse/SOLR-2811 which perhaps should be removed if the roles option comes to life. Is there a JIRA on that now? On Tue, Oct 4, 2011 at 12:12 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks for the reply Mark. So a couple of questions. When is distributed indexing going to be available on Trunk? Are there docs on it now? I think having Roles on the shard would scratch the itch here, because as you said I could then include a role which indicated what to do with this server. My use case is actually for something outside of Solr. As of right now we are not on the latest trunk (actually a few months back I think) but could push to upgrade to it if the distributed indexing code was available today, but management may still shoot that down because of a short timeline. Suffice it to say that I'll be reading this information by another application to handle distributed indexing externally. The version that I'm working on requires that the application be responsible for performing the distribution. On Tue, Oct 4, 2011 at 12:05 PM, Mark Miller markrmil...@gmail.com wrote: Because the distributed indexing phase of SolrCloud will not use replication, we have not really gone down this path at all. One thing we are considering is adding the ability to add various roles to each shard as hints - eg a shard might be designated a searcher and another an indexer. You might be able to piggy back on this to label things master/slave. 
A ZooKeeper aware replication handler could then use this information. There is nothing to stop you from adding this information to zookeeper yourself, using standard zookeeper tools - but just putting the information is only half the problem - something then needs to read it. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona On Oct 4, 2011, at 10:26 AM, Jamie Johnson wrote: Ok, so I am pretty sure this information is not available. What is the most appropriate way to add information like this to ZK? I can obviously look for the system properties enable.master and enable.slave, but that won't be fool proof since someone could put this in the config file instead and not as a system property. Is there a way to determine this quickly programatically without having to go through all of the request handlers in the solrCore? On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote: Is it possible to determine if a solr instance is a master or a slave in replication terms based on the information that is placed in ZK in SolrCloud?
Suggestions feature
I am working on a feature similar to YouTube suggestions (where videos are suggested based on your viewing history). What I do is parse the history and get the user's interests, in the form of weighted topics. When I boost according to those interests, the dominant ones take over the result list. Is there any way to use the boost so that there is some variety in the results? Thanks Milan
Re: sorting using function query results are not in order
Hmmm, try adding fl={!func}Count to make sure Count is an indexed field and function queries are getting the right values. -Yonik http://www.lucene-eurocon.com - The Lucene/Solr User Conference On Mon, Oct 3, 2011 at 3:42 PM, abhayd ajdabhol...@hotmail.com wrote: hi I am trying to sort results from solr using the sum(count,score) function. Basically it's not adding things correctly. For example, here is a partial sample response: Count:54, UserQuery:how to, score:1.2550932, query({!dismax qf=UserQuery v='how'}):1.2550932, sum(Count,query({!dismax qf=UserQuery v='how'})):1.2550932}, How come the addition of 54+1.2550932 is equal to 1.2550932? What am I doing wrong? Here is my complete query: http://localhost:10101/solr/autosuggest/select?q=howstart=0indent=onwt=jsonrows=5sort=sum%28Count,query%28{!dismax%20qf=UserQuery%20v=%27how%27}%29%29%20descfl=UserQuery,score,Count,query%28{!dismax%20qf=UserQuery%20v=%27how%27}%29,sum%28Count,query%28{!dismax%20qf=UserQuery%20v=%27how%27}%29%29debug=true { responseHeader:{ status:0, QTime:0, params:{ sort:sum(Count,query({!dismax qf=UserQuery v='how'})) desc, wt:json, rows:5, indent:on, fl:UserQuery,score,Count,query({!dismax qf=UserQuery v='how'}),sum(Count,query({!dismax qf=UserQuery v='how'})), debug:true, start:0, q:how}}, response:{numFound:2628,start:0,maxScore:1.2550932,docs:[ { Count:54, UserQuery:how to, score:1.2550932, query({!dismax qf=UserQuery v='how'}):1.2550932, sum(Count,query({!dismax qf=UserQuery v='how'})):1.2550932}, { Count:51, UserQuery:how to text, score:0.8964951, query({!dismax qf=UserQuery v='how'}):0.8964951, sum(Count,query({!dismax qf=UserQuery v='how'})):0.8964951}, { Count:117, UserQuery:how to block calls, score:0.7171961, query({!dismax qf=UserQuery v='how'}):0.7171961, sum(Count,query({!dismax qf=UserQuery v='how'})):0.7171961}, { Count:109, UserQuery:how to call forward, score:0.7171961, query({!dismax qf=UserQuery v='how'}):0.7171961, sum(Count,query({!dismax qf=UserQuery v='how'})):0.7171961}, { 
Count:79, UserQuery:how do I pay my bill?, score:0.7171961, query({!dismax qf=UserQuery v='how'}):0.7171961, sum(Count,query({!dismax qf=UserQuery v='how'})):0.7171961}] }, -- View this message in context: http://lucene.472066.n3.nabble.com/sorting-using-function-query-results-are-notin-order-tp3390926p3390926.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Determining master/slave from ZK in SolrCloud
so my initial test worked, this appeared in ZK now roles=searcher,indexer which I can use to tell if it should be used to write to or not. It had fewer changes to other files as well I needed to change CloudDescriptor (add roles variable/methods) CoreContainer (parse roles attribute) ZKController (store roles in ZK) and now what is in ZK is the following: roles=searcher,indexer node_name=hostname:8983_solr url=http://hostname:8983/solr/ As I said this will meet my need, I can provide my changes back to the JIRA referenced if this is what is expected from roles. On Tue, Oct 4, 2011 at 12:45 PM, Jamie Johnson jej2...@gmail.com wrote: So I see this JIRA which references roles https://issues.apache.org/jira/browse/SOLR-2765 I'm looking at implementing what Yonik suggested, namely in solrconfig.xml I have something like core name=${coreName} instanceDir=. shard=${shard} collection=${collection} roles=searcher,indexer/ these will be pulled out and added to the cloudDescriptor so they can be saved in ZK. Seem reasonable? On Tue, Oct 4, 2011 at 12:17 PM, Jamie Johnson jej2...@gmail.com wrote: also as an FYI I created this JIRA https://issues.apache.org/jira/browse/SOLR-2811 which perhaps should be removed if the roles option comes to life. Is there a JIRA on that now? On Tue, Oct 4, 2011 at 12:12 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks for the reply Mark. So a couple of questions. When is distributed indexing going to be available on Trunk? Are there docs on it now? I think having Roles on the shard would scratch the itch here, because as you said I could then include a role which indicated what to do with this server. My use case is actually for something outside of Solr. As of right now we are not on the latest trunk (actually a few months back I think) but could push to upgrade to it if the distributed indexing code was available today, but management may still shoot that down because of a short timeline. 
Suffice it to say that I'll be reading this information by another application to handle distributed indexing externally. The version that I'm working on requires that the application be responsible for performing the distribution. On Tue, Oct 4, 2011 at 12:05 PM, Mark Miller markrmil...@gmail.com wrote: Because the distributed indexing phase of SolrCloud will not use replication, we have not really gone down this path at all. One thing we are considering is adding the ability to add various roles to each shard as hints - eg a shard might be designated a searcher and another an indexer. You might be able to piggy back on this to label things master/slave. A ZooKeeper aware replication handler could then use this information. There is nothing to stop you from adding this information to zookeeper yourself, using standard zookeeper tools - but just putting the information is only half the problem - something then needs to read it. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona On Oct 4, 2011, at 10:26 AM, Jamie Johnson wrote: Ok, so I am pretty sure this information is not available. What is the most appropriate way to add information like this to ZK? I can obviously look for the system properties enable.master and enable.slave, but that won't be fool proof since someone could put this in the config file instead and not as a system property. Is there a way to determine this quickly programatically without having to go through all of the request handlers in the solrCore? On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote: Is it possible to determine if a solr instance is a master or a slave in replication terms based on the information that is placed in ZK in SolrCloud?
Solr Schema and how?
Hello all, We have a screen builder application where users design their own forms. They have a choice of creating form fields with types date, text, numbers, large text, etc., up to a total of 500 fields supported on a screen. Once screens are designed, the system automatically handles type checking for valid data entries on the front end, even though data of any type gets stored as text. So as you can imagine, the table is huge with 600+ columns (screenId, recordId, field1 ... field500) and every column is set as 'text'. The same table stores data for every screen designed in the system. So basically here are my questions: 1. How best to index it? I did it using dynamic field 'field*' which works great 2. Since everything is text, I'm not sure how to enable filtering on each field, e.g. if a user wants to enable 'greater than' or 'less than' type queries on a number field (stored as text), somehow that data needs to be stored as a number in SOLR, but I don't think I have a way to do that, since 'field2' may be a 'number' field for 'screen1' and a 'date' for 'screen2'. Would appreciate any ideas to handle this? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Schema-and-how-tp3393989p3393989.html Sent from the Solr - User mailing list archive at Nabble.com.
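No answer appears in this thread; one common workaround (a sketch, not from the thread — field suffixes and type names are illustrative, in the style of the example schema of that era) is to declare typed dynamic fields and pick the suffix at index time from the screen definition, so the same logical column can be a number for one screen and a date for another:

```xml
<!-- Illustrative dynamicField declarations. At index time, field2 of
     screen1 (a number) is written as field2_i, while field2 of screen2
     (a date) is written as field2_dt, so range queries work per screen. -->
<dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
<dynamicField name="*_i"  type="tint"   indexed="true" stored="true"/>
<dynamicField name="*_dt" type="tdate"  indexed="true" stored="true"/>
```

A range query then stays type-aware, e.g. q=field2_i:[10 TO *] combined with fq=screenId:screen1 so only documents from the screen where field2 is numeric are considered.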
Search on content_type
Hi, I'm using Nutch for crawling, indexed my data using the index-more plugin, and added my required field (like content_type) to schema.xml in Solr. Now how can I search on PDF files (a kind of content_type) using this new index? What query should I enter to search on PDF files in Solr?
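Assuming the content_type field (the name comes from the post) is indexed with the MIME type the crawler reports, a filter query restricted to the PDF MIME type should do it — the exact stored value depends on what the index-more plugin wrote, so check a sample document first:

```text
# Keyword search restricted to PDF documents (host, port and the stored
# MIME-type value are assumptions, not from the thread):
http://localhost:8983/solr/select?q=solar+energy&fq=content_type:application/pdf
```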
Re: Indexing PDF
I have this problem too, in indexing some Persian PDF files. 2011/10/4 Héctor Trujillo hecto...@gmail.com Hi all, I'm indexing PDF files with SolrJ, and most of them work. But with some files I've got problems because they end up storing strange characters. I got this content stored: +++ Starting a Search Application Abstract Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page ii Do You Need Full-text Search? ∞ ∞ ∞ Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
Re: SOLR HttpCache Qtime
I just want to be sure, because it's Solr's internal HTTP cache, not an external HTTP cache. On Tue, Oct 4, 2011 at 5:39 PM, Erick Erickson erickerick...@gmail.com wrote: But if the HTTP cache is what's returning the value, Solr never sees anything at all, right? So Solr doesn't have a chance to do anything here. Best Erick On Tue, Oct 4, 2011 at 9:24 AM, Lord Khan Han khanuniver...@gmail.com wrote: We are using this Qtime field and publishing it in our front-end web. Even though the httpCache decreases the Qtime in reality, it still returns the cached old Qtime value. We can use our internal qtime instead of Solr's, but I just wonder: is there any way to tell Solr to re-calculate the Qtime when the response comes from the httpCache? On Tue, Oct 4, 2011 at 4:16 AM, Erick Erickson erickerick...@gmail.com wrote: Why do you want to? QTime is the time Solr spends searching. The cached value will, indeed, be from the query that filled in the HTTP cache. But what are you doing with that information that you want to correct it? That said, I have no clue how you'd attempt to do this. Best Erick On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han khanuniver...@gmail.com wrote: Hi, Is there any way to get the correct Qtime when we use HTTP caching? I think Solr also caches the Qtime, so it gives the same Qtime in the response regardless of how long it takes to finish. How can I set the Qtime correctly from Solr when I use HTTP caching? thanks
Case Insensitive String
Hello, I know that this topic was already discussed, but I want to make sure I understood it right. I need to have a field for the email of a user. I should be able to find a document(s) by this field, and it should be an exact match, and case insensitive. Based on what I've found from previous discussions, I couldn't use the solr.StrField class, but should use the solr.TextField class instead. Also, I suspect very much that later the requirements could change and I should be able to store not an email as identifier, but some free text, potentially with spaces and some other whitespace, and still be able to do an exact case-insensitive match. So I came up with such a type: <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true"> <analyzer> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType> So, if I have a document with a field of such type, and it would contain a value like this: ABC 123 xyz A xyz query shouldn't return the document, nor an xyz 123 ABC query, but abc 123 XYZ should. Am I correct in my assumption, or am I missing something? Any comments are appreciated, Thank you, Eugene S.
Re: Indexing PDF
Your persian pdf problem is different, and already taken care of in pdfbox trunk https://issues.apache.org/jira/browse/PDFBOX-1127 On Tue, Oct 4, 2011 at 2:04 PM, ahmad ajiloo ahmad.aji...@gmail.com wrote: I have this problem too, in indexing some of persian pdf files. 2011/10/4 Héctor Trujillo hecto...@gmail.com Hi all, I'm indexing pdf's files with SolrJ, and most of them work. But with some files I’ve got problems because they stored estrange characters. I got stored this content: +++ Starting a Search Application Abstract Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page ii Do You Need Full-text Search? ∞ ∞ ∞ Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
http request works, but wget same URL fails
This http request works as desired (bringing back a csv file) http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select?indent=onversion=2.2q=battleshipwt=csv; but the same URL submitted via wget produces the 500 error reproduced below. I want the wget to download the csv file. What's going on? FredZ bitnami@ip-10-202-202-68:/opt/bitnami/apache2/htdocs/scripts$ --2011-10-04 19:33:41-- http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select?indent=on Resolving zimzazsearch3-1.bitnamiapp.com... 75.101.204.213 Connecting to zimzazsearch3-1.bitnamiapp.com|75.101.204.213|:8983... connected. HTTP request sent, awaiting response... 500 null java.lang.NullPointerException \tat java.io.StringReader.init(StringReader.java:33) \tat org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203) \tat org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80) \tat org.apache.solr.search.QParser.getQuery(QParser.java:142) \tat org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81) \tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173) \tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) \tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) \tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) \tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) \tat org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) \tat org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) \tat org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) \tat org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) \tat org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) \tat org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) \tat 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) \tat org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) \tat org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) \tat org.mortbay.jetty.Server.handle(Server.java:326) \tat org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) \tat org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) \tat org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) \tat org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) \tat org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) \tat org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) \tat o 2011-10-04 19:33:41 ERROR 500: null java.lang.NullPointerException \tat java.io.StringReader.init(StringReader.java:33) \tat org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203) \tat org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80) \tat org.apache.solr.search.QParser.getQuery(QParser.java:142) \tat org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81) \tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173) \tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) \tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) \tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) \tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) \tat org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) \tat org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) \tat org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) \tat org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) \tat 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) \tat org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) \tat org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) \tat org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) \tat org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) \tat org.mortbay.jetty.Server.handle(Server.java:326) \tat org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) \tat org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) \tat org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) \tat org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) \tat org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) \tat org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) \tat o. - Subscribe to the Nimble Books Mailing List http://eepurl.com/czS- for monthly updates
Re: Case Insensitive String
Hello, I know that this topic was already discussed, but I want to make sure I understood it right. I need to have a field for the email of a user. I should be able to find a document(s) by this field, and it should be an exact match, and case insensitive. Based on what I've found from previous discussions, I couldn't use the solr.StrField class, but should use the solr.TextField class instead. Also, I suspect very much that later the requirements could change and I should be able to store not an email as identifier, but some free text, potentially with spaces and some other whitespace, and still be able to do an exact case-insensitive match. So I came up with such a type: <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true"> <analyzer> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType> So, if I have a document with a field of such type, and it would contain a value like this: ABC 123 xyz A xyz query shouldn't return the document, nor an xyz 123 ABC query, but abc 123 XYZ should. Am I correct in my assumption, or am I missing something? Yep, all correct. However, no tokenizer is defined in your analyzer. In your case, KeywordTokenizerFactory should be used.
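Folding the reply's fix into the type definition gives the analyzer an explicit tokenizer:

```xml
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- whole field value becomes a single token... -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- ...which is then lowercased: exact but case-insensitive matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this type, a document containing ABC 123 xyz matches the query abc 123 XYZ (both sides analyze to the single token abc 123 xyz) but not xyz or xyz 123 ABC, exactly as described above.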
Private text fields
I'm trying to find a way to query a private field in solr while using the text fields. So I want to allow private tags only searchable by an assigned owner. The private tags will also query along side regular keyword tags. Here's an example: Company A (identified by idA) searches and finds company B (identified by idB). Company A would like to add some tags specific to Company B and only searchable from company A. So far I've found a few ways to implement this, and all of them feel really sloppy as far as what it'll do to the index by either making too many dynamic columns or manipulating the keywords in the value. I'd like to know if solr can do this out of the box in any other way. It looks like I can just create a custom field (PrivateString or something like that) but before I head down that route, I'd like to know what solr can do now. So here's a couple ways I figured out so far and I don't like either of them: Option 1: Use Dynamic Fields for the text field I can make the dynamic fields in solr that specify the field name followed by the company ID. For example, if Company A (idA) above adds the words great client to the field tags in the index, I can make the field as follows: tags_idA=great client But I'm going to be working with over 10K clients, and don't think this is a great idea. It would allow for simple syntax and a fast implementation, but I'd like to avoid crowding the index with 1000s of extra columns if I can avoid it. Option 2: Store the private tags with a company ID prepended to the keyword When searching the private tag field, I can store each keyword by prepending the identifier to that word. Although this would make the query easier, it'll hamper some other text features that expect a word to be stored as it would be searched on. Option 3: Customize dismax to handle a PrivateField type This would be a custom multi-value poly field that would hold the private owner ID as well as the full text that we want to index. 
I've not yet done this route but it seems like the cleanest way as far as resulting syntax and index storage. If any of the specified fields are a PrivateField, then it can automatically search the appropriate IDs. So when indexing, I can have the PrivateFieldType look something like this: Doc1: privateTags=[{,great client},{3,bad client}] Doc2: privateTags=[{,scott tiger},{3,bad client}] So when I perform a query: http://localhost:8080/usercore/select/?q=clientqoid=qf=firstName%20lastName%20privateTagsdefType=dismax So from the above, I'd want it to search the firstName, lastName and privateTags fields. However, I'd want solr to realize that the privateTags are a PrivateFieldType and look for the qoid field - only returning matches with the matching ID in the qoid field. So the above query will only return Doc1 because it matches the private tag with an ID of . Thoughts? Ideas?
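For what it's worth, Option 1 needs only a single schema declaration, so the schema itself stays small even with 10K+ clients (a sketch; the field pattern, type name, and the norms remark are assumptions, not from the thread):

```xml
<!-- Option 1 as one declaration: a single pattern covers every company,
     though the index still gains one sparse field per tagging company.
     In Lucene of this era the main per-field memory cost is the norms
     array, so omitNorms keeps that bounded for sparse fields. -->
<dynamicField name="tags_*" type="text_general" indexed="true"
              stored="false" multiValued="true" omitNorms="true"/>
```

Each request would then build qf=firstName lastName tags_idA from the authenticated company's ID, so a company can only ever search its own tag field.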
RE: Suggestions feature
Hi Milan, I have three ideas: 1. Boost by log(weight) instead of just by weight. This would reduce weight-to-weight ratios and so reduce the likelihood of hit list domination, while still retaining the user's relative preferences. Multiple log applications will further decrease the weight-to-weight ratios and thus increase variety. 2. Take the top N topics by weight, thresholding either by some arbitrary weight or at some arbitrary N, and then boost those N topics equally. This would increase variety at the expense of ignoring the user's minor interests. 3. Just apply a fixed boost to all of the user's interests and ignore their associated weights. (This is equivalent to taking strategy #1 to the limit.) Steve -Original Message- From: Milan Dobrota [mailto:mi...@milandobrota.com] Sent: Tuesday, October 04, 2011 12:56 PM To: solr-user@lucene.apache.org Subject: Suggestions feature I am working on a feature similar to Youtube suggestions (where the videos are suggested based on your viewing history). What I do is parse the history and get the user's interests, in the form of weighted topics. When I boost according to those interests, the dominant ones take over the result list. Is there any way to use the boost so that there is some variety in the results? Thanks Milan
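Idea #1 can be sanity-checked outside Solr; a small sketch (topic names and weights are made up for illustration) showing how the log transform flattens the ratios between boosts while preserving their order:

```python
import math

def damped_boosts(weights, base=math.e):
    """Map raw interest weights to log-damped boost values (idea #1).

    The constant +1 keeps every boost positive for weights >= 1.
    """
    return {topic: 1 + math.log(w, base) for topic, w in weights.items()}

# Illustrative user profile: one dominant topic and two minor ones.
raw = {"music": 100.0, "cooking": 10.0, "chess": 2.0}
boosts = damped_boosts(raw)

# Raw ratio music:chess is 50:1; after damping it is much flatter,
# so the minor interests still surface in the result list.
ratio_raw = raw["music"] / raw["chess"]
ratio_damped = boosts["music"] / boosts["chess"]
```

In Solr, the damped values would then feed the boost parameters instead of the raw weights, e.g. bq=topic:music^5.6 (field name hypothetical).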
Re: http request works, but wget same URL fails
got it. curl http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select/?indent=onq=videofl=name,idwt=csv; works like a champ. On Tue, Oct 4, 2011 at 15:35, Fred Zimmerman w...@nimblebooks.com wrote: This http request works as desired (bringing back a csv file) http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select?indent=onversion=2.2q=battleshipwt=csv; but the same URL submitted via wget produces the 500 error reproduced below. I want the wget to download the csv file. What's going on? FredZ bitnami@ip-10-202-202-68:/opt/bitnami/apache2/htdocs/scripts$ --2011-10-04 19:33:41-- http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select?indent=on Resolving zimzazsearch3-1.bitnamiapp.com... 75.101.204.213 Connecting to zimzazsearch3-1.bitnamiapp.com|75.101.204.213|:8983... connected. HTTP request sent, awaiting response... 500 null java.lang.NullPointerException \tat java.io.StringReader.init(StringReader.java:33) \tat org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203) \tat org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80) \tat org.apache.solr.search.QParser.getQuery(QParser.java:142) \tat org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81) \tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173) \tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) \tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) \tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) \tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) \tat org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) \tat org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) \tat org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) \tat org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) \tat 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) \tat org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) \tat org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) \tat org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) \tat org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) \tat org.mortbay.jetty.Server.handle(Server.java:326) \tat org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) \tat org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) \tat org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) \tat org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) \tat org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) \tat org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) \tat o 2011-10-04 19:33:41 ERROR 500: null java.lang.NullPointerException \tat java.io.StringReader.init(StringReader.java:33) \tat org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203) \tat org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80) \tat org.apache.solr.search.QParser.getQuery(QParser.java:142) \tat org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81) \tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173) \tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) \tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) \tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) \tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) \tat org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) \tat org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) \tat 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) \tat org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) \tat org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) \tat org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) \tat org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) \tat org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) \tat org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) \tat org.mortbay.jetty.Server.handle(Server.java:326) \tat org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) \tat org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) \tat org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) \tat org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) \tat
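The cause is shell quoting, not Solr: in the unquoted URL, each & is a shell operator, so wget only ever sends the first parameter — the log above shows the request went out as select?indent=on, leaving q null, which is exactly the NullPointerException in the query parser. Quoting the URL fixes wget just as it does curl; a quick demonstration with an illustrative localhost URL (no Solr needed to see the quoting effect):

```shell
# Quoted, the ampersands stay inside the string and reach the server intact.
url="http://localhost:8983/solr/select?indent=on&q=battleship&wt=csv"
echo "$url"
# wget -O results.csv "$url"   # the same quoting works for wget or curl
```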
Re: Remove results limit
Hi Andrew, I think this question belongs on the users list more than on the dev list. Programmatically, it depends on the client library you are using; if you are using SolrJ, it should be something like:

    SolrQuery query = new SolrQuery();
    ...
    query.setRows(20);
    query.setStart(40);

You can also change the default on the server side. To do this, modify the request handler configuration in the solrconfig.xml file, adding something like:

    <requestHandler name="search" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <int name="rows">20</int>
      </lst>
    </requestHandler>

I hope this helps,
Tomás

On Tue, Oct 4, 2011 at 3:19 PM, Andrew Clark andrew.clark.at...@gmail.com wrote:
How about programmatically? Is there some config on the server I can change, or some API call that changes the default result set size?

On Tue, Oct 4, 2011 at 2:00 PM, Erick Erickson erickerick...@gmail.com wrote:
Set rows=200, or page through it: start=40&rows=20, and on the next one start=60&rows=20. Best, Erick

On Tue, Oct 4, 2011 at 1:44 PM, Andrew Clark andrew.clark.at...@gmail.com wrote:
I get 193 documents found in my SolrDocumentList, but only 10 of them are returned to me. How can I remove the 10-document limit? Thanks, Andrew

---
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Hierarchical faceting with Date
Hi, I'm trying to perform a hierarchical (pivot) faceted search, and it doesn't work with a date as one of the fields. My questions are:
1. Is this a supported feature, or is this a bug that needs to be addressed?
2. If it is not intended to be supported, what is the complexity involved in implementing it?
FYI, I'm using a Solr 4.0 build from 09/30/2011. Regards, Ravi Bulusu
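One common workaround (not discussed in this thread; sketched here as an illustration) is to sidestep pivoting on a raw date field by indexing precomputed hierarchy tokens -- e.g. year, year-month, and year-month-day string fields -- and pivoting on those instead. A minimal sketch of generating such tokens at index time:

```java
import java.time.LocalDate;
import java.util.List;

public class DateHierarchy {
    // Build year / year-month / year-month-day tokens that could be indexed
    // into separate string fields and used with facet.pivot instead of a raw
    // date field. The field naming is up to the schema; nothing here is
    // Solr-specific.
    public static List<String> levels(LocalDate d) {
        String year = String.valueOf(d.getYear());
        String month = String.format("%s-%02d", year, d.getMonthValue());
        String day = String.format("%s-%02d", month, d.getDayOfMonth());
        return List.of(year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(levels(LocalDate.of(2011, 9, 30)));
    }
}
```

Each level then becomes a normal string facet, so the existing pivot machinery applies unchanged.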
Re: SOLR error with custom FacetComponent
Thanks for your response. I could solve my use case with your suggestion. -Ravi Bulusu

On Sat, Sep 24, 2011 at 1:51 PM, Ravi Bulusu ravi.b...@gmail.com wrote:
Erik, unfortunately the facet fields are not static. The fields are dynamic Solr fields and are generated by different applications. The field names will be populated into a data store (like memcache) and facets have to be driven from that data store. I need to write a custom FacetComponent which picks up the facet fields from the data store. Thanks for your response. -Ravi Bulusu

Subject: Re: SOLR error with custom FacetComponent
From: Erik Hatcher erik.hatcher@...
Date: 2011-09-21 18:18

Why create a custom facet component for this? Simply add lines like this to your request handler(s):

    <str name="facet.field">manu_exact</str>

either in the defaults or appends sections. Erik

On Wed, Sep 21, 2011 at 2:00 PM, Ravi Bulusu ravi.b...@gmail.com wrote:
Hi all, I'm trying to write a custom SOLR facet component and I'm getting some errors when I deploy my code into the SOLR server. Can you please let me know what I'm doing wrong? I appreciate your help on this issue. Thanks.

*Issue*
I'm getting an error saying "Error instantiating SearchComponent ... My Custom Class is not a org.apache.solr.handler.component.SearchComponent". My custom class inherits from *FacetComponent*, which extends *SearchComponent*. My custom class is defined as follows. I implemented the process method to meet our functionality. We have some default facets that have to be sent every time, irrespective of the query request.

    /**
     * @author ravibulusu
     */
    public class MyFacetComponent extends FacetComponent {
        ...
    }
Re: UniqueKey filed length exceeds
: if you'd like to make a query like this you need to escape the : so : something like or use the term QParser, which was created for the explicit purpose of never needing to worry about escaping terms in your index... q={!term f=id}2009-11-04:13:51:07.348184 -Hoss
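To make the escaping alternative concrete (an illustrative sketch, not from the thread): every ':' in the raw term has to be backslash-escaped before it reaches the default lucene query parser, otherwise each colon is treated as a field:value separator -- which is exactly the "undefined field 2009-11-04" error in the stack trace above.

```java
public class LuceneEscape {
    // Backslash-escape the ':' characters in a raw term such as
    // "2009-11-04:13:51:07.348184" so the default query parser does not
    // treat them as field:value separators. The {!term} QParser avoids
    // the need for this entirely.
    public static String escapeColons(String term) {
        return term.replace(":", "\\:");
    }

    public static void main(String[] args) {
        System.out.println("q=id:" + escapeColons("2009-11-04:13:51:07.348184"));
    }
}
```

A fuller escaper would handle the rest of the Lucene special characters too; colons are the only ones present in this id format.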
Re: is there any attribute in schema.xml to avoid duplication in solr?
: i want to know whether is there any attribute in schema.xml to avoid
: the duplications?

You need to explain your problem better ... "duplications" can mean different things to different people. (Duplicate documents? Duplicate terms in a field? etc...) Please provide a detailed description of your *real* goal and the *specifics* of the problems you are encountering...

https://wiki.apache.org/solr/UsingMailingLists

-Hoss
composite Unique Keys?
I have several different document types that I store. I use a serialized integer that is unique within each document type. If I use id as the uniqueKey, there is a possibility of docs colliding on the id. What would be the best way to have a unique id, given that I am storing my unique identifier in an integer field? I've seen solutions where the unique key consists of "DocTypeString 123245" as the id, which seems pretty inefficient to me.
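For reference, the concatenation approach mentioned above is usually implemented by prefixing the per-type integer with the document type at index time. A client-side sketch (the separator and names are illustrative):

```java
public class CompositeId {
    // Build a collision-free uniqueKey value by combining the document type
    // with the per-type serial integer. The "-" separator is an arbitrary
    // choice; any character not used in type names works.
    public static String buildId(String docType, int serial) {
        return docType + "-" + serial;
    }

    public static void main(String[] args) {
        System.out.println(buildId("group", 123245));
    }
}
```

The overhead is just the type prefix per document, and the original integer can still be kept in its own int field for efficient range queries and sorting.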
Re: sorting using function query results are notin order
That was it.. Count was not indexed... works fine now. -- View this message in context: http://lucene.472066.n3.nabble.com/sorting-using-function-query-results-are-notin-order-tp3390926p3394876.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sorting using function query results are notin order
: Subject: Re: sorting using function query results are notin order
:
: that was it..COunt was not indexed...works fine now

Hmmm... which version of Solr are you using? What exactly is the FieldType of your Count field? Since Solr 3.1, attempting to use a function on a non-indexed field *should* fail with a clear error...

https://issues.apache.org/jira/browse/SOLR-2348

-Hoss
Re: how to avoid duplicates in search results?
: There is also a Document Duplicate Detection at index time:
: http://wiki.apache.org/solr/Deduplication

Or just setting url as your UniqueKey field would solve this simple use case, but it's not entirely clear what else you consider duplicates besides this one example.

: <doc>
:   <str name="description">testing group</str>
:   <str name="name">testing group</str>
:   <str name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
: </doc>
: <doc>
:   <str name="description">testing group</str>
:   <str name="name">testing group</str>
:   <str name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
: </doc>

-Hoss
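For the index-time route, the Deduplication wiki page linked above boils down to an update processor chain in solrconfig.xml along these lines (the field names and chain name here are illustrative, not from the thread):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">name,description,url</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The signatureField also has to be declared in schema.xml, and the update request handler has to reference the "dedupe" chain for it to take effect.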
StreamingUpdateSolrServer and commitWithin
Hi, I'm confused about using StreamingUpdateSolrServer and the commitWithin parameter in conjunction with waitSearcher and waitFlush. Does a request like this make sense?

    UpdateRequest updateRequest = new UpdateRequest();
    updateRequest.setCommitWithin(12);
    updateRequest.setWaitSearcher(false);
    updateRequest.setWaitFlush(true);
    updateRequest.add(doc);
    this.solrServer.request(updateRequest);

I'm not sure setWaitSearcher(false) is being honored; look at my log:

    INFO o.apache.solr.update.UpdateHandler - start commit(optimize=false,waitFlush=true,*waitSearcher=true*,expungeDeletes=false)

My guess is that if commitWithin is used, we do not have to care about setting waitSearcher or waitFlush. Thanks -- Leonardo S Souza
Re: solr 1.4 facet.limit behaviour in merging from several shards
: OK, if SOLR-2403, being related to the bug I described, has been fixed in
: Solr 3.4, then we are safe, since we are in the process of migration. Is it
: possible to verify this somehow? Is FacetComponent the class I should
: start checking this from? Can you give any other pointers?

According to Jira, it has not been fixed (note the "Unresolved" status). FacetComponent is definitely the class that needs to be fixed. If you are interested in working on a patch, see Yonik's comments in the issue about how to approach it, in particular about how fixing mincount=1 is definitely solvable even if mincount > 1 is a harder/intractable problem. The change sounds simple, but writing test cases to verify it is also going to take some effort.

-Hoss
Re: indexing FTP documet with solrj
: I want to index some documents with the solrj APIs but the URL of these
: documents is FTP.
: How do I set the username and password for the FTP account in solrj?
:
: in the solrj API there is a CommonsHttpSolrServer method but i do not find any
: method for FTP configuration

It sounds like you are getting confused between using SolrJ to talk to *Solr* and using SolrJ to index arbitrary URLs. SolrJ doesn't do any crawling -- if you have data that you want to index, then your client code needs to decide what that data is (and where it comes from) and feed that data to SolrJ as documents to index. The only URLs that SolrJ knows about are:
* the URL for talking to Solr
* strings that SolrJ passes to Solr as document fields that may just so happen to be URLs (SolrJ doesn't know/care)

-Hoss
Re: Any plans to support function queries on score?
: Do you have any plans to support function queries on the score field? For
: example, sort=floor(product(score, 100)+0.5) desc?

You most certainly can compute function queries on the score of a query, but you have to be explicit about which query you want to use the score of. You seem to already know this...

: I can't use a subquery in this case because I am trying to use secondary
: sorting, however I will be open to that if someone successfully uses
: another field to boost the results.

...i don't understand your explanation of why you can't specify a subquery to indicate what you want to sort on.

a) the subquery can be the exact query you executed (you can even use variable substitution to guarantee it)
b) whether you use secondary sorting has no impact on how the function is computed

Here's an example using the solr sample data that works great...

q=ipod&fl=inStock,id,price,score&sort=inStock+desc,+product(price,query($q))+desc

-Hoss
Re: SOLR HttpCache Qtime
Still doesn't make sense to me. There is no Solr HTTP cache that I know of. There is a queryResultCache. There is a filterCache. There is a documentCache. There may even be custom cache implementations. There's a fieldValueCache. There's no HTTP cache internal to Solr as far as I can tell.

If you're asking whether documents returned from the queryResultCache have QTimes that reflect the actual time spent (near 0), I'm pretty sure the answer is yes.

If this doesn't answer your question, please take the time to formulate a complete question. It'll get you your answers quicker than multiple twitter-style exchanges.

Best, Erick

On Tue, Oct 4, 2011 at 2:22 PM, Lord Khan Han khanuniver...@gmail.com wrote:
I just want to be sure.. because it's Solr's internal HTTP cache, not an outside HTTP cacher.

On Tue, Oct 4, 2011 at 5:39 PM, Erick Erickson erickerick...@gmail.com wrote:
But if the HTTP cache is what's returning the value, Solr never sees anything at all, right? So Solr doesn't have a chance to do anything here. Best, Erick

On Tue, Oct 4, 2011 at 9:24 AM, Lord Khan Han khanuniver...@gmail.com wrote:
We are using this QTime field and publishing it in our front-end web app. Even though the httpCache decreases the query time in reality, it is still serving the cached old QTime value. We can use our internal qtime instead of Solr's, but I just wonder if there is any way to tell Solr to re-calculate the QTime when the response comes from the httpCache.

On Tue, Oct 4, 2011 at 4:16 AM, Erick Erickson erickerick...@gmail.com wrote:
Why do you want to? QTime is the time Solr spends searching. The cached value will, indeed, be from the query that filled in the HTTP cache. But what are you doing with that information that you want to correct it? That said, I have no clue how you'd attempt to do this. Best, Erick

On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han khanuniver...@gmail.com wrote:
Hi, is there any way to get the correct QTime when we use HTTP caching? I think Solr is also caching the QTime, so it gives the same QTime in the response regardless of how long the request actually takes. How can I get the QTime set correctly by Solr when I use HTTP caching? Thanks
Re: schema changes changes 3.3 to 3.4?
It looks to me like you changed the analysis chain for the field in question by removing stemmers of some sort or other. The quickest way to answer this kind of question is to get familiar with the admin/analysis page (don't forget to check the verbose checkboxes). Enter the term in both the index and query boxes and hit the button. It shows you exactly what parts of the chain performed what actions.

So your index analysis chain probably removed the plurals, but your query-side chain didn't. So I'm guessing that not only didn't it show the metadata, it didn't even find the document in question. But that's a guess at this point.

Best, Erick

On Mon, Oct 3, 2011 at 4:22 PM, jo jairo.or...@firmex.com wrote:
Hi, I have the following issue on my test environment: when I do a query with the full word, the reply no longer contains the attr_meta. Ex: http://solr1:8983/solr/core_1/select/?q=stegosaurus

    <arr name="attr_content_encoding"><str>ISO-8859-1</str></arr>
    <arr name="attr_content_language"><str>en</str></arr>

but if I remove just one letter it shows the expected response. Ex: http://solr1:8983/solr/core_1/select/?q=stegosauru

    <arr name="attr_content_encoding"><str>ISO-8859-1</str></arr>
    <arr name="attr_meta">
      <str>stream_source_info</str> <str>document</str>
      <str>stream_content_type</str> <str>text/plain</str>
      <str>stream_size</str> <str>81</str>
      <str>Content-Encoding</str> <str>ISO-8859-1</str>
      <str>stream_name</str> <str>filex123.txt</str>
      <str>Content-Type</str> <str>text/plain</str>
      <str>resourceName</str> <str>dinosaurs5.txt</str>
    </arr>

For troubleshooting I replaced the schema.xml from 3.3 into 3.4 and it worked just fine. I can't find what change in the schema would cause this. Any clues?
Re: How to obtain the Explained output programmatically ?
If we're talking SolrJ here, I *think* you can get it from the NamedList<Object> returned from SolrResponse.getResponse, but I confess I haven't tried it. If not SolrJ, what is your plugin doing? And where do you expect your plugin to be in the query process?

Best, Erick

On Mon, Oct 3, 2011 at 5:31 PM, David Ryan help...@gmail.com wrote:
Thanks Hoss! debug.explain.structured (https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured) is definitely helpful. It adds some structure to the plain explained output. Is there a way to access these structured outputs in Java code (e.g., via a Solr plugin class)? We could write an HTML parser to examine the output in the browser, but it's probably not the best way to do that.

On Mon, Oct 3, 2011 at 2:11 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score
...
: the web browser. Is there a way to access the explained output
: programmatically?

https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured

-Hoss
Re: SOLR HttpCache Qtime
Seems to me what you're asking is how to have an accurate query time when you're getting a response that's been cached by an HTTP cache. This might be from the browser, or from a proxy, or from something else, but it's not from Solr.

The reason that the QTime doesn't change is because it's the entire response -- results, parameters, QTime, and all -- that's cached. Solr isn't making a new request; it doesn't even know that a request has been made. So if you do 6 requests, and the last 5 come from the cache, Solr has done only one request, with one QTime.

So it sounds to me that you are looking for the RESPONSE time, which would be different from the QTime, and would, I suppose, come from your application, and not from Solr.

Nick

On 10/4/2011 7:44 PM, Erick Erickson wrote:
Still doesn't make sense to me. There is no Solr HTTP cache that I know of. There is a queryResultCache. There is a filterCache. There is a documentCache. There may even be custom cache implementations. There's a fieldValueCache. There's no http cache internal to Solr as far as I can tell. If you're asking if documents returned from the queryResultCache have QTimes that reflect the actual time spent (near 0), I'm pretty sure the answer is yes. If this doesn't answer your question, please take the time to formulate a complete question. It'll get you your answers quicker than multiple twitter-style exchanges. Best, Erick

On Tue, Oct 4, 2011 at 2:22 PM, Lord Khan Han khanuniver...@gmail.com wrote:
I just want to be sure.. because its solr internal HTTP cache.. not an outside httpcacher

On Tue, Oct 4, 2011 at 5:39 PM, Erick Erickson erickerick...@gmail.com wrote:
But if the HTTP cache is what's returning the value, Solr never sees anything at all, right? So Solr doesn't have a chance to do anything here. Best, Erick

On Tue, Oct 4, 2011 at 9:24 AM, Lord Khan Han khanuniver...@gmail.com wrote:
We are using this Qtime field and publishing in our front web. Even the httpCache decreasing the Qtime in reality, its still using the cached old Qtime value. We can use our internal qtime instead of Solr's but I just wonder is there any way to say Solr if its coming httpCache re-calculate the Qtime.

On Tue, Oct 4, 2011 at 4:16 AM, Erick Erickson erickerick...@gmail.com wrote:
Why do you want to? QTime is the time Solr spends searching. The cached value will, indeed, be from the query that filled in the HTTP cache. But what are you doing with that information that you want to correct it? That said, I have no clue how you'd attempt to do this. Best, Erick

On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han khanuniver...@gmail.com wrote:
Hi, Is there anyway to get correct Qtime when we use http caching? I think Solr caching also the Qtime so giving the same Qtime in response what ever takes it to finish. How I can set Qtime correcly from solr when I use http caching. Thanks
Re: sorting using function query results are notin order
hi,

solr-spec-version: 4.0.0.2011.07.19.16.15.08
solr-impl-version: 4.0-SNAPSHOT ${svnversion} - ad895d - 2011-07-19 16:15:08
lucene-spec-version: 4.0-SNAPSHOT
lucene-impl-version: 4.0-SNAPSHOT ${svnversion} - ad895d - 2011-07-19 16:15:13
Re: sorting using function query results are notin order
: solr-spec-version
: 4.0.0.2011.07.19.16.15.08
: solr-impl-version
: 4.0-SNAPSHOT ${svnversion} - ad895d - 2011-07-19 16:15:08

Uh, ok ... that doesn't really answer my question at all. According to that info, your version of Solr was *built* on 2011-07-19, using some snapshot of trunk, but the svn version info is missing, so there is no way of knowing how old that snapshot is. You also didn't answer my question about the field type you are using for Count. Details matter.

SOLR-2348 was committed to trunk on 2011-02-17 (r1071459). When I tried to reproduce your example using the current trunk and an un-indexed TrieIntField (I edited "popularity" in the example schema.xml to make it indexed="false"), I got the expected error message, instead of it silently computing the wrong value, when I tried to use popularity in a function...

http://localhost:8983/solr/select?q=ipod&fl=id,product%28popularity,price%29

So unless your 2011-07-19 build was from a version of svn that was already 4 months old, there may still be something wrong that we should try to get to the bottom of.

-Hoss
Scoring of DisMax in Solr
Hi, when I examine the score calculation of DisMax in Solr, it looks to me like DisMax is using tf x idf^2 instead of tf x idf. Does anyone have insight into why tf x idf is not used here? Here is the score contribution from one field:

    score(q,c) = queryWeight x fieldWeight = tf x idf x idf x queryNorm x fieldNorm

Here is the example that I used to derive the formula above. Clearly, idf is multiplied twice in the score calculation.

http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score

    <str name="6H500F0">
    0.18314168 = (MATCH) sum of:
      0.18314168 = (MATCH) weight(text:gb in 1), product of:
        0.35845062 = queryWeight(text:gb), product of:
          2.3121865 = idf(docFreq=6, numDocs=26)
          0.15502669 = queryNorm
        0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
          1.4142135 = tf(termFreq(text:gb)=2)
          2.3121865 = idf(docFreq=6, numDocs=26)
          0.15625 = fieldNorm(field=text, doc=1)
    </str>

Thanks!
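The double-counting is easy to verify by redoing the arithmetic from the debug output (a standalone sketch, not Lucene code -- the four inputs are copied from the explain above):

```java
public class ScoreCheck {
    // Recompute the debug explain: idf enters once via queryWeight and once
    // via fieldWeight, so the product is proportional to tf * idf^2.
    public static double score(double tf, double idf,
                               double queryNorm, double fieldNorm) {
        double queryWeight = idf * queryNorm;      // 0.35845062 in the output
        double fieldWeight = tf * idf * fieldNorm; // 0.5109258 in the output
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // tf, idf, queryNorm, fieldNorm taken from the debug output
        System.out.println(score(1.4142135, 2.3121865, 0.15502669, 0.15625));
    }
}
```

Note that in classic Lucene scoring, queryNorm itself is derived from 1/sqrt(sum of squared term weights), which for a single-term query scales like 1/idf and so partially cancels the extra idf factor; the Similarity javadocs are the authoritative reference here.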
DIH full-import with clean=false is still removing old data
Hello, I have a unique dataset of 1,110,000 products, each as its own file. It is split into three different directories of 500,000, 110,000, and 500,000 files. When I run:

http://localhost:8983/solr/bbyopen/dataimport?command=full-import&clean=false&commit=true

the first 500,000 entries are successfully indexed, and then the next 110,000 entries also work ... but after I run the third full-import on the last set of 500,000 entries, the document count remains at 610,000 ... it doesn't go up to 1,110,000!

1) Is there some kind of limit here? Why can the full-import keep the initial 500,000 entries and then let me do a full-import with 110,000 more entries ... but when I try to do a 3rd full-import, the document count doesn't go up?

2) I know for sure that all the data is unique. Since I am not doing delta-imports, I have NOT specified any primary key in the data-import.xml file. But I do have a uniqueKey in the schema.xml file.

Any tips?
- Pulkit
Re: schema changes changes 3.3 to 3.4?
Interesting... I did not make changes to the default settings, but I will definitely give that a shot. Thanks. I will comment later if I find a solution besides replacing the schema with the default one from 3.3. Thanks, JO
Re: solr 1.4 facet.limit behaviour in merging from several shards
On Tue, Oct 4, 2011 at 7:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: OK, if SOLR-2403, being related to the bug I described, has been fixed in
: Solr 3.4, then we are safe, since we are in the process of migration. Is it
: possible to verify this somehow? Is FacetComponent the class I should
: start checking this from? Can you give any other pointers?

According to Jira, it has not been fixed (note the "Unresolved" status).

Boy, I wish Jira defaulted to showing the "All" tab. I *think* this actually has been fixed and I just forgot to close the issue?

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference

FacetComponent is definitely the class that needs to be fixed. If you are interested in working on a patch, see Yonik's comments in the issue about how to approach it, in particular about how fixing mincount=1 is definitely solvable even if mincount > 1 is a harder/intractable problem. The change sounds simple, but writing test cases to verify it is also going to take some effort.

-Hoss
Re: sorting using function query results are notin order
hi hoss, I see this in CHANGES.txt:

    $Id: CHANGES.txt 1148494 2011-07-19 19:25:01Z hossman $
    repository root: http://svn.apache.org/repos/asf/lucene/dev/trunk
    Revision: 1148519

If you let me know what/where to look for, I can send details. Here is the field definition:

    <field name="Count" type="sint" indexed="true" stored="true" multiValued="false"/>
Re: DIH full-import with clean=false is still removing old data
Bah, it worked after cleaning it out for the 3rd time; don't know what I did differently this time :(

    <result name="response" numFound="1110983" start="0"/>

On Tue, Oct 4, 2011 at 8:00 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Hello, I have a unique dataset of 1,110,000 products, each as its own file. It is split into three different directories of 500,000, 110,000, and 500,000 files. When I run: http://localhost:8983/solr/bbyopen/dataimport?command=full-import&clean=false&commit=true the first 500,000 entries are successfully indexed and then the next 110,000 entries also work ... but after I run the third full-import on the last set of 500,000 entries, the document count remains at 610,000 ... it doesn't go up to 1,110,000! 1) Is there some kind of limit here? Why can the full-import keep the initial 500,000 entries and then let me do a full-import with 110,000 more entries ... but when I try to do a 3rd full-import, the document count doesn't go up? 2) I know for sure that all the data is unique. Since I am not doing delta-imports, I have NOT specified any primary key in the data-import.xml file. But I do have a uniqueKey in the schema.xml file. Any tips? - Pulkit
A simple query?
Hi all, this may seem to be an easy one, but I have been struggling to get it working. To simplify things, let's say I have a field that can contain any combination of the 26 alphabetic letters, space delimited:

    <doc><myfield>a b</myfield></doc>
    <doc><myfield>b c</myfield></doc>
    <doc><myfield>x y z</myfield></doc>

The search term is a list of user-specified letters, for example: "a b y z". I would like only the following docs to be returned:
1. Any doc that contains exactly the 4 letters a b y z (sequence not important).
2. Any doc that contains only a subset of the 4 letters a b y z (sequence not important).

Note that a doc containing any letters other than the 4 letters a b y z will not qualify. So in this case, only the first doc should be returned. Can someone shed some light here and let me know how to get this working, specifically:
1. What should the field type be (text, text_ws, string...)?
2. What does the query look like?

Thanks in advance!
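The qualification rule itself is simple to state: a doc matches iff its set of letters is a subset of the query's letters. A client-side sketch of that check (illustrative only, not a Solr query):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SubsetMatch {
    // A doc qualifies iff every letter in its field value appears in the
    // user-specified letter list (order and repetition don't matter).
    public static boolean qualifies(String docField, String queryLetters) {
        Set<String> allowed =
            new HashSet<>(Arrays.asList(queryLetters.split("\\s+")));
        return allowed.containsAll(Arrays.asList(docField.split("\\s+")));
    }

    public static void main(String[] args) {
        String q = "a b y z";
        System.out.println(qualifies("a b", q));   // doc 1
        System.out.println(qualifies("b c", q));   // doc 2
        System.out.println(qualifies("x y z", q)); // doc 3
    }
}
```

In Solr terms this suggests a whitespace-tokenized field (e.g. text_ws) and, since the alphabet is only 26 letters, one brute-force option is a query that ORs the allowed letters and negates the other 22 -- clunky, but expressible for an alphabet this small.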
Re: SOLR HttpCache Qtime
Solr supports having the browser cache the results. If your client code supports this caching, or your code goes through an HTTP cacher like Squid, it could return a cached page for a query. Is this what you mean?

On Tue, Oct 4, 2011 at 4:55 PM, Nicholas Chase nch...@earthlink.net wrote:
Seems to me what you're asking is how to have an accurate query time when you're getting a response that's been cached by an HTTP cache. This might be from the browser, or from a proxy, or from something else, but it's not from Solr. The reason that the QTime doesn't change is because it's the entire response -- results, parameters, QTime, and all -- that's cached. Solr isn't making a new request; it doesn't even know that a request has been made. So if you do 6 requests, and the last 5 come from the cache, Solr has done only one request, with one QTime. So it sounds to me that you are looking for the RESPONSE time, which would be different from the QTime, and would, I suppose, come from your application, and not from Solr. Nick

On 10/4/2011 7:44 PM, Erick Erickson wrote:
Still doesn't make sense to me. There is no Solr HTTP cache that I know of. There is a queryResultCache. There is a filterCache. There is a documentCache. There may even be custom cache implementations. There's a fieldValueCache. There's no http cache internal to Solr as far as I can tell. If you're asking if documents returned from the queryResultCache have QTimes that reflect the actual time spent (near 0), I'm pretty sure the answer is yes. If this doesn't answer your question, please take the time to formulate a complete question. It'll get you your answers quicker than multiple twitter-style exchanges. Best, Erick

On Tue, Oct 4, 2011 at 2:22 PM, Lord Khan Han khanuniver...@gmail.com wrote:
I just want to be sure.. because its solr internal HTTP cache.. not an outside httpcacher

On Tue, Oct 4, 2011 at 5:39 PM, Erick Erickson erickerick...@gmail.com wrote:
But if the HTTP cache is what's returning the value, Solr never sees anything at all, right? So Solr doesn't have a chance to do anything here. Best, Erick

On Tue, Oct 4, 2011 at 9:24 AM, Lord Khan Han khanuniver...@gmail.com wrote:
We are using this Qtime field and publishing in our front web. Even the httpCache decreasing the Qtime in reality, its still using the cached old Qtime value. We can use our internal qtime instead of Solr's but I just wonder is there any way to say Solr if its coming httpCache re-calculate the Qtime.

On Tue, Oct 4, 2011 at 4:16 AM, Erick Erickson erickerick...@gmail.com wrote:
Why do you want to? QTime is the time Solr spends searching. The cached value will, indeed, be from the query that filled in the HTTP cache. But what are you doing with that information that you want to correct it? That said, I have no clue how you'd attempt to do this. Best, Erick

On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han khanuniver...@gmail.com wrote:
Hi, Is there anyway to get correct Qtime when we use http caching? I think Solr caching also the Qtime so giving the same Qtime in response what ever takes it to finish. How I can set Qtime correcly from solr when I use http caching. Thanks

--
Lance Norskog
goks...@gmail.com