to prevent number-of-matching-terms in contributing score

2011-11-07 Thread Samarendra Pratap
Hi everyone! We are working on Solr 3.4. In short: if my query term matches more than one word, I want it to be considered as one match (in a particular field). Details: our index has a multi-valued field category which contains possible category names of the company. It is entered by the

Re: best way for sum of fields

2011-11-07 Thread stockii
Sorry, I need the sum of the values of the found documents, e.g. the total amount for one day. Each doc in the index has its own amount. I tried something with StatsComponent, but with 48 million docs in the index it's too slow. - --- System
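
For reference, a StatsComponent request of the kind described here could be built as in the sketch below. The field name `amount` and the sample response are assumptions made for illustration; `stats`, `stats.field` and the `stats_fields` response section are the standard StatsComponent names. This only shows the shape of the request, it does not address the speed problem on a 48-million-document index.

```python
from urllib.parse import urlencode, parse_qs

def stats_sum_params(query, field):
    """Build the query string for a StatsComponent request on one field."""
    return urlencode([
        ("q", query),
        ("rows", "0"),          # only the statistics are needed, not the hits
        ("stats", "true"),
        ("stats.field", field),
    ])

def extract_sum(response, field):
    """Pull the sum for one field out of a parsed JSON response."""
    return response["stats"]["stats_fields"][field]["sum"]

# Hypothetical parsed response, trimmed to the part that matters here.
sample_response = {"stats": {"stats_fields": {"amount": {"sum": 14.75, "count": 2}}}}

print(parse_qs(stats_sum_params("*:*", "amount")))
print(extract_sum(sample_response, "amount"))
```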

Re: SolrJ - threading, http clients, connection managers

2011-11-07 Thread pravesh
1) Is it safe to reuse a single _mgr and _client across all 28 cores? Both are thread-safe APIs as per the HttpClient specs. You should go ahead with this. Regards, Pravesh -- View this message in context:

Re: to prevent number-of-matching-terms in contributing score

2011-11-07 Thread pravesh
Did you rebuild the index from scratch? Since this is an index-time factor, you need to rebuild the complete index from scratch. Regards, Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/to-prevent-number-of-matching-terms-in-contributing-score-tp3486373p3486447.html Sent from

Re: to prevent number-of-matching-terms in contributing score

2011-11-07 Thread Samarendra Pratap
Hi Pravesh, thanks for your reply, but I am not asking about omitNorms (an index-time parameter). I am asking how to treat multiple matches of a term in a single field as one at query time. Thanks On Mon, Nov 7, 2011 at 2:48 PM, pravesh suyalprav...@yahoo.com wrote: Did you

Re: to prevent number-of-matching-terms in contributing score

2011-11-07 Thread pravesh
Hi Samar, You can write your custom similarity implementation, and override the /lengthNorm()/ method to return a constant value. Then in your /schema.xml/ specify your custom implementation as the default similarity class. But you need to rebuild your index from scratch for this to come into

Re: best way for sum of fields

2011-11-07 Thread pravesh
I guess this has nothing to do with the search part. You can post-process the search results (I mean iterate through your results and sum the values). Regards, Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/best-way-for-sum-of-fields-tp3477517p3486536.html Sent from the Solr -
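
The post-processing approach suggested here can be sketched as follows. This is a minimal illustration, assuming the hits have already been fetched as a list of dictionaries and that the numeric field is called `amount` (a hypothetical name); it only sums the documents that were actually returned.

```python
from decimal import Decimal

def sum_field(docs, field="amount"):
    """Sum one numeric field over already-fetched result documents."""
    total = Decimal("0")
    for doc in docs:
        value = doc.get(field)
        if value is not None:  # skip documents that lack the field
            total += Decimal(str(value))
    return total

# Hypothetical hits, shaped like the "docs" list of a search response.
docs = [{"id": "1", "amount": 10.5}, {"id": "2", "amount": 4.25}, {"id": "3"}]
print(sum_field(docs))
```

Going through `Decimal` avoids accumulating binary floating-point error when the values are money amounts.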

Solr Test Framework

2011-11-07 Thread Ronak Patel
Hi, I am trying to write JUnit Test Code for my Solr interaction and while executing I keep getting the following errors. Mind you, I am using JUnit 4.7 and I have been calling super.setUp(). Here is some sample code... /* (non-Javadoc) * @see

Re: best way for sum of fields

2011-11-07 Thread stockii
Yes, I am already using this approach in another part of my application. I had hoped there was another way that avoids going through PHP. - --- System One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 1 Core with 45 Million Documents other

Re: best way for sum of fields

2011-11-07 Thread Tanguy Moal
Hi, If you only need to sum over displayed results, go with the post-processing of hits solution, that's fast and easy. If you sum over the whole data set (i.e. your sum is not query-dependent), have it computed at indexing time, depending on your indexing workflow. Otherwise, (sum over the

Re: best way for sum of fields

2011-11-07 Thread stockii
Hi, thanks for the big reply ;) I had the idea of several small 5M shards too, and I think that's the next step I have to take, because our biggest index grows each day by 50K documents on average. But does it make sense to keep searcher AND updater cores on one big server? I don't want to use

Re: Solr, MultiValues and links...

2011-11-07 Thread Tiernan OToole
That looks promising... Will look into that a bit more. --Tiernan On Sat, Nov 5, 2011 at 4:07 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, MultiValues are guaranteed to be returned in the order they were inserted, so you might be able to do the linking yourself given the results.

Similar documents and advantages / disadvantages of MLT / Deduplication

2011-11-07 Thread Vadim Kisselmann
Hello folks, I have questions about MLT and Deduplication and what would be the best choice in my case. Case: I index 1000 docs, 5 of them are 95% the same (for example: copy-pasted blog articles from different sources, with slight changes (author name, etc.)). But they have differences. *Now

Re: TikaEntityProcessor not working?

2011-11-07 Thread kumar8anuj
I tried to do the same, but the problem still persists and my document is not getting indexed. I am using Solr 3.4.0, which came with Tika 0.8; I replaced the core and parser jars with 0.6, but the document is still not getting indexed. Please help; nothing related to this shows up in my logs. -- View this

Re: best way for sum of fields

2011-11-07 Thread Tanguy Moal
Hi again, Since you have a custom high availability solution over your Solr instances, I can't help much I guess... :-) I usually rely on master/slave replication to separate index build and index search processes. The fact is that resource consumption at build time and search time are

Re: TikaEntityProcessor not working?

2011-11-07 Thread Erick Erickson
You have to provide a lot more information about what you're doing. Are you trying to use DIH? the extracting update request handler? What do your config files look like? Please review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Nov 7, 2011 at 8:18 AM, kumar8anuj

size of data replicated

2011-11-07 Thread Xin Li
Hi, there, I am trying to look into the performance impact of data replication on query response time. To get a clear picture, I would like to know how to get the size of data being replicated for each commit. Through the admin UI, you may read a x of y G data is being replicated; however, y is

Re: Aggregated indexing of updating RSS feeds

2011-11-07 Thread Nagendra Nagarajayya
Shaun: You should try NRT available with Solr with RankingAlgorithm here. You should be able to add docs in real time and also query them in real time. If DIH does not retain the old index, you may be able to convert the rss fields to a XML format as needed by Solr and update the docs (make

Re: size of data replicated

2011-11-07 Thread Otis Gospodnetic
Hi Xin, I don't know if you can see this information anywhere in Solr's UI... ... but I know you could see this information using SPM for Solr [1].  I don't have a screenshot handy to show this visually, but it's easy to explain.  One of the SPM for Solr reports shows the index size (in terms

SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException:

2011-11-07 Thread OldSkoolMark
Having some trouble clustering my data ... These symptoms are similar to some problems that were fixed last year. Possible regression? Suggestions on how to proceed? Thanks in advance! https://issues.apache.org/jira/browse/SOLR-1883 https://issues.apache.org/jira/browse/SOLR-1404 Nov 7, 2011

Re: Aggregated indexing of updating RSS feeds

2011-11-07 Thread Fred Zimmerman
Any options that do not require adding new software? On Mon, Nov 7, 2011 at 11:11 AM, Nagendra Nagarajayya nnagaraja...@transaxtions.com wrote: Shaun: You should try NRT available with Solr with RankingAlgorithm here. You should be able to add docs in real time and also query them in real

Re: how to achieve google.com like results for phrase queries

2011-11-07 Thread alxsss
Solr can also query link (URL) text and rank those pages higher if we specify url in the qf field. The only problem is why it does not rank pages with both words higher when mm is set to 1<-1. It seems to me that this is a bug. Thanks. Alex. -Original Message- From: Ted Dunning

Re: Why Jboss server is stopped due to SOLR

2011-11-07 Thread Chris Hostetter
: I am trying to connect the SOLR with Java code using URLConnection, i have : deployed solr war file in jboss server(assuming server machine in some other : location or remote) its working fine if no exception raises... but if any : exception raises in server like connection failure its stopping

Re: i don't get why this says non-match

2011-11-07 Thread Chris Hostetter
: It looks to me like everything matches down the line but top level says : otherQuery is a non-match... I don't get it? Note the parsed query... : <str name="parsedquery_toString">+moreWords:syncmaster : -moreWords:sync (master syncmaster)</str> ...and the top level explanation message... : <str

Re: Return the ranks of selected documents

2011-11-07 Thread Chris Hostetter
: Ideally this means that for a given query, I would like Solr just to return : the ranks of selected unique keys within the results. If I understand you correctly, given a query MY_QUERY and a set of IDs (ID1, ID2, ID3, etc...) you would like to know the score of those IDs against that query?

Re: Aggregated indexing of updating RSS feeds

2011-11-07 Thread sbarriba
Thanks Nagendra, I'll take a look. So question for you et al, so Solr in its default installation will ALWAYS delete content for an entity prior to doing a full import? You cannot simply build up an index incrementally from multiple imports (from XML)? I read elsewhere that the 'clean' parameter

Re: How to use an External Database for Fields?

2011-11-07 Thread Draconissa
Chris Hostetter-3 wrote: Agreed. In 3.x and below this type of logic is expected to live in the QueryResponseWriters. Forgive my ignorance, but where do QueryResponseWriters live? And where do they fit into the flow? I know how the different components fit into a distributed search,

Solr's JMX domain names

2011-11-07 Thread Otis Gospodnetic
Hello, While working on our Performance Monitoring SaaS for Solr [1] we've noticed Solr MBeans are registered under a different JMX domain name, depending on the Solr version and on the servlet container.  In some cases the domain name is solr, while in others it is solr/.  But we also saw

Re: question from a beginner

2011-11-07 Thread Chris Hostetter
: So for example, if searching on Santa Clara I would like to display all : sections/paragraphs where Santa Clara occurs in the document. can you clarify what you mean by display and how you intend to use that info? it may be obvious to you what you mean by display but depending on the

Re: Solr's JMX domain names

2011-11-07 Thread Chris Hostetter
: depending on the Solr version and on the servlet container.  In some : cases the domain name is solr, while in others it is solr/.  But we : also saw further inconsistencies.  For example, we have 2 Solr 1.4.0 : instances on the same version of the servlet container, and one has : solr,

Re: InvalidTokenOffsetsException when using MappingCharFilterFactory, DictionaryCompoundWordTokenFilterFactory and Highlighting

2011-11-07 Thread Chris Hostetter
: finally I want to use Solr highlighting. But there seems to be a problem : if I combine the char filter and the compound word filter in combination : with highlighting (an : org.apache.lucene.search.highlight.InvalidTokenOffsetsException is : raised). Definitely sounds like a bug somewhere

Re: overwrite=false support with SolrJ client

2011-11-07 Thread Chris Hostetter
: I see that https://issues.apache.org/jira/browse/SOLR-653 removed this : support from SolrJ, because it was deemed too dangerous for mere : mortals. I believe the concern was that the novice level API was very in your face about asking if you wanted to overwrite and made it too easy to

Re: Term frequency question

2011-11-07 Thread Chris Hostetter
: ./NoLengthNormAndTfSimilarity.java:7: error: lengthNorm(String,int) in : NoLengthNormAndTfSimilarity cannot override lengthNorm(String,int) in : Similarity : public float lengthNorm(String fieldName, int numTerms) { : ^ : overridden method is final : 1 error : - : What am I

Faceting a multi valued field

2011-11-07 Thread Steve Fatula
So, I have a bunch of products indexed in Solr. Each product may exist in any number of product categories. The product category field is therefore multivalued in Solr. This allows us to show the categories a product exists in. Now, instead we want to browse the products by category. This also works

Re: changing omitNorms on an already built index

2011-11-07 Thread Jonathan Rochkind
On 10/27/2011 9:14 PM, Erick Erickson wrote: Well, this could be explained if your fields are very short. Norms are encoded into (part of?) a byte, so your ranking may be unaffected. Try adding debugQuery=on and looking at the explanation. If you've really omitted norms, I think you should see

Replication fails in SolrCloud

2011-11-07 Thread prakash chandrasekaran
Hi all, I followed the steps in http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble and created a two-shard cluster with shard replicas and a ZooKeeper ensemble, and then for Solr replication I followed the steps in link

when using group=true facet numbers are incorrect

2011-11-07 Thread Greg Pelly
Hi, I've noticed that when field collapsing and faceting are both used in the one query the facet numbers ignore the grouping. In my example I have three documents (I have a small index for testing) and if I group on a certain field I get two groups in the results but the facet numbers show that
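
The difference described here, between facet counts computed over all matching documents and counts computed over the groups, can be illustrated with a small sketch. The three documents and the field names `group_key` and `color` are made up for illustration; this models the counting only and is not Solr code.

```python
from collections import Counter

# Three hypothetical documents that collapse into two groups.
docs = [
    {"id": 1, "group_key": "a", "color": "red"},
    {"id": 2, "group_key": "a", "color": "red"},
    {"id": 3, "group_key": "b", "color": "red"},
]

def facet_over_docs(docs, field):
    """Count facet values over every matching document (ignores grouping)."""
    return Counter(d[field] for d in docs)

def facet_over_group_heads(docs, group_field, field):
    """Count facet values over one representative document per group."""
    heads = {}
    for d in docs:
        heads.setdefault(d[group_field], d)  # keep the first doc of each group
    return Counter(d[field] for d in heads.values())

print(facet_over_docs(docs, "color"))
print(facet_over_group_heads(docs, "group_key", "color"))
```

With this data the first count reports three documents for `red` while the second reports two, matching the two groups shown in the results.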

Re: Faceting a multi valued field

2011-11-07 Thread Chris Hostetter
: Now, instead we want to browse the products by category. This also works : since we can simply find all products for category A. So, we show them. : Now, we also want to show a list of categories underneath (in the tree : structure) that category, and, a count of items in each. Just the :

Re: Aggregated indexing of updating RSS feeds

2011-11-07 Thread Chris Hostetter
: We've successfully setup Solr 3.4.0 to parse and import multiple news : RSS feeds (based on the slashdot example on : http://wiki.apache.org/solr/DataImportHandler) using the HttpDataSource. : The objective is for Solr to index ALL news items published on this feed : (ever) - not just the

Re: Faceting a multi valued field

2011-11-07 Thread Steve Fatula
From: Chris Hostetter hossman_luc...@fucit.org To: solr-user@lucene.apache.org solr-user@lucene.apache.org; Steve Fatula compconsult...@yahoo.com Sent: Monday, November 7, 2011 5:42 PM Subject: Re: Faceting a multi valued field how are you modeling the tree nature of your category taxonomy when

Re: to prevent number-of-matching-terms in contributing score

2011-11-07 Thread Chris Hostetter
: You can write your custom similarity implementation, and override the : /lengthNorm()/ method to return a constant value. The poster already said (twice!) that they have already set omitNorms=true, so lengthNorm won't even be used. Omitting norms (or mucking with norms by modifying the

Re: Faceting a multi valued field

2011-11-07 Thread Chris Hostetter
: Someone always wants to understand the full use case. :-) I do : understand why, but, sometimes said use case is extremely complex with : dozens and dozens of search requirements. I was trying to limit the : explanation and was hoping someone could just answer the question as is. well -- i

Re: when using group=true facet numbers are incorrect

2011-11-07 Thread Chris Hostetter
: I understand that's a valid thing for faceting to do, I was just wondering : if there's any way to get it to do the faceting on the groups returned. : Otherwise I guess I'll need to convince the UI people to just show the : facets without the numbers. what you are asking about is generally

Re: when using group=true facet numbers are incorrect

2011-11-07 Thread Yonik Seeley
On Mon, Nov 7, 2011 at 8:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I understand that's a valid thing for faceting to do, I was just wondering : if there's any way to get it to do the faceting on the groups returned. : Otherwise I guess I'll need to convince the UI people to just

Weird: Solr Search result and Analysis Result not match?

2011-11-07 Thread Ellery Leung
Hi all. I am using Solr 3.4 under Win 7. In schema there is a multivalue field indexed in this way: == Schema: == <field name="myEvent" type="myCustomText" multiValued="true" indexed="true" stored="true" omitNorms="true"/> <fieldType

can't determine sort order with desc provided

2011-11-07 Thread Greg Pelly
Hi, I'm having an issue with sorting because the PHP plugin converts the + to %2B, I get the error Can't determine Sort Order: 'name+desc'. Thanks in advance for any assistance. Cheers Nov 8, 2011 1:53:00 PM org.apache.solr.core.SolrCore execute INFO: [pending] webapp=/solr path=/select/

Re: can't determine sort order with desc provided

2011-11-07 Thread Chris Hostetter
: I'm having an issue with sorting because the PHP plugin converts the + to : %2B, I get the error Can't determine Sort Order: 'name+desc'. then it sounds like the PHP library you are using is URL escaping things properly, and you should just be passing a simple space character to it. the
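
The escaping behaviour described in this reply can be reproduced with a short sketch. It uses Python's standard `urllib.parse` rather than the PHP library from the thread, as an illustration of the same rule: pass the sort value with a plain space and let the client library do the URL escaping once.

```python
from urllib.parse import urlencode, parse_qs

# Correct: hand the library a plain space; it escapes it for the URL.
good = urlencode({"sort": "name desc"})

# Problem case: the value already contains a literal "+", so the library
# escapes that "+" as %2B and the server sees the string "name+desc".
bad = urlencode({"sort": "name+desc"})

print(good)  # sort=name+desc
print(bad)   # sort=name%2Bdesc

# What the server decodes in each case:
print(parse_qs(good)["sort"][0])
print(parse_qs(bad)["sort"][0])
```

The first request decodes to `name desc`, which is a valid sort specification; the second decodes to `name+desc`, which is the value quoted in the error message.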

Re: when using group=true facet numbers are incorrect

2011-11-07 Thread Greg Pelly
That works well, thanks very much. On Tue, Nov 8, 2011 at 12:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I understand that's a valid thing for faceting to do, I was just wondering : if there's any way to get it to do the faceting on the groups returned. : Otherwise I guess I'll

Re: can't determine sort order with desc provided

2011-11-07 Thread Greg Pelly
Thanks again On Tue, Nov 8, 2011 at 2:56 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I'm having an issue with sorting because the PHP plugin converts the + to : %2B, I get the error Can't determine Sort Order: 'name+desc'. then it sounds like the PHP library you are using is URL

Re: Faceting a multi valued field

2011-11-07 Thread Steve Fatula
From: Chris Hostetter hossman_luc...@fucit.org To: Steve Fatula compconsult...@yahoo.com Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Monday, November 7, 2011 7:17 PM Subject: Re: Faceting a multi valued field : A B C D E : Z C D E : Z C F G H E : Y G H E : :