Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Simon Willnauer
On Thu, Aug 2, 2012 at 7:53 AM, roz dev rozde...@gmail.com wrote: Thanks Robert for these inputs. Since we do not really Snowball analyzer for this field, we would not use it for now. If this still does not address our issue, we would tweak thread pool as per eks dev suggestion - I am bit

Re: SOLR 3.4 GeoSpatial Query Returning distance

2012-08-02 Thread Michael Kuhlmann
On 02.08.2012 01:52, Anand Henry wrote: Hi, In SOLR 3.4, while doing a geo-spatial search, is there a way to retrieve the distance of each document from the specified location? Not that I know of. What we did was to read and parse the location field on client side and calculate the distance
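The client-side approach described above (read the stored location, then compute the distance yourself) could look roughly like the sketch below. It assumes the location field is stored as a "lat,lon" string and uses the haversine great-circle formula with a mean Earth radius of 6371 km; the function names are illustrative and not part of Solr or SolrJ.

```python
import math


def parse_latlon(value):
    """Parse a 'lat,lon' string (assumed storage format) into two floats."""
    lat, lon = value.split(",")
    return float(lat), float(lon)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    a = min(1.0, a)  # guard against rounding slightly above 1
    return 2 * radius_km * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def distance_to(doc_location, origin_lat, origin_lon):
    """Distance from the search origin to one document's stored location."""
    lat, lon = parse_latlon(doc_location)
    return haversine_km(origin_lat, origin_lon, lat, lon)
```

Calling distance_to() on each returned document's location value gives the distance that Solr 3.4 itself does not return.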

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Laurent Vaills
Hi everyone, Is there any chance to get this backported for a 3.6.2? Regards, Laurent 2012/8/2 Simon Willnauer simon.willna...@gmail.com On Thu, Aug 2, 2012 at 7:53 AM, roz dev rozde...@gmail.com wrote: Thanks Robert for these inputs. Since we do not really Snowball analyzer for this

Re: AW: auto completion search with solr using NGrams in SOLR

2012-08-02 Thread aniljayanti
Hi, thanks. I'm searching with the empname field. I want to search with both empname and title. Below is my changed code. <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.LowerCaseTokenizerFactory"/> <filter

matching with whole field

2012-08-02 Thread elisabeth benoit
Hello, I am using Solr 3.4. I'm trying to define a field type that matches only if the request contains exactly the same words. Let's say I have two different values for ONLY_EXACT_MATCH_FIELD: ONLY_EXACT_MATCH_FIELD: salon de coiffure ONLY_EXACT_MATCH_FIELD: salon de coiffure pour

Re: matching with whole field

2012-08-02 Thread Chantal Ackermann
Hi Elisabeth, try adding the same tokenizer chain for query, as well, or simply remove the type=index from the analyzer element. Your chain is analyzing the input of the indexer and removing diacritics and lowercasing. With your current setup, the input to the search is not analyzed likewise

Re: matching with whole field

2012-08-02 Thread elisabeth benoit
Hello Chantal, Thanks for your answer. In fact, my analyzer contains the same tokenizer chain for query. I just removed it in my email for readability (but maybe that was not good for clarity). And I did check with the admin interface, and it says there is a match. But with a real query to Solr, it

Re: matching with whole field

2012-08-02 Thread fbrisbart
It's a parsing problem. You must tell the query parser to consider spaces as real characters. This should work (backslashing the spaces): fq=ONLY_EXACT_MATCH_FIELD:salon\ de\ coiffure or you may use something like this: fq={!term f=ONLY_EXACT_MATCH_FIELD v=$qq}&qq=salon de coiffure Hope it
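A minimal sketch of how a client might build the two filter queries suggested above. The helper names are hypothetical; only the fq syntax itself (backslash-escaped spaces, or the {!term} parser with a dereferenced qq parameter) comes from the message.

```python
from urllib.parse import urlencode


def escape_spaces(value):
    """Backslash-escape spaces so the query parser keeps the value as one term."""
    return value.replace(" ", "\\ ")


def escaped_fq_params(field, value):
    """Variant 1: fq=FIELD:salon\\ de\\ coiffure"""
    return {"fq": f"{field}:{escape_spaces(value)}"}


def term_parser_fq_params(field, value):
    """Variant 2: fq={!term f=FIELD v=$qq} with the raw value passed as qq."""
    return {"fq": "{!term f=%s v=$qq}" % field, "qq": value}


if __name__ == "__main__":
    # urlencode takes care of URL-encoding the backslashes, braces and spaces.
    print(urlencode(escaped_fq_params("ONLY_EXACT_MATCH_FIELD", "salon de coiffure")))
    print(urlencode(term_parser_fq_params("ONLY_EXACT_MATCH_FIELD", "salon de coiffure")))
```

The second variant avoids escaping altogether, because the value is passed in its own request parameter and is not run through the default query parser.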

Re: Solr TermsComponent: space in term

2012-08-02 Thread aniljayanti
Hi, I'm working on autocomplete functionality in Solr. Can you suggest the required configurations in schema.xml and solrconfig.xml for doing autocomplete in Solr? Thanks in advance, Anil

Re: SOLR 3.4 GeoSpatial Query Returning distance

2012-08-02 Thread Tanguy Moal
Hi, I've not tested it myself, but I think you can take advantage of Solr 4's pseudo fields by adding something like: fl=*,geodist(),score I think you could even pass several geodist() calls with different parameters if you want to have the distance wrt several POIs ^-^ SOLR 4 only.
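As a rough illustration of the Solr 4 suggestion above, the request parameters could be assembled as follows. This sketch assumes that geodist() without arguments picks up the spatial field and the reference point from the sfield and pt request parameters; the field name "store" and the coordinates are placeholders, not taken from the thread.

```python
from urllib.parse import urlencode


def geodist_params(lat, lon, sfield, query="*:*"):
    """Build request parameters that ask Solr 4 to return geodist() as a pseudo field."""
    return {
        "q": query,
        "sfield": sfield,            # spatial field (placeholder name supplied by caller)
        "pt": f"{lat},{lon}",        # reference point as "lat,lon"
        "fl": "*,geodist(),score",   # field list from the message above
    }


if __name__ == "__main__":
    print(urlencode(geodist_params(45.15, -93.85, "store")))
```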

AW: AW: auto completion search with solr using NGrams in SOLR

2012-08-02 Thread Markus Klose
If you want to search in the two fields title and empname, you have to use the (e)dismax query parser http://wiki.apache.org/solr/ExtendedDisMax and specify the qf param: qf=title empname Check your solrconfig.xml to verify which query parser you are using right now. In your use case you
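For illustration only, the request parameters for such a two-field search might be put together like this. The helper name is hypothetical; defType and qf are the standard request parameters for selecting the parser and its query fields.

```python
from urllib.parse import urlencode


def edismax_params(user_query, fields=("title", "empname")):
    """Search several fields at once by selecting edismax and listing them in qf."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": " ".join(fields),  # space-separated field list, e.g. "title empname"
    }


if __name__ == "__main__":
    print(urlencode(edismax_params("smith")))
```

The same defaults can instead be set once on the request handler in solrconfig.xml, so that clients only send q.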

Re: matching with whole field

2012-08-02 Thread elisabeth benoit
Thank you so much Franck Brisbart. It's working! Best regards, Elisabeth 2012/8/2 fbrisbart fbrisb...@bestofmedia.com It's a parsing problem. You must tell the query parser to consider spaces as real characters. This should work (backslashing the spaces): fq=ONLY_EXACT_MATCH_FIELD:salon\

split on white space and then EdgeNGramFilterFactory

2012-08-02 Thread Rajani Maski
Hi, I wanted to do split on white space and then apply EdgeNGramFilterFactory. Example : A field in a document has text content : smart phone, i24 xpress exchange offer, 500 dollars smart s sm sma smar smart phone p ph pho phon phone i24 i i2 i24 xpress x xp xpr xpre xpres xpress so
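To make the desired token stream concrete, here is a small sketch that mimics the behaviour outside Solr: split on whitespace, then emit front-edge n-grams per token. It only approximates a whitespace tokenizer followed by EdgeNGramFilterFactory; the punctuation stripping and the min_gram/max_gram defaults are assumptions chosen to reproduce the example in the message.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of token with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def whitespace_edge_ngrams(text, min_gram=1, max_gram=15):
    """Split on whitespace, strip trailing punctuation, then n-gram each token."""
    grams = []
    for token in text.split():
        token = token.strip(",.;:")  # assumption: commas are not part of the term
        if token:
            grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


if __name__ == "__main__":
    print(whitespace_edge_ngrams("smart phone, i24 xpress"))
```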

Re: split on white space and then EdgeNGramFilterFactory

2012-08-02 Thread Jack Krupansky
Only do the ngram filter at index time. So, add a query-time analyzer to that field type but without the ngram filter. Also, add debugQuery to your query request to see what Lucene query is generated. And, use the Solr admin analyzer to validate both index-time and query-time analysis of

Re: BitSet field type in solr

2012-08-02 Thread Erick Erickson
There has been talk of a bit fieldType, but I don't think it has ever been implemented. I don't think you need a custom FieldType, there already is a BinaryType (although I confess I haven't used it, so check first <G>). From there, I think your custom Similarity is the way to go... Best Erick On

Re: termFrequncy off and still use fastvector highlighter?

2012-08-02 Thread Erick Erickson
What do you expect to gain by turning off TF? This feels a bit like an XY problem. Best Erick On Wed, Aug 1, 2012 at 8:43 AM, abhayd ajdabhol...@hotmail.com wrote: hi We would like to turn off TF for a field but we still want to use fast vector highlighter. How would we do that?

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Robert Muir
On Thu, Aug 2, 2012 at 3:13 AM, Laurent Vaills laurent.vai...@gmail.com wrote: Hi everyone, Is there any chance to get this backported for a 3.6.2? Hello, I personally have no problem with it, but it's really technically not a bugfix, just an optimization. It also doesn't solve the actual

Re: Special suggestions requirement

2012-08-02 Thread Michael Della Bitta
In this case, we're storing the overall value length and sorting it on that, then alphabetically. Also, how are your queries fashioned? If you're doing a prefix query, everything that matches it should score the same. If you're only doing a prefix query, you might need to add a term for exact

Re: termFrequncy off and still use fastvector highlighter?

2012-08-02 Thread abhayd
So we have some content where the document title is like this: Accessory for iphone, iphone4, iphone 4s So these come out on top of the results for iphone. This could be a content authoring issue, but we are looking into keeping such content from coming out on top.

Solr 4.0 - Join performance

2012-08-02 Thread Eric Khoury
Hello all, I’m testing out the new join feature, hitting some perf issues, as described in Erick’s article (http://architects.dzone.com/articles/solr-experimenting-join). Basically, I’m using 2 objects in solr (this is a simplified view): Item - Id - Name Grant - ItemId -

Re: termFrequncy off and still use fastvector highlighter?

2012-08-02 Thread Tanguy Moal
I think you could use a field without the term frequencies for searching; that will solve your relevancy issues. You can then have the exact same content in another field (using a copyField directive in your schema), with term frequencies and positions turned on, and use this particular field for

SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Timothy Potter
Just starting to get into SolrCloud using 4.0.0-ALPHA and am very impressed so far ... I have a 12-shard index with ~104M docs with each shard having 1-replica (so 24 Solr servers running) Using the Query form on the Admin panel, I issue the MatchAllDocsQuery (*:*) and each time I send the

growth estimates

2012-08-02 Thread Jeff Minelli
Is there a tool or method to help me calculate the growth of solr disk usage based on the known size of data input into it? Thanks, -jeff

Re: growth estimates

2012-08-02 Thread Rafał Kuć
Hello! Take a look at http://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls It should be handy. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Is there a tool or method to help me calculate the growth of

Re: growth estimates

2012-08-02 Thread Jeff Minelli
Awesome, thanks! -jeff On Aug 2, 2012, at 11:32 AM, Rafał Kuć wrote: Hello! Take a look at http://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls It should be handy. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

what's the communication protocol between the shards for a shard request?

2012-08-02 Thread Joey
For example, we have two shards: shard1 and shard2. Our shard request always goes to shard1; we are wondering what the protocol is when the shard sends the request to shard2. Is it HTTP, in binary format? We are trying to set up AppDynamics to monitor the shards, but it looks like AppDynamics could not instrument the

weired shards search problem

2012-08-02 Thread Joey
We have two shards, shard1 and shard2; each shard has two slaves, and there is a VIP and LB in front of each set of slaves. The shard request returns a SolrDocumentList object. For the same request, SOMETIMES getNumFound on this object returns the correct number (say 3) but the actual document

Re: weired shards search problem

2012-08-02 Thread Timothy Potter
I've seen this when I wasn't using string type for my document ID (uniqueKey field) ... On Thu, Aug 2, 2012 at 10:13 AM, Joey vanjo...@gmail.com wrote: we have two shards -shard1 and shard2 - each shard has two slaves. And there is a VIP and LB in front of each set of the slaves. The shard

Solr admin stops working

2012-08-02 Thread Niall
I've got Solr 3.6 up working with Jetty but the admin page is inaccessible and Solr appears to stop working when I terminate my SSH connection to the server after running start.jar. Am I missing a trick here: how do I keep it running?

Re: Sorting fields of text_general fieldType

2012-08-02 Thread Anupam Bhattacharya
The approach used to work perfectly, but recently I realized that it does not work for more than 30 indexed records. I am using Solr 3.5. Is there another approach to sort a title field in proper alphabetical order irrespective of lower case and upper case? Regards Anupam On Thu,

Re: Solr admin stops working

2012-08-02 Thread Brendan Grainger
I assume you're backgrounding solr. Maybe you just need disown %1 Brendan On Aug 2, 2012, at 1:04 PM, Niall n...@neildoyle.com wrote: I've got Solr 3.6 up working with Jetty but the admin page is inaccessible and Solr appears to stop working when I terminate my SSH connection to the server

Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Mark Miller
On Aug 2, 2012, at 11:08 AM, Timothy Potter thelabd...@gmail.com wrote: Just starting to get into SolrCloud using 4.0.0-ALPHA and am very impressed so far ... I have a 12-shard index with ~104M docs with each shard having 1-replica (so 24 Solr servers running) Using the Query form on

Re: Map Complex Datastructure with Solr

2012-08-02 Thread Alexandre Rafalovitch
You are not going to get nested entries. So, your sample result is not possible. Perhaps you just need to flatten your searchable fields into individual article entries and then use a separate DB query to get the product information back out of the database. SOLR is not a database, even a NoSQL

RE: Solr upgrade from 1.4 to 3.6

2012-08-02 Thread Manepalli, Kalyan
Chantal, Thanks for the reply. I will try it out. Thanks, Kalyan Manepalli -Original Message- From: Chantal Ackermann [mailto:c.ackerm...@it-agenten.com] Sent: Wednesday, August 01, 2012 3:55 AM To: solr-user@lucene.apache.org Subject: Re: Solr upgrade from 1.4 to 3.6 Hi

Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Timothy Potter
Thanks Mark. I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer: Collection<SolrInputDocument> batch = ... ... build up batch ... solrServer.add( batch ); Basically, I have a custom Pig StoreFunc that sends docs to Solr from our Hadoop analytics nodes. The reason I'm not using SolrJ

Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Timothy Potter
Sorry, I didn't answer your other questions about shards being in-sync. Yes - all are green and happy according to the Cloud admin panel. Tim On Thu, Aug 2, 2012 at 12:16 PM, Timothy Potter thelabd...@gmail.com wrote: Thanks Mark. I'm actually using SolrJ 3.4.0, so using

Re: Map Complex Datastructure with Solr

2012-08-02 Thread Mikhail Khludnev
Tomas, If you mean something like http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html you can check proposed Solr integration https://issues.apache.org/jira/browse/SOLR-3076 Regards On Thu, Aug 2, 2012 at 9:53 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: You

Re: Solr 4.0 - Join performance

2012-08-02 Thread Mikhail Khludnev
Hello, You can check my record. https://issues.apache.org/jira/browse/SOLR-3076?focusedCommentId=13415644&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13415644 I'm still working on precise performance measurement. On Thu, Aug 2, 2012 at 6:45 PM, Eric Khoury

Re: Trending topics?

2012-08-02 Thread Tor Henning Ueland
On Thu, Aug 2, 2012 at 5:34 PM, Chris Dawson xrdaw...@gmail.com wrote: How would I generate a list of trending topics using solr? By putting them in solr. (A generic question gets a generic answer.) What do you mean? Trending searches, trending data, trending documents, trending what?

Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Mark Miller
Can you do me a favor and try not using the batch add for a run? Just do the add one doc at a time. (solrServer.add(doc) rather than solrServer.add(collection)) I just fixed one issue with it this morning on trunk - it may be the cause of this oddity. I'm also working on some performance

Re: Solr 4.0 - Join performance

2012-08-02 Thread Mikhail Khludnev
Eric, you can take the last patch from SOLR-3076: SOLR-3076.patch https://issues.apache.org/jira/secure/attachment/12536717/SOLR-3076.patch (16/Jul/12 21:16). You can also take it applied from

Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-02 Thread david3s
Hello Jack, We found that the problem is related to the *lucene* query parser in 3.6. select?q=author:David\ Duke&defType=lucene Would render the same results as: select?q=author:(David OR Duke)&defType=lucene But select?q=author:David\ Duke&defType=edismax Would render the same results as:

Re: Trending topics?

2012-08-02 Thread Chris Dawson
Tor, Thanks for your response. I'd like to put an arbitrary set of text into Solr and then have Solr tell me the ten most popular topics that are in there. For example, if I put in 100 paragraphs of text about sports, I would like to retrieve topics like swimming, basketball, tennis if the

Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Timothy Potter
Yes, I can but won't get to it today unfortunately. I had my eval environment running on some very expensive EC2 instances and shut it down for the time being until I can focus on it again. Will try to get back to this either tomorrow or over the weekend. Sorry for the delay. Tim On Thu, Aug 2,

How do you get the document name from Open Text?

2012-08-02 Thread eShard
I'm using Solr 4.0 with ManifoldCF .5.1 crawling Open Text v10.5. I have the cats/atts turned on in Open Text and I can see them all in the Solr index. However, the id is just the URL to download the doc from open text and the document name either from Open Text or the document properties is

Re: Trending topics?

2012-08-02 Thread Lance Norskog
Two easy ones: 1) Facets on a text field are simple word counts by document. 2) If you want the number of times a word appears inside a document, that requires a separate dataset called a 'term vector'. This is a list of all words in a document with a count for each one. These are simple queries.
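The difference between the two counts mentioned above can be sketched in a few lines: a per-document word count (roughly what a facet on a text field reports) versus the count of a word inside one document (what a term vector holds). This is a plain Python illustration with a naive lowercase/whitespace tokenizer, which is an assumption and not how Solr analyzes text.

```python
from collections import Counter


def tokenize(text):
    """Naive analysis: lowercase, split on whitespace (assumption for this sketch)."""
    return text.lower().split()


def document_frequency(docs):
    """For each word, count the number of documents containing it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def term_vector(doc):
    """For one document, count how many times each word occurs in it."""
    return Counter(tokenize(doc))


if __name__ == "__main__":
    docs = ["tennis tennis swimming", "tennis basketball"]
    print(document_frequency(docs).most_common(3))
    print(term_vector(docs[0]))
```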

Re: SolrCloud MatchAllDocsQuery returning different number of docs each request

2012-08-02 Thread Mark Miller
FYI: I've committed the rest of the work I was doing on trunk in this area. On Aug 2, 2012, at 4:42 PM, Timothy Potter thelabd...@gmail.com wrote: Yes, I can but won't get to it today unfortunately. I had my eval environment running on some very expensive EC2 instances and shut it down for

AW: Special suggestions requirement

2012-08-02 Thread Lochschmied, Alexander
Even with prefix query, I do not get ABCD02 or any ABCD02... back. BTW: EdgeNGramFilterFactory is used on the field we are getting the suggestions/spellchecks from. I think the problem is that there are a lot of different part numbers starting with ABCD and every part number has the same

Re: Trending topics?

2012-08-02 Thread Hasan Diwan
Tor, I hope that the information in http://www.jason-palmer.com/2011/05/creating-a-tag-cloud-with-solr-and-php/ helps.. -- H On 2 August 2012 15:48, Lance Norskog goks...@gmail.com wrote: Two easy ones: 1) Facets on a text field are simple word counts by document. 2) If you want the number of

synonym file

2012-08-02 Thread Peyman Faratin
Hi, I have a (23M) synonym file that takes a long time (3 or so minutes) to load and, once included, seems to adversely affect the QTime of the application by approximately 4 orders of magnitude. Any advice on how to load it faster and lower the QTime would be much appreciated. best Peyman

Re: synonym file

2012-08-02 Thread Lance Norskog
If you must have them at query time, you need a custom implementation for very very large files :) If you can use these synonyms at index time instead of query time, that would help. When you index, do not call commit very often. The synonym filter implementation has a feature where it only saves