Re: Highlighting stopwords

2012-02-14 Thread O. Klein

O. Klein wrote
 
 Hmm, now the synonyms aren't highlighted anymore.
 
 OK, back to basics (I'm using trunk and FVH).
 
 What is the way to go about it if I want to search on a field without
 stopwords, but still want to highlight the stopwords (and still highlight
 synonyms and stemmed words)?
 

I made a new field, content_hl, to prevent problems coming from copyField.

When using hl.q=content_hl:(spell Check) I now get highlighting including
stopwords.

But when using hl.q=content_hl:(SC), where SC is a synonym, I get no
highlighting.

Can you verify if synonyms work when using hl.q?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3743317.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Improving performance for SOLR geo queries?

2012-02-14 Thread Matthias Käppler
hey thanks all for the suggestions, didn't have time to look into them
yet as we're feature-sprinting for MWC, but will report back with some
feedback over the next weeks (we will have a few more performance
sprints in March)

Best,
Matthias

On Mon, Feb 13, 2012 at 2:32 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Thu, Feb 9, 2012 at 1:46 PM, Yonik Seeley yo...@lucidimagination.com 
 wrote:
 One way to speed up numeric range queries (at the cost of increased
 index size) is to lower the precisionStep.  You could try changing
 this from 8 to 4 and then re-indexing to see how that affects your
 query speed.
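For reference, precisionStep is configured on the Trie field type in schema.xml. A sketch of the change described above (the type name and the other attributes are illustrative, not taken from this thread):

```xml
<!-- Lowering precisionStep from 8 to 4 indexes more precision levels per
     value: faster numeric range queries at the cost of a larger index. -->
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="4"
           omitNorms="true" positionIncrementGap="0"/>
```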

 Your issue, and the fact that I had been looking at the post-filtering
 code again for another client, reminded me that I had been planning on
 implementing post-filtering for spatial.  It's now checked into trunk.

 If you have the ability to use trunk, you can add a high cost (like
 cost=200) along with cache=false to trigger it.

 More details here:
 http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/
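As a sketch, a post-filtered spatial request on trunk might carry the local params described above like this (the field name, point, and distance are placeholders):

```text
fq={!geofilt cache=false cost=200 sfield=store pt=45.15,-93.85 d=5}
```

The high cost plus cache=false is what pushes the filter into the post-filtering path, so it only runs against documents that matched the main query.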

 -Yonik
 lucidimagination.com



-- 
Matthias Käppler
Lead Developer API & Mobile

Qype GmbH
Großer Burstah 50-52
20457 Hamburg
Telephone: +49 (0)40 - 219 019 2 - 160
Skype: m_kaeppler
Email: matth...@qype.com

Managing Director: Ian Brotherston
Amtsgericht Hamburg
HRB 95913

This e-mail and its attachments may contain confidential and/or
privileged information. If you are not the intended recipient (or have
received this e-mail in error) please notify the sender immediately
and destroy this e-mail and its attachments. Any unauthorized copying,
disclosure or distribution of this e-mail and  its attachments is
strictly forbidden. This notice also applies to future messages.


Re: Language specific tokenizer for purpose of multilingual search in single-core solr,

2012-02-14 Thread Paul Libbrecht
Only one field element?
There should be two, no?
One for each language.

paul


On 14 Feb 2012, at 07:34, bing wrote:

 
 Hi, all, 
 
 I want to do multilingual search in single-core Solr. That requires
 defining language-specific tokenizers in schema.xml. Say, for example, I have
 two tokenizers, one for English (en) and one for simplified Chinese
 (zh-cn). Can I just put the following definitions together in one schema.xml,
 and both sets of files (stopwords, synonyms, and protwords) in one
 directory?
 
 
 1. fieldType and field definition for English (en):
 
 <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index" language="en">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords_en.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" protected="protwords_en.txt"/>
   </analyzer>
   ...
 </fieldType>
 
 <field name="text_en" type="text_en" indexed="true" stored="false"
        multiValued="true"/>
 
 
 2. fieldType and field definition for Chinese (zh_cn):
 
 <fieldType name="text_zh_ch" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index" language="zh_cn">
     <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords_ch.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" protected="protwords_en.txt"/>
   </analyzer>
   ...
 </fieldType>
 
 <field name="text_zh_cn" type="text_zh_cn" indexed="true" stored="false"
        multiValued="true"/>
 
 
 Best 
 Bing
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Language-specific-tokenizer-for-purpose-of-multilingual-search-in-single-core-solr-tp3742873p3742873.html
 Sent from the Solr - User mailing list archive at Nabble.com.



sort my results alphabetically on facetnames

2012-02-14 Thread PeterKerk
I want to sort my results on the facet names (not by their number of results).

So now I have this (ordered by number of results):
Instelling voor auditief gehandicapten (16)
Audiologisch centrum (13)
Huisartsenpraktijk (13)
Instelling voor lichamelijk gehandicapten (13)
Ambulancezorg (12)
Beroepsorganisatie (12)

What I want is this:
Ambulancezorg (12)
Audiologisch centrum (13)
Beroepsorganisatie (12)
Huisartsenpraktijk (13)
Instelling voor auditief gehandicapten (16)
Instelling voor lichamelijk gehandicapten (13)

How can I change my request url to sort differently?

My current request URL is:
http://localhost:8983/solr/zz_healthorg/select/?indent=on&facet=true&q=*:*&fl=id&facet.field=healthorganizationtypes_raw_nl&facet.mincount=1
With the result below:
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="facet">true</str>
    <str name="fl">id</str>
    <str name="indent">on</str>
    <str name="facet.mincount">1</str>
    <str name="q">*:*</str>
    <str name="facet.field">healthorganizationtypes_raw_nl</str>
  </lst>
</lst>
<result name="response" numFound="258" start="0">
  <doc><str name="id">1</str></doc>
  <doc><str name="id">2</str></doc>
  <doc><str name="id">3</str></doc>
  <doc><str name="id">4</str></doc>
  <doc><str name="id">5</str></doc>
  <doc><str name="id">6</str></doc>
  <doc><str name="id">7</str></doc>
  <doc><str name="id">8</str></doc>
  <doc><str name="id">9</str></doc>
  <doc><str name="id">10</str></doc>
</result>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="healthorganizationtypes_raw_nl">
      <int name="Instelling voor auditief gehandicapten">16</int>
      <int name="Audiologisch centrum">13</int>
      <int name="Huisartsenpraktijk">13</int>
      <int name="Instelling voor lichamelijk gehandicapten">13</int>
      <int name="Ambulancezorg">12</int>
      <int name="Beroepsorganisatie">12</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
</response>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-my-results-alphabetically-on-facetnames-tp3743471p3743471.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re: how to monitor solr in newrelic

2012-02-14 Thread roySolr
Try this when you start Solr:

java -javaagent:/NEWRELICPATH/newrelic.jar -jar start.jar

Normally you will see your Solr installation on your New Relic dashboard within
2 minutes.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-monitor-solr-in-newrelic-tp3739567p3743488.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: sort my results alphabetically on facetnames

2012-02-14 Thread Michael Kuhlmann

Hi!

On 14.02.2012 13:09, PeterKerk wrote:

I want to sort my results on the facet names (not by their number of results).


From the example you gave, I'd assume you don't want to sort by facet 
names but by facet values.


Simply add facet.sort=index to your request; see
http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort

Or simply sort the facet result on your own.
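If you sort on your own, the reshuffling is only a few lines. A sketch assuming a wt=json response, where Solr returns each facet field as a flat [value, count, value, count, ...] list (the field name and counts below are taken from this thread):

```python
# Client-side alphabetical sort of a Solr facet field, assuming a wt=json
# response where facet values arrive as a flat [value, count, ...] list.
def sort_facets_alphabetically(facet_counts, field):
    flat = facet_counts["facet_fields"][field]
    pairs = list(zip(flat[0::2], flat[1::2]))           # [(value, count), ...]
    return sorted(pairs, key=lambda vc: vc[0].lower())  # case-insensitive A-Z

# Sample data built from the counts shown in this thread.
counts = {"facet_fields": {"healthorganizationtypes_raw_nl": [
    "Instelling voor auditief gehandicapten", 16,
    "Audiologisch centrum", 13,
    "Huisartsenpraktijk", 13,
    "Instelling voor lichamelijk gehandicapten", 13,
    "Ambulancezorg", 12,
    "Beroepsorganisatie", 12,
]}}
for value, count in sort_facets_alphabetically(counts, "healthorganizationtypes_raw_nl"):
    print(f"{value} ({count})")  # first line: Ambulancezorg (12)
```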

Greetings,
Kuli


Re: Highlighting stopwords

2012-02-14 Thread O. Klein

O. Klein wrote
 
 
 O. Klein wrote
 
 Hmm, now the synonyms aren't highlighted anymore.
 
  OK, back to basics (I'm using trunk and FVH).
 
  What is the way to go about it if I want to search on a field without
  stopwords, but still want to highlight the stopwords (and still
  highlight synonyms and stemmed words)?
 
 
  I made a new field, content_hl, to prevent problems coming from copyField.
 
 When using hl.q=content_hl:(spell Check) I now get highlighting including
 stopwords.
 
  But when using hl.q=content_hl:(SC), where SC is a synonym, I get no
  highlighting.
 
 Can you verify if synonyms work when using hl.q?
 

OK, I got it working by using hl.q=content_hl:(spell Check)
content_text:(spell Check), but it makes no sense to me.

The only difference between the 2 fields is the use of stopwords.

What's also weird is that a query like hl.q=content_spell:(SC) also
highlights synonyms, even though this field has no synonyms.

I have not been able to find any logic in the behavior of hl.q and how it
analyses the query. Could you explain how it is supposed to work?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3743616.html
Sent from the Solr - User mailing list archive at Nabble.com.


'foruns' don't match 'forum' with NGramFilterFactory (or EdgeNGramFilterFactory)

2012-02-14 Thread Bráulio Bhavamitra
Hello all,

I'm experimenting with NGramFilterFactory and EdgeNGramFilterFactory.

Both of them show a match in the Solr admin analysis page, but when I query
'foruns' it doesn't find any 'forum'.
analysis
http://bhakta.casadomato.org:8982/solr/admin/analysis.jsp?nt=type&name=text&verbose=on&highlight=on&val=f%C3%B3runs&qverbose=on&qval=f%C3%B3runs
search
http://bhakta.casadomato.org:8982/solr/select/?q=foruns&version=2.2&start=0&rows=10&indent=on

Does anybody know what the problem is?
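One thing worth checking is which n-grams the two terms actually share: the admin analysis page shows token overlap, but the real query only matches if both the index-side and query-side chains emit overlapping tokens. A rough sketch of the gram overlap (the minGramSize/maxGramSize values are assumed, not taken from the poster's config):

```python
# Character n-grams, roughly what NGramFilterFactory emits for a term.
def ngrams(term, min_n=2, max_n=4):
    return {term[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(term) - n + 1)}

# 'foruns' and 'forum' do share grams such as 'fo', 'for', 'foru'.
shared = ngrams("foruns") & ngrams("forum")
print(sorted(shared))
```

If the query-side analyzer does not apply the same n-gram filter (a common difference between index and query chains), the whole term 'foruns' is looked up as-is and finds nothing, which would explain the analysis page and the real query disagreeing.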

bráulio


Stemming and accents (HunspellStemFilterFactory)

2012-02-14 Thread Bráulio Bhavamitra
Hello all,

I'm evaluating the HunspellStemFilterFactory and found that it works with a
pt_PT dictionary.

For example, if I search for 'fóruns' it stems it to 'fórum' and then finds
'fórum' references.

But if I search for 'foruns' (without the accent),
then HunspellStemFilterFactory cannot stem the
word, as it does not exist in its dictionary.

Is there any way to make HunspellStemFilterFactory work regardless of accent
differences?

best,
bráulio


Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Has there been any success in replicating this?  I'm wondering if it
could be something with my setup that is causing the issue...


On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote:
 Yes, I have the following layout on the FS

 ./bootstrap.sh
 ./example (standard example directory from distro containing jetty
 jars, solr confs, solr war, etc)
 ./slice1
  - start.sh
  -solr.xml
  - slice1_shard1
   - data
  - slice2_shard2
   -data
 ./slice2
  - start.sh
  - solr.xml
  -slice2_shard1
    -data
  -slice1_shard2
    -data

 if it matters I'm running everything from localhost, zk and the solr shards

 On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote:
 Do you have unique dataDir for each instance?
 On 13.2.2012 at 14.30, Jamie Johnson jej2...@gmail.com wrote:


Debugging on 3.5

2012-02-14 Thread Bill Bell

I did find a solution, but the output is horrible. Why does the explain output
look so bad?

<lst name="explain"><str name="2H7DF">
6.351252 = (MATCH) boost(*:*,query(specialties_ids:#1;#0;#0;#0;#0;#0;#0;#0;#0;,def=0.0)), product of:
  1.0 = (MATCH) MatchAllDocsQuery, product of:
    1.0 = queryNorm
  6.351252 = query(specialties_ids:#1;#0;#0;#0;#0;#0;#0;#0;#0;,def=0.0)=6.351252
</str>


defType=edismax&boost=query($param)&param=multi_field:87
--


We like the boost parameter in Solr 3.5 with eDismax.

The question we have is that we would like to replace bq with boost, but we get
the multi-valued field issue when we try to do this.

Bill Bell
Sent from mobile



Re: Solr binary response for C#?

2012-02-14 Thread Erick Erickson
It's not as compact as binary format, but would just using something
like JSON help enough? This is really simple, just specify
wt=json (there's a method to set this on the server, at least in Java).
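For example, a sketch of such a request (host, core path, and query are placeholders):

```text
http://localhost:8983/solr/select?q=*:*&wt=json&indent=on
```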

Otherwise, you might get a more knowledgeable response on the
C# java list, I'm frankly clueless

Best
Erick

On Mon, Feb 13, 2012 at 1:15 PM, naptowndev naptowndev...@gmail.com wrote:
 Admittedly I'm new to this, but the project we're working on feeds results
 from Solr to an ASP.net application.  Currently we are using XML, but our
 payloads can be rather large, some up to 17MB.  We are looking for a way to
 minimize that payload and increase performance and I'm curious if there's
 anything anyone has been working out that creates a binary response that can
 be read by C# (similar to the javabin response built into Solr).

 That, or if anyone has experience implementing an external protocol like
 Thrift with Solr and consuming it with C# - again all in the effort to
 increase performance across the wire and while being consumed.

 Any help and direction would be greatly appreciated!

 Thanks!

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-binary-response-for-C-tp3741101p3741101.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Mmap

2012-02-14 Thread Bill Bell
Does someone have an example of using unmap in 3.5 and chunksize?

 I am using Solr 3.5.

I noticed in solrconfig.xml:

<directoryFactory name="DirectoryFactory"
    class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

I don't see this parameter taking effect when I set
-Dsolr.directoryFactory=solr.MMapDirectoryFactory.

How do I see the setting in the log or in stats.jsp? I cannot find a place
that indicates whether it is set or not.

I would assume StandardDirectoryFactory is being used, but I see no difference
whether I set it or not.

Bill Bell
Sent from mobile



Re: Improving performance for SOLR geo queries?

2012-02-14 Thread Bill Bell
Can we get this backported to 3.x?

Bill Bell
Sent from mobile


On Feb 14, 2012, at 3:45 AM, Matthias Käppler matth...@qype.com wrote:

 hey thanks all for the suggestions, didn't have time to look into them
 yet as we're feature-sprinting for MWC, but will report back with some
 feedback over the next weeks (we will have a few more performance
 sprints in March)
 
 Best,
 Matthias
 
 On Mon, Feb 13, 2012 at 2:32 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Thu, Feb 9, 2012 at 1:46 PM, Yonik Seeley yo...@lucidimagination.com 
 wrote:
 One way to speed up numeric range queries (at the cost of increased
 index size) is to lower the precisionStep.  You could try changing
 this from 8 to 4 and then re-indexing to see how that affects your
 query speed.
 
 Your issue, and the fact that I had been looking at the post-filtering
 code again for another client, reminded me that I had been planning on
 implementing post-filtering for spatial.  It's now checked into trunk.
 
 If you have the ability to use trunk, you can add a high cost (like
 cost=200) along with cache=false to trigger it.
 
 More details here:
 http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/
 
 -Yonik
 lucidimagination.com
 
 
 
 -- 
 Matthias Käppler
 Lead Developer API & Mobile
 
 Qype GmbH
 Großer Burstah 50-52
 20457 Hamburg
 Telephone: +49 (0)40 - 219 019 2 - 160
 Skype: m_kaeppler
 Email: matth...@qype.com
 
 Managing Director: Ian Brotherston
 Amtsgericht Hamburg
 HRB 95913
 
 This e-mail and its attachments may contain confidential and/or
 privileged information. If you are not the intended recipient (or have
 received this e-mail in error) please notify the sender immediately
 and destroy this e-mail and its attachments. Any unauthorized copying,
 disclosure or distribution of this e-mail and  its attachments is
 strictly forbidden. This notice also applies to future messages.


Re: Highlighting stopwords

2012-02-14 Thread Koji Sekiguchi

(12/02/14 22:25), O. Klein wrote:

I have not been able to find any logic in the behavior of hl.q and how it
analyses the query. Could you explain how it is supposed to work?


Nothing special on hl.q. If you use hl.q, its value will be used for
highlighting rather than the value of q. There are no tricks, I think.
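So a request in the spirit of this thread might look like the following, with q doing the matching and hl.q driving the highlighting (the field names are the ones from the thread; the query text is illustrative):

```text
q=content_text:(spell check)&hl=true&hl.fl=content_hl&hl.q=content_hl:(spell check)
```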

 When using hl.q=content_hl:(spell Check) I now get highlighting including
 stopwords.

 But when using hl.q=content_hl:(SC), where SC is a synonym, I get no
 highlighting.

 Can you verify if synonyms work when using hl.q?

  :

 OK, I got it working by using hl.q=content_hl:(spell Check)
 content_text:(spell Check), but it makes no sense to me.

 The only difference between the 2 fields is the use of stopwords.

Uh, what you tried was changing the field between q and hl.q; that's a use
case I had not expected when I proposed hl.q.

Do you think that hl.text meets your needs?

https://issues.apache.org/jira/browse/SOLR-1926?focusedCommentId=12871234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12871234

koji
--
Apache Solr Query Log Visualizer
http://soleami.com/


Re: Stemming and accents (HunspellStemFilterFactory)

2012-02-14 Thread Chantal Ackermann
Hi Bráulio,

I don't know about HunspellStemFilterFactory specifically, but concerning
accents:

There are several accent filters that will remove accents from your
tokens. If the Hunspell filter factory requires the accents, then simply
add the accent filters after Hunspell in your index and query filter
chains.

You would then have Hunspell produce the tokens as result of the
stemming and only afterwards the accents would be removed (your example:
'forum' instead of 'fórum'). Do the same on the query side in case
someone inputs accents.

Accent filters are:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory
(lowercases, as well!)
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory

and others on that page.
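The folding step itself is simple. Here is a sketch of the core idea (decompose, then drop the combining marks), which is roughly what solr.ASCIIFoldingFilterFactory does for accented Latin characters, though the real filter handles many more cases:

```python
import unicodedata

# Decompose to NFD, then drop combining marks: 'fóruns' -> 'foruns'.
# This is only the core idea behind ASCII folding, not Solr's implementation.
def fold_accents(text):
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_accents("fóruns"))  # foruns
```

Placed after the stemmer in both chains, this makes 'fórum' and 'forum' index and query the same token.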

Chantal


On Tue, 2012-02-14 at 14:48 +0100, Bráulio Bhavamitra wrote:
 Hello all,
 
 I'm evaluating the HunspellStemFilterFactory and found that it works with a
 pt_PT dictionary.
 
 For example, if I search for 'fóruns' it stems it to 'fórum' and then finds
 'fórum' references.
 
 But if I search for 'foruns' (without the accent),
 then HunspellStemFilterFactory cannot stem the
 word, as it does not exist in its dictionary.
 
 Is there any way to make HunspellStemFilterFactory work regardless of accent
 differences?
 
 best,
 bráulio



Re: Highlighting stopwords

2012-02-14 Thread O. Klein

Koji Sekiguchi wrote
 
 Uh, what you tried was changing the field between q and hl.q; that's a use
 case I had not expected when I proposed hl.q.
 
 Do you think that hl.text meets your needs?
 
 https://issues.apache.org/jira/browse/SOLR-1926?focusedCommentId=12871234&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12871234
 
 koji
 -- 
 Apache Solr Query Log Visualizer
 http://soleami.com/
 

Well, if I understand it correctly, yes.

If this means that queries are analyzed like the field they are
highlighting, that would give the highlighter a lot more flexibility.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3744054.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Replication Question

2012-02-14 Thread Mark Miller
Sorry, have not gotten it yet, but will be back trying later today - monday, 
tuesday tend to be slow for me (meetings and crap).

- Mark

On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:

 Has there been any success in replicating this?  I'm wondering if it
 could be something with my setup that is causing the issue...
 
 
 On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote:
 Yes, I have the following layout on the FS
 
 ./bootstrap.sh
 ./example (standard example directory from distro containing jetty
 jars, solr confs, solr war, etc)
 ./slice1
  - start.sh
  -solr.xml
  - slice1_shard1
   - data
  - slice2_shard2
   -data
 ./slice2
  - start.sh
  - solr.xml
  -slice2_shard1
-data
  -slice1_shard2
-data
 
 if it matters I'm running everything from localhost, zk and the solr shards
 
 On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote:
 Do you have unique dataDir for each instance?
 On 13.2.2012 at 14.30, Jamie Johnson jej2...@gmail.com wrote:

- Mark Miller
lucidimagination.com



Re: SolrJ + SolrCloud

2012-02-14 Thread Mark Miller
No hard plans around that that at the moment, but when I free up some time I 
plan on looking at the JIRA issue I pointed to. Looks like a lot of the work 
may already be done.

- mark

On Feb 12, 2012, at 8:14 AM, Darren Govoni wrote:

 Thanks Mark. Is there any plan to make all the Solr search handlers work
 with SolrCloud, like MLT? That missing feature would prohibit us from
 using SolrCloud at the moment. :(
 
 On Sat, 2012-02-11 at 18:24 -0500, Mark Miller wrote:
 On Feb 11, 2012, at 6:02 PM, Darren Govoni wrote:
 
 Hi,
 Do all the normal facilities of Solr work with SolrCloud from SolrJ?
 Things like /mlt, /cluster, facets , tvf's, etc.
 
 Darren
 
 
 
 SolrJ works the same in SolrCloud mode as it does in non SolrCloud mode - 
 it's fully supported. There is even a new SolrJ client called 
 CloudSolrServer that has built in cluster awareness and load balancing.
 
 In terms of what is supported - anything that is supported with distributed 
 search - that is most things, but there is the odd man out - like MLT - 
 looks like an issue is open here: 
 https://issues.apache.org/jira/browse/SOLR-788 but it's not resolved yet.
 
 - Mark Miller
 lucidimagination.com
 

- Mark Miller
lucidimagination.com


Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Thanks Mark, not a huge rush, just me trying to get to use the latest
stuff on our project.

On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote:
 Sorry, have not gotten it yet, but will be back trying later today - monday, 
 tuesday tend to be slow for me (meetings and crap).

 - Mark

 On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:

 Has there been any success in replicating this?  I'm wondering if it
 could be something with my setup that is causing the issue...


 On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote:
 Yes, I have the following layout on the FS

 ./bootstrap.sh
 ./example (standard example directory from distro containing jetty
 jars, solr confs, solr war, etc)
 ./slice1
  - start.sh
  -solr.xml
  - slice1_shard1
   - data
  - slice2_shard2
   -data
 ./slice2
  - start.sh
  - solr.xml
  -slice2_shard1
    -data
  -slice1_shard2
    -data

 if it matters I'm running everything from localhost, zk and the solr shards

 On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote:
 Do you have unique dataDir for each instance?
 On 13.2.2012 at 14.30, Jamie Johnson jej2...@gmail.com wrote:

 - Mark Miller
 lucidimagination.com



Re: OR-FilterQuery

2012-02-14 Thread Mikhail Khludnev
On Mon, Feb 13, 2012 at 11:17 PM, spr...@gmx.eu wrote:

 Hi,

 how efficent is such an query:

 q=some text
 fq=id:(1 OR 2 OR 3...)

 Should I better use q:some text AND id:(1 OR 2 OR 3...)?

1. These two options have different scoring.
2. If you hit the same fq=id:(1 OR 2 OR 3...) many times, you benefit from
reading the docset from the heap instead of searching on disk.



 Is the Filter Cache used for the OR'ed fq?

The filter cache is used for any filter. I guess I didn't get you; could
you rephrase your question?



 Thank you




-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Need help with graphing function (MATH)

2012-02-14 Thread Mark
Thanks I'll have a look at this. I should have mentioned that the actual 
values on the graph aren't important rather I was showing an example of 
how the function should behave.


On 2/13/12 6:25 PM, Kent Fitch wrote:

Hi, assuming you have x and want to generate y, then maybe

- if x < 50, y = 150

- if x > 175, y = 60

- otherwise:

either y = (100/(e^((x-50)/75)^2)) + 50
http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175

- or maybe y = sin((x+5)/38)*42+105
http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175

Regards,

Kent Fitch

On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com wrote:


I need some help with one of my boost functions. I would like the
function to look something like the following mockup below. Starts
off flat then there is a gradual decline, steep decline then
gradual decline and then back to flat.

Can some of you math guys please help :)

Thanks.






Re: OR-FilterQuery

2012-02-14 Thread Erick Erickson
bq:  Is the Filter Cache used for the OR'ed fq?

The filter cache is actually pretty simple conceptually. It's
just a map where the key is the fq and the value is the set
of documents that satisfy that fq (we'll skip the implementation
here, just think of it as the list of all the docs that the fq selects).

Solr doesn't attempt to do much with the key, just think of it
as a single string. Whether or not an fq is reused from the
cache depends upon whether the key is in the map.

So fq=id:(1 OR 2 OR 3) will just look to see if
id:(1 OR 2 OR 3) is a key. If so, it'll just use the
document list stored in the cache.

It won't match
id:(1 OR 2)
or
id:(2)
or
id:1 OR id:2 OR id:3

In other words, there's no attempt to decompose the fq clause
and store parts of it in the cache, it's exact-match or
nothing.
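A toy model of that keying behavior (this is an illustration of the exact-match lookup, not Solr's actual implementation):

```python
# The filterCache maps the literal fq string to a doc set; there is no
# decomposition of the fq into sub-clauses.
filter_cache = {}

def apply_filter(fq, all_docs, matches):
    if fq not in filter_cache:                       # exact string match only
        filter_cache[fq] = frozenset(d for d in all_docs if matches(d))
    return filter_cache[fq]

docs = range(10)
apply_filter("id:(1 OR 2 OR 3)", docs, lambda d: d in (1, 2, 3))
print("id:(1 OR 2 OR 3)" in filter_cache)  # True: reusing this fq is a hit
print("id:(1 OR 2)" in filter_cache)       # False: different key, recomputed
```

This is why keeping fq strings byte-for-byte identical across requests matters for the hit rate.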

Hope that helps
Erick

On Mon, Feb 13, 2012 at 2:17 PM,  spr...@gmx.eu wrote:
 Hi,

 how efficent is such an query:

 q=some text
 fq=id:(1 OR 2 OR 3...)

 Should I better use q:some text AND id:(1 OR 2 OR 3...)?

 Is the Filter Cache used for the OR'ed fq?

 Thank you



Re: Need help with graphing function (MATH)

2012-02-14 Thread Gora Mohanty
On 14 February 2012 23:35, Mark static.void@gmail.com wrote:
 Thanks I'll have a look at this. I should have mentioned that the actual
 values on the graph aren't important rather I was showing an example of how
 the function should behave.
[...]

 either y = (100/(e^((x -50)/75)^2)) + 50
[...]

In general, the exponential will be better behaved than the sinusoid.
You can change the exact values by tweaking the coefficients in the
equation.

Regards,
Gora


Re: Re: Solr 3.5 not starting on CentOS 6 or RHEL 5

2012-02-14 Thread Bernhardt, Russell (CIV)
Nope, I don't have a custom /tmp mount in fstab; I just have a basic CentOS 6
install for development and testing.
Full everyone read/write permissions are in place on /tmp too.


 Is /tmp a separate file system? There are problems with people
 mounting /tmp with 'noexec' as a security precaution, which then
 causes Solr to fail.


Russ Bernhardt
Systems Office
Dudley Knox Library, Naval Postgraduate School
Monterey, CA


Re: OR-FilterQuery

2012-02-14 Thread Mikhail Khludnev
Hi Em,

I briefly read the thread. Are you talking about combining cached clauses
of a BooleanQuery, instead of evaluating the whole BQ as a filter?

I found something like that in API (but only in API)
http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)

Did I get you right? Why do you need it, btw? If I am...
I have an idea of how to do it in two minutes:

q=+f:text
+(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)...

Right leg will be a BooleanQuery with SHOULD clauses backed on cached
queries (see below).

If you are not scared by the syntax yet, you can implement a trivial
fqQParserPlugin, which would be just:

// lazily through User/Generic Cache
q = new FilteredQuery(new MatchAllDocsQuery(),
        new CachingWrapperFilter(new QueryWrapperFilter(
            subQuery(localParams.get(QueryParsing.V)))));
return q;

It will use a per-segment bitset, in contrast to Solr's fq, which caches for
the top-level reader.

WDYT?

On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de wrote:

 Hi,

 have a look at:
 http://search-lucene.com/m/Z8lWGEiKoI

 I think not much had changed since then.

 Regards,
 Em

 On 13.02.2012 at 20:17, spr...@gmx.eu wrote:
  Hi,
 
  how efficent is such an query:
 
  q=some text
  fq=id:(1 OR 2 OR 3...)
 
  Should I better use q:some text AND id:(1 OR 2 OR 3...)?
 
  Is the Filter Cache used for the OR'ed fq?
 
  Thank you
 
 




-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Need help with graphing function (MATH)

2012-02-14 Thread Ted Dunning
In general this kind of function is very easy to construct using sums of basic 
sigmoidal functions. The logistic and probit functions are commonly used for 
this. 
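As an illustration of that idea, here is a single logistic step tuned by eye to the flat / decline / flat shape discussed earlier in the thread. All constants are made up for the sketch (roughly matching the 150-to-60 range from Kent's piecewise version) and are not taken from any Solr function query:

```python
import math

def logistic(t):
    # Standard logistic sigmoid: 0 for very negative t, 1 for very positive t.
    return 1.0 / (1.0 + math.exp(-t))

def boost(x, hi=150.0, lo=60.0, mid=112.5, width=15.0):
    # Flat near `hi` for small x, a smooth decline centred on `mid`,
    # flat near `lo` for large x -- one sigmoidal step.
    return lo + (hi - lo) * logistic((mid - x) / width)

print(boost(0), boost(112.5), boost(300))
```

In a Solr function query one would approximate this shape with the built-in functions rather than a true logistic, but the idea of summing sigmoidal steps is the same.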

Sent from my iPhone

On Feb 14, 2012, at 10:05, Mark static.void@gmail.com wrote:

 Thanks I'll have a look at this. I should have mentioned that the actual 
 values on the graph aren't important rather I was showing an example of how 
 the function should behave.
 
 On 2/13/12 6:25 PM, Kent Fitch wrote:
 Hi, assuming you have x and want to generate y, then maybe
 
  - if x < 50, y = 150
 
  - if x > 175, y = 60
 
 - otherwise :
 
 either y = (100/(e^((x -50)/75)^2)) + 50
 http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175
 
 
 - or maybe y =sin((x+5)/38)*42+105
 
 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175
 
 Regards,
 
 Kent Fitch
 
  On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com wrote:
 
I need some help with one of my boost functions. I would like the
function to look something like the following mockup below. Starts
off flat then there is a gradual decline, steep decline then
gradual decline and then back to flat.
 
Can some of you math guys please help :)
 
Thanks.
 
 
 
 


Re: Need help with graphing function (MATH)

2012-02-14 Thread Mark
Would you mind throwing out an example of these types of functions?
Looking at Wikipedia (http://en.wikipedia.org/wiki/Probit), it seems
like the probit function is very similar to what I want.


Thanks

On 2/14/12 10:56 AM, Ted Dunning wrote:

In general this kind of function is very easy to construct using sums of basic 
sigmoidal functions. The logistic and probit functions are commonly used for 
this.

Sent from my iPhone

On Feb 14, 2012, at 10:05, Mark static.void@gmail.com wrote:


Thanks I'll have a look at this. I should have mentioned that the actual values 
on the graph aren't important rather I was showing an example of how the 
function should behave.

On 2/13/12 6:25 PM, Kent Fitch wrote:

Hi, assuming you have x and want to generate y, then maybe

- if x < 50, y = 150

- if x > 175, y = 60

- otherwise :

either y = (100/(e^((x -50)/75)^2)) + 50
http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175


- or maybe y =sin((x+5)/38)*42+105

http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175

Regards,

Kent Fitch

On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com wrote:

I need some help with one of my boost functions. I would like the
function to look something like the following mockup below. Starts
off flat then there is a gradual decline, steep decline then
gradual decline and then back to flat.

Can some of you math guys please help :)

Thanks.






Re: Need help with graphing function (MATH)

2012-02-14 Thread Mark

Or better yet, an example in Solr would be best :)

Thanks!

On 2/14/12 11:05 AM, Mark wrote:
Would you mind throwing out an example of these types of functions. 
Looking at Wikipedia (http://en.wikipedia.org/wiki/Probit) it seems 
like the Probit function is very similar to what I want.


Thanks

On 2/14/12 10:56 AM, Ted Dunning wrote:
In general this kind of function is very easy to construct using sums 
of basic sigmoidal functions. The logistic and probit functions are 
commonly used for this.


Sent from my iPhone

On Feb 14, 2012, at 10:05, Markstatic.void@gmail.com  wrote:

Thanks I'll have a look at this. I should have mentioned that the 
actual values on the graph aren't important rather I was showing an 
example of how the function should behave.


On 2/13/12 6:25 PM, Kent Fitch wrote:

Hi, assuming you have x and want to generate y, then maybe

- if x < 50, y = 150

- if x > 175, y = 60

- otherwise :

either y = (100/(e^((x -50)/75)^2)) + 50
http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175 




- or maybe y =sin((x+5)/38)*42+105

http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175 



Regards,

Kent Fitch

On Tue, Feb 14, 2012 at 12:29 PM, 
Markstatic.void@gmail.commailto:static.void@gmail.com  
wrote:


I need some help with one of my boost functions. I would like the
function to look something like the following mockup below. Starts
off flat then there is a gradual decline, steep decline then
gradual decline and then back to flat.

Can some of you math guys please help :)

Thanks.






Re: OR-FilterQuery

2012-02-14 Thread Em
Hi Mikhail,

thanks for kicking in some brainstorming-code!
The given thread is almost a year old and I was working with Solr in my
freetime to see where it fails to behave/perform as I expect/wish.

I found out that if you got a lot of different access-patterns for a
filter-query, you might end up with either a big cache to make things
fast or with lower performance (impact depends on usecase and
circumstances).

Scenario:
You got a permission-field and the client is able to filter by one to
three permission-values.
That is:
fq=foo:user
fq=foo:moderator
fq=foo:manager

If you cannot control/guarantee the order of the fq's values, you could
end up with a lot of variants which all return the same results.

Example:
fq=permission:user OR permission:moderator OR permission:manager
fq=permission:user OR permission:manager OR permission:moderator
fq=permission:moderator OR permission:user OR permission:manager
...
They all return the same but were cached separately, which leads to the
fact that you are wasting a lot of memory.
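One way around the permutation problem is to normalize clause order on the client before sending the filter, so every ordering maps to the same cache key. A sketch (field and values taken from the example above):

```python
def normalized_fq(field, values):
    """Sort OR'ed clauses so every ordering of the same values yields
    the same fq string, and therefore one filterCache entry in Solr."""
    clauses = sorted(f"{field}:{v}" for v in set(values))
    return " OR ".join(clauses)

# All permutations collapse to one cache key:
a = normalized_fq("permission", ["user", "moderator", "manager"])
b = normalized_fq("permission", ["moderator", "manager", "user"])
assert a == b == "permission:manager OR permission:moderator OR permission:user"
```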

Furthermore, if your access pattern will lead to a lot of different fq's
on a small set of distinct values, it may make more sense to cache each
filter-query for itself from a memory-consuming point of view (may cost
a little bit performance).

That being said, if you cache a filter for foo:user, foo:moderator and
foo:manager you can combine those filters with AND, OR, NOT or whatever
without recomputing every filter over and over again which would be the
case if your filter-cache is not large enough.

However, I never compared the performance differences (in terms of
speed) of a cached filter-query like
foo:bar OR foo:baz
With a combination of two cached filter-queries like
foo:bar
foo:baz
combined by a logical OR.

That's how the background looks like.
Unfortunately I didn't have the time to implement this in the past.

Back to your post:
Looks like a cool idea and is almost what I had in mind!

I would formulate an easier syntax so that one is able to parse each
fq-clause on its own to cache the CachingWrapperFilter to reuse it again.

 it will use a per-segment bitset, in contrast to Solr's fq which caches for
 the top-level reader.
Could you explain why this bitset would be per-segment based, please?
I don't see a reason why this *has* to be so.
What is the benefit you are seeing?

Kind regards,
Em

Am 14.02.2012 19:33, schrieb Mikhail Khludnev:
 Hi Em,
 
 I briefly read the thread. Are you talking about combining cached clauses
 of BooleanQuery, instead of evaluating whole BQ as a filter?
 
 I found something like that in API (but only in API)
 http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)
 
 Am I getting you right? Why do you need it, btw? If I'm ..
 I have idea how to do it in two mins:
 
 q=+f:text
 +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)...
 
 Right leg will be a BooleanQuery with SHOULD clauses backed on cached
 queries (see below).
 
 if you are not scared by the syntax yet, you can implement a trivial
 fqQParserPlugin, which will be just
 
 // lazily through User/Generic Cache
 q = new FilteredQuery (new MatchAllDocsQuery(), new
 CachingWrapperFilter(new
 QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
 return q;
 
 it will use a per-segment bitset, in contrast to Solr's fq which caches for
 the top-level reader.
 
 WDYT?
 
 On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de wrote:
 
 Hi,

 have a look at:
 http://search-lucene.com/m/Z8lWGEiKoI

 I think not much had changed since then.

 Regards,
 Em

 Am 13.02.2012 20:17, schrieb spr...@gmx.eu:
 Hi,

 how efficient is such a query:

 q=some text
 fq=id:(1 OR 2 OR 3...)

 Should I better use q:some text AND id:(1 OR 2 OR 3...)?

 Is the Filter Cache used for the OR'ed fq?

 Thank you



 
 
 


Re: Need help with graphing function (MATH)

2012-02-14 Thread Em
Hi Mark,

did you already had a look at http://wiki.apache.org/solr/FunctionQuery ?

Regards,
Em

Am 14.02.2012 20:09, schrieb Mark:
 Or better yet an example in solr would be best :)
 
 Thanks!
 
 On 2/14/12 11:05 AM, Mark wrote:
 Would you mind throwing out an example of these types of functions.
 Looking at Wikipedia (http://en.wikipedia.org/wiki/Probit) it seems
 like the Probit function is very similar to what I want.

 Thanks

 On 2/14/12 10:56 AM, Ted Dunning wrote:
 In general this kind of function is very easy to construct using sums
 of basic sigmoidal functions. The logistic and probit functions are
 commonly used for this.

 Sent from my iPhone

 On Feb 14, 2012, at 10:05, Markstatic.void@gmail.com  wrote:

 Thanks I'll have a look at this. I should have mentioned that the
 actual values on the graph aren't important rather I was showing an
 example of how the function should behave.

 On 2/13/12 6:25 PM, Kent Fitch wrote:
 Hi, assuming you have x and want to generate y, then maybe

 - if x < 50, y = 150

 - if x > 175, y = 60

 - otherwise :

 either y = (100/(e^((x -50)/75)^2)) + 50
 http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175



 - or maybe y =sin((x+5)/38)*42+105

 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175


 Regards,

 Kent Fitch

 On Tue, Feb 14, 2012 at 12:29 PM,
 Markstatic.void@gmail.commailto:static.void@gmail.com 
 wrote:

 I need some help with one of my boost functions. I would like the
 function to look something like the following mockup below. Starts
 off flat then there is a gradual decline, steep decline then
 gradual decline and then back to flat.

 Can some of you math guys please help :)

 Thanks.




 


Re: Need help with graphing function (MATH)

2012-02-14 Thread Walter Underwood
In practice, I expect a linear piecewise function (with sharp corners) would be 
indistinguishable from the smoothed function. It is also much easier to read, 
test, and debug. It might even be faster.

Try the sharp corners one first.
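A minimal sketch of such a sharp-cornered curve; the breakpoints (flat at 150 up to 50, a single straight line down to 60 at 175, flat after) are assumptions read off the earlier mockup description:

```python
def boost(x):
    # Sharp-cornered approximation: two flats joined by one line segment.
    if x <= 50:
        return 150.0
    if x >= 175:
        return 60.0
    # Linear interpolation between (50, 150) and (175, 60)
    return 150.0 + (x - 50) * (60.0 - 150.0) / (175 - 50)
```

More corners (e.g. a steeper middle segment) can be added the same way, one extra breakpoint per corner.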

wunder

On Feb 14, 2012, at 10:56 AM, Ted Dunning wrote:

 In general this kind of function is very easy to construct using sums of 
 basic sigmoidal functions. The logistic and probit functions are commonly 
 used for this. 
 
 Sent from my iPhone
 
 On Feb 14, 2012, at 10:05, Mark static.void@gmail.com wrote:
 
 Thanks I'll have a look at this. I should have mentioned that the actual 
 values on the graph aren't important rather I was showing an example of how 
 the function should behave.
 
 On 2/13/12 6:25 PM, Kent Fitch wrote:
 Hi, assuming you have x and want to generate y, then maybe
 
 - if x < 50, y = 150
 
 - if x > 175, y = 60
 
 - otherwise :
 
 either y = (100/(e^((x -50)/75)^2)) + 50
 http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175
 
 
 - or maybe y =sin((x+5)/38)*42+105
 
 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175
 
 Regards,
 
 Kent Fitch
 
 On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com 
 mailto:static.void@gmail.com wrote:
 
   I need some help with one of my boost functions. I would like the
   function to look something like the following mockup below. Starts
   off flat then there is a gradual decline, steep decline then
   gradual decline and then back to flat.
 
   Can some of you math guys please help :)
 
   Thanks.
 






Re: Solr 3.5 not starting on CentOS 6 or RHEL 5

2012-02-14 Thread Yonik Seeley
Perhaps this is some kind of vufind specific issue?
The server (/example) bundled with solr unpacks the war in
/example/work and not /tmp

-Yonik
lucidimagination.com

On Mon, Feb 13, 2012 at 7:06 PM, Bernhardt, Russell (CIV)
rgber...@nps.edu wrote:
 A software package we use recently upgraded to Solr 3.5 (from 1.4.1) and now 
 we're having problems getting the Solr server to start up under RHEL 5 or 
 CentOS 6.

 I upgraded our local install of Java to the latest from Oracle and it didn't 
 help, even removed the local OpenJDK just to be sure.

 When starting jetty manually (with java -jar start.jar) I get the following 
 messages:

 2012-02-13 07:52:55.954::INFO:  Logging to STDERR via 
 org.mortbay.log.StdErrLog
 2012-02-13 07:52:56.120::INFO:  jetty-6.1.11
 2012-02-13 07:52:56.184::INFO:  Extract 
 jar:file:/opt/vufind/solr/jetty/webapps/solr.war!/ to 
 /tmp/Jetty_0_0_0_0_8080_solr.war__solr__7k9npr/webapp
 2012-02-13 07:52:56.702::WARN:  Failed startup of context 
 org.mortbay.jetty.webapp.WebAppContext@15aaf0b3{/solr,jar:file:/opt/vufind/solr/jetty/webapps/solr.war!/}
 java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(Unknown Source)
        at java.util.jar.JarFile.<init>(Unknown Source)
        at java.util.jar.JarFile.<init>(Unknown Source)
        at 
 org.mortbay.jetty.webapp.TagLibConfiguration.configureWebApp(TagLibConfiguration.java:168)
        at 
 org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1217)
        at 
 org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:513)
        at 
 org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
        at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
        at 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at 
 org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
        at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
        at 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
        at 
 org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
        at org.mortbay.jetty.Server.doStart(Server.java:222)
        at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:39)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:977)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.mortbay.start.Main.invokeMain(Main.java:194)
        at org.mortbay.start.Main.start(Main.java:512)
        at org.mortbay.start.Main.main(Main.java:119)
 2012-02-13 07:52:56.713::INFO:  Opened 
 /opt/vufind/solr/jetty/logs/2012_02_13.request.log
 2012-02-13 07:52:56.740::INFO:  Started SelectChannelConnector@0.0.0.0:8080

 Jetty starts up just fine but shows a 503 error when attempting to access 
 localhost:8080/solr/. The temp directory structure does exist in /tmp/. Any 
 ideas?

 Thanks,

 Russ Bernhardt
 Systems Analyst
 Library Information Systems
 Naval Postgraduate School, Monterey CA



Re: SolrCloud Replication Question

2012-02-14 Thread Mark Miller
Okay Jamie, I think I have a handle on this. It looks like an issue with what 
config files are being used by cores created with the admin core handler - I 
think it's just picking up default config and not the correct config for the 
collection. This means they end up using config that has no UpdateLog defined - 
and so recovery fails.

I've added more logging around this so that it's easy to determine that.

I'm investigating more and working on a test + fix. I'll file a JIRA issue soon 
as well.

- Mark

On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:

 Thanks Mark, not a huge rush, just me trying to get to use the latest
 stuff on our project.
 
 On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote:
 Sorry, have not gotten it yet, but will be back trying later today - monday, 
 tuesday tend to be slow for me (meetings and crap).
 
 - Mark
 
 On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
 
 Has there been any success in replicating this?  I'm wondering if it
 could be something with my setup that is causing the issue...
 
 
 On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote:
 Yes, I have the following layout on the FS
 
 ./bootstrap.sh
 ./example (standard example directory from distro containing jetty
 jars, solr confs, solr war, etc)
 ./slice1
  - start.sh
  -solr.xml
  - slice1_shard1
   - data
  - slice2_shard2
   -data
 ./slice2
  - start.sh
  - solr.xml
  -slice2_shard1
-data
  -slice1_shard2
-data
 
 if it matters I'm running everything from localhost, zk and the solr shards
 
 On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote:
 Do you have unique dataDir for each instance?
 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti:
 
 - Mark Miller
 lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Sounds good, if I pull the latest from trunk and rerun will that be
useful or were you able to duplicate my issue now?

On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote:
 Okay Jamie, I think I have a handle on this. It looks like an issue with what 
 config files are being used by cores created with the admin core handler - I 
 think it's just picking up default config and not the correct config for the 
 collection. This means they end up using config that has no UpdateLog defined 
 - and so recovery fails.

 I've added more logging around this so that it's easy to determine that.

 I'm investigating more and working on a test + fix. I'll file a JIRA issue 
 soon as well.

 - Mark

 On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:

 Thanks Mark, not a huge rush, just me trying to get to use the latest
 stuff on our project.

 On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote:
 Sorry, have not gotten it yet, but will be back trying later today - 
 monday, tuesday tend to be slow for me (meetings and crap).

 - Mark

 On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:

 Has there been any success in replicating this?  I'm wondering if it
 could be something with my setup that is causing the issue...


 On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote:
 Yes, I have the following layout on the FS

 ./bootstrap.sh
 ./example (standard example directory from distro containing jetty
 jars, solr confs, solr war, etc)
 ./slice1
  - start.sh
  -solr.xml
  - slice1_shard1
   - data
  - slice2_shard2
   -data
 ./slice2
  - start.sh
  - solr.xml
  -slice2_shard1
    -data
  -slice1_shard2
    -data

 if it matters I'm running everything from localhost, zk and the solr 
 shards

 On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote:
 Do you have unique dataDir for each instance?
 13.2.2012 14.30 Jamie Johnson jej2...@gmail.com kirjoitti:

 - Mark Miller
 lucidimagination.com












 - Mark Miller
 lucidimagination.com













Re: OR-FilterQuery

2012-02-14 Thread Erick Erickson
Whoa!

fq=id:(1 OR 2)
is not the same thing at all as
fq=id:1&fq=id:2

Assuming that any document had one and only one ID,  the second clause
would return exactly 0 documents, each and every time.

Multiple fq clauses are essentially set intersections. So the first query is the
set of all documents where id is 1 or 2
the second is the intersection of two sets of documents, one set
with an id of 1 and one with an id of 2. Not the same thing at all.
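The difference is exactly set union vs. set intersection, which a toy sketch with hypothetical doc-id sets makes concrete:

```python
docs_id1 = {1001}          # documents matching fq=id:1
docs_id2 = {1002}          # documents matching fq=id:2

# One fq with OR: the union of the two sets
assert docs_id1 | docs_id2 == {1001, 1002}

# Two separate fq parameters: the intersection -> empty,
# since no document has both ids at once
assert docs_id1 & docs_id2 == set()
```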

There's no support for the concept of
(fq=id:1 OR fq=id:2)

Best
Erick

On Tue, Feb 14, 2012 at 2:13 PM, Em mailformailingli...@yahoo.de wrote:
 Hi Mikhail,

 thanks for kicking in some brainstorming-code!
 The given thread is almost a year old and I was working with Solr in my
 freetime to see where it fails to behave/perform as I expect/wish.

 I found out that if you got a lot of different access-patterns for a
 filter-query, you might end up with either a big cache to make things
 fast or with lower performance (impact depends on usecase and
 circumstances).

 Scenario:
 You got a permission-field and the client is able to filter by one to
 three permission-values.
 That is:
 fq=foo:user
 fq=foo:moderator
 fq=foo:manager

 If you can not control/guarantee the order of the fq's values, you could
 end up with a lot of mess which all returns the same.

 Example:
 fq=permission:user OR permission:moderator OR permission:manager
 fq=permission:user OR permission:manager OR permission:moderator
 fq=permission:moderator OR permission:user OR permission:manager
 ...
 They all return the same but were cached separately, which leads to the
 fact that you are wasting a lot of memory.

 Furthermore, if your access pattern will lead to a lot of different fq's
 on a small set of distinct values, it may make more sense to cache each
 filter-query for itself from a memory-consuming point of view (may cost
 a little bit performance).

 That being said, if you cache a filter for foo:user, foo:moderator and
 foo:manager you can combine those filters with AND, OR, NOT or whatever
 without recomputing every filter over and over again which would be the
 case if your filter-cache is not large enough.

 However, I never compared the performance differences (in terms of
 speed) of a cached filter-query like
 foo:bar OR foo:baz
 With a combination of two cached filter-queries like
 foo:bar
 foo:baz
 combined by a logical OR.

 That's how the background looks like.
 Unfortunately I didn't have the time to implement this in the past.

 Back to your post:
 Looks like a cool idea and is almost what I had in mind!

 I would formulate an easier syntax so that one is able to parse each
 fq-clause on its own to cache the CachingWrapperFilter to reuse it again.

 it will use per segment bitset at contrast to Solr's fq which caches for
 top level reader.
 Could you explain why this bitset would be per-segment based, please?
 I don't see a reason why this *have* to be so.
 What is the benefit you are seeing?

 Kind regards,
 Em

 Am 14.02.2012 19:33, schrieb Mikhail Khludnev:
 Hi Em,

 I briefly read the thread. Are you talking about combing of cached clauses
 of BooleanQuery, instead of evaluating whole BQ as a filter?

 I found something like that in API (but only in API)
 http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)

 Am I get you right? Why do you need it, btw? If I'm ..
 I have idea how to do it in two mins:

 q=+f:text
 +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 _query_:{!fq}id:4)...

 Right leg will be a BooleanQuery with SHOULD clauses backed on cached
 queries (see below).

 if you are not scarred by the syntax yet you can implement trivial
 fqQParserPlugin, which will be just

 // lazily through User/Generic Cache
 q = new FilteredQuery (new MatchAllDocsQuery(), new
 CachingWrapperFilter(new
 QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
 return q;

 it will use per segment bitset at contrast to Solr's fq which caches for
 top level reader.

 WDYT?

 On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de wrote:

 Hi,

 have a look at:
 http://search-lucene.com/m/Z8lWGEiKoI

 I think not much had changed since then.

 Regards,
 Em

 Am 13.02.2012 20:17, schrieb spr...@gmx.eu:
 Hi,

 how efficient is such a query:

 q=some text
 fq=id:(1 OR 2 OR 3...)

 Should I better use q:some text AND id:(1 OR 2 OR 3...)?

 Is the Filter Cache used for the OR'ed fq?

 Thank you








Re: OR-FilterQuery

2012-02-14 Thread Erick Erickson
BTW, you're not the first person who would like this capability, see:
https://issues.apache.org/jira/browse/SOLR-1223

But the fact that this JIRA was originally opened in June of 2009
and hasn't been implemented yet indicates that it's not a super-high
priority.

Best
Erick

On Tue, Feb 14, 2012 at 4:33 PM, Erick Erickson erickerick...@gmail.com wrote:
 Whoa!

 fq=id:(1 OR 2)
 is not the same thing at all as
 fq=id:1&fq=id:2

 Assuming that any document had one and only one ID,  the second clause
 would return exactly 0 documents, each and every time.

 Multiple fq clauses are essentially set intersections. So the first query is 
 the
 set of all documents where id is 1 or 2
 the second is the intersection of two sets of documents, one set
 with an id of 1 and one with an id of 2. Not the same thing at all.

 There's no support for the concept of
 (fq=id:1 OR fq=id:2)

 Best
 Erick

 On Tue, Feb 14, 2012 at 2:13 PM, Em mailformailingli...@yahoo.de wrote:
 Hi Mikhail,

 thanks for kicking in some brainstorming-code!
 The given thread is almost a year old and I was working with Solr in my
 freetime to see where it fails to behave/perform as I expect/wish.

 I found out that if you got a lot of different access-patterns for a
 filter-query, you might end up with either a big cache to make things
 fast or with lower performance (impact depends on usecase and
 circumstances).

 Scenario:
 You got a permission-field and the client is able to filter by one to
 three permission-values.
 That is:
 fq=foo:user
 fq=foo:moderator
 fq=foo:manager

 If you can not control/guarantee the order of the fq's values, you could
 end up with a lot of mess which all returns the same.

 Example:
 fq=permission:user OR permission:moderator OR permission:manager
 fq=permission:user OR permission:manager OR permission:moderator
 fq=permission:moderator OR permission:user OR permission:manager
 ...
 They all return the same but were cached separately, which leads to the
 fact that you are wasting a lot of memory.

 Furthermore, if your access pattern will lead to a lot of different fq's
 on a small set of distinct values, it may make more sense to cache each
 filter-query for itself from a memory-consuming point of view (may cost
 a little bit performance).

 That being said, if you cache a filter for foo:user, foo:moderator and
 foo:manager you can combine those filters with AND, OR, NOT or whatever
 without recomputing every filter over and over again which would be the
 case if your filter-cache is not large enough.

 However, I never compared the performance differences (in terms of
 speed) of a cached filter-query like
 foo:bar OR foo:baz
 With a combination of two cached filter-queries like
 foo:bar
 foo:baz
 combined by a logical OR.

 That's how the background looks like.
 Unfortunately I didn't have the time to implement this in the past.

 Back to your post:
 Looks like a cool idea and is almost what I had in mind!

 I would formulate an easier syntax so that one is able to parse each
 fq-clause on its own to cache the CachingWrapperFilter to reuse it again.

 it will use per segment bitset at contrast to Solr's fq which caches for
 top level reader.
 Could you explain why this bitset would be per-segment based, please?
 I don't see a reason why this *have* to be so.
 What is the benefit you are seeing?

 Kind regards,
 Em

 Am 14.02.2012 19:33, schrieb Mikhail Khludnev:
 Hi Em,

 I briefly read the thread. Are you talking about combing of cached clauses
 of BooleanQuery, instead of evaluating whole BQ as a filter?

 I found something like that in API (but only in API)
 http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)

 Am I get you right? Why do you need it, btw? If I'm ..
 I have idea how to do it in two mins:

 q=+f:text
 +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 
 _query_:{!fq}id:4)...

 Right leg will be a BooleanQuery with SHOULD clauses backed on cached
 queries (see below).

 if you are not scarred by the syntax yet you can implement trivial
 fqQParserPlugin, which will be just

 // lazily through User/Generic Cache
 q = new FilteredQuery (new MatchAllDocsQuery(), new
 CachingWrapperFilter(new
 QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
 return q;

 it will use per segment bitset at contrast to Solr's fq which caches for
 top level reader.

 WDYT?

 On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de wrote:

 Hi,

 have a look at:
 http://search-lucene.com/m/Z8lWGEiKoI

 I think not much had changed since then.

 Regards,
 Em

 Am 13.02.2012 20:17, schrieb spr...@gmx.eu:
 Hi,

 how efficient is such a query:

 q=some text
 fq=id:(1 OR 2 OR 3...)

 Should I better use q:some text AND id:(1 OR 2 OR 3...)?

 Is the Filter Cache used for the OR'ed fq?

 Thank you








Re: OR-FilterQuery

2012-02-14 Thread Em
Hi Erick,

 Whoa!

 fq=id:(1 OR 2)
 is not the same thing at all as
 fq=id:1&fq=id:2
Ahm, who said they would be the same? :)
I mean, you are completely right in what you are saying but it seems to
me that we are talking about two different things.

I was talking about caching each filter-criteria instead of the whole
filter-query to recombine the cached filter-criteria based on the
boolean-operators the client sends.

In other words:
currently
fq=id:1 OR id:2
results into ONE cached filter-entry.

fq=id:2 OR id:1
results into ANOTHER cached filter-entry

fq=id:2 AND id:1
results into (surprise, surprise) a third filter-entry (although this
example does not make sense).

My idea was to cache each filter-criteria, that means caching the bitset
for id:1 and the bitset for id:2 to recombine both bitsets via AND, OR,
NOT etc. whenever this is necessary.

This way one could save memory (and maybe computing time as well), which
definitely makes sense when you have a much smaller set of
filter-criteria while having a much larger set of possible (and used)
combinations of those criteria, with a small number of repetitions
per combination (which would otherwise destroy the benefit of caching).
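A sketch of that idea, with Python ints standing in for Lucene bitsets (the criteria and doc ids are made up): each criterion is computed and cached once, and any boolean combination then reuses the cached entries instead of creating one cache entry per permutation:

```python
cache = {}

def bitset(criterion, matching_docs):
    """Compute (or fetch from cache) one bitset per filter criterion."""
    if criterion not in cache:
        bits = 0
        for d in matching_docs[criterion]:
            bits |= 1 << d          # set the bit for each matching doc id
        cache[criterion] = bits
    return cache[criterion]

matches = {"id:1": [0, 3], "id:2": [3, 5]}

# Any boolean combination reuses the two cached entries:
union = bitset("id:1", matches) | bitset("id:2", matches)   # id:1 OR id:2
inter = bitset("id:1", matches) & bitset("id:2", matches)   # id:1 AND id:2
assert len(cache) == 2   # one entry per criterion, not per combination
```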

Don't you agree?

Kind regards,
Em


Am 14.02.2012 22:33, schrieb Erick Erickson:
 Whoa!
 
 fq=id:(1 OR 2)
 is not the same thing at all as
 fq=id:1&fq=id:2
 
 Assuming that any document had one and only one ID,  the second clause
 would return exactly 0 documents, each and every time.
 
 Multiple fq clauses are essentially set intersections. So the first query is 
 the
 set of all documents where id is 1 or 2
 the second is the intersection of two sets of documents, one set
 with an id of 1 and one with an id of 2. Not the same thing at all.
 
 There's no support for the concept of
 (fq=id:1 OR fq=id:2)
 
 Best
 Erick
 
 On Tue, Feb 14, 2012 at 2:13 PM, Em mailformailingli...@yahoo.de wrote:
 Hi Mikhail,

 thanks for kicking in some brainstorming-code!
 The given thread is almost a year old and I was working with Solr in my
 freetime to see where it fails to behave/perform as I expect/wish.

 I found out that if you got a lot of different access-patterns for a
 filter-query, you might end up with either a big cache to make things
 fast or with lower performance (impact depends on usecase and
 circumstances).

 Scenario:
 You got a permission-field and the client is able to filter by one to
 three permission-values.
 That is:
 fq=foo:user
 fq=foo:moderator
 fq=foo:manager

 If you can not control/guarantee the order of the fq's values, you could
 end up with a lot of mess which all returns the same.

 Example:
 fq=permission:user OR permission:moderator OR permission:manager
 fq=permission:user OR permission:manager OR permission:moderator
 fq=permission:moderator OR permission:user OR permission:manager
 ...
 They all return the same but were cached separately, which leads to the
 fact that you are wasting a lot of memory.

 Furthermore, if your access pattern will lead to a lot of different fq's
 on a small set of distinct values, it may make more sense to cache each
 filter-query for itself from a memory-consuming point of view (may cost
 a little bit performance).

 That being said, if you cache a filter for foo:user, foo:moderator and
 foo:manager you can combine those filters with AND, OR, NOT or whatever
 without recomputing every filter over and over again which would be the
 case if your filter-cache is not large enough.

 However, I never compared the performance differences (in terms of
 speed) of a cached filter-query like
 foo:bar OR foo:baz
 With a combination of two cached filter-queries like
 foo:bar
 foo:baz
 combined by a logical OR.

 That's how the background looks like.
 Unfortunately I didn't have the time to implement this in the past.

 Back to your post:
 Looks like a cool idea and is almost what I had in mind!

 I would formulate an easier syntax so that one is able to parse each
 fq-clause on its own to cache the CachingWrapperFilter to reuse it again.

 it will use per segment bitset at contrast to Solr's fq which caches for
 top level reader.
 Could you explain why this bitset would be per-segment based, please?
 I don't see a reason why this *have* to be so.
 What is the benefit you are seeing?

 Kind regards,
 Em

 Am 14.02.2012 19:33, schrieb Mikhail Khludnev:
 Hi Em,

 I briefly read the thread. Are you talking about combing of cached clauses
 of BooleanQuery, instead of evaluating whole BQ as a filter?

 I found something like that in API (but only in API)
 http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)

 Am I get you right? Why do you need it, btw? If I'm ..
 I have idea how to do it in two mins:

 q=+f:text
 +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3 
 _query_:{!fq}id:4)...

 Right leg will be a BooleanQuery with SHOULD clauses backed on cached
 queries (see below).

 if you are not scarred by the syntax yet you can implement trivial
 

Re: SolrCloud Replication Question

2012-02-14 Thread Mark Miller
Doh - looks like I was just seeing a test issue. Do you mind updating and 
trying the latest rev? At the least there should be some better logging around 
the recovery.

I'll keep working on tests in the meantime.

- Mark

On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote:

 Sounds good, if I pull the latest from trunk and rerun will that be
 useful or were you able to duplicate my issue now?
 
 On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote:
 Okay Jamie, I think I have a handle on this. It looks like an issue with 
 what config files are being used by cores created with the admin core 
 handler - I think it's just picking up default config and not the correct 
 config for the collection. This means they end up using config that has no 
 UpdateLog defined - and so recovery fails.
 
 I've added more logging around this so that it's easy to determine that.
 
 I'm investigating more and working on a test + fix. I'll file a JIRA issue 
 soon as well.
 
 - Mark
 
 On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:
 
 Thanks Mark, not a huge rush, just me trying to get to use the latest
 stuff on our project.
 
 On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com wrote:
 Sorry, have not gotten it yet, but will be back trying later today - 
 monday, tuesday tend to be slow for me (meetings and crap).
 
 - Mark
 
 On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
 
 Has there been any success in replicating this?  I'm wondering if it
 could be something with my setup that is causing the issue...
 
 
 On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote:
 Yes, I have the following layout on the FS
 
 ./bootstrap.sh
 ./example (standard example directory from distro containing jetty
 jars, solr confs, solr war, etc)
 ./slice1
  - start.sh
  -solr.xml
  - slice1_shard1
   - data
  - slice2_shard2
   -data
 ./slice2
  - start.sh
  - solr.xml
  -slice2_shard1
-data
  -slice1_shard2
-data
 
 if it matters I'm running everything from localhost, zk and the solr 
 shards
 
 On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote:
 Do you have unique dataDir for each instance?
  On 13.2.2012 at 14.30, Jamie Johnson jej2...@gmail.com wrote:
 
 - Mark Miller
 lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 
 
 - Mark Miller
 lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 

- Mark Miller
lucidimagination.com













Re: Need help with graphing function (MATH)

2012-02-14 Thread Kent Fitch
agreeing with wunder - I don't know the application, but I think almost
always, a set of linear approximations over a few ranges would be ok (and
you could increase the number of ranges until it was), and will be faster.

And if you need just one equation, a sigmoid function will do the trick,
such as

110 - 50((x-100)/20)/(sqrt(1+((x-100)/20)^2))

http://www.wolframalpha.com/input/?i=plot+110+-+50%28%28x-100%29%2F20%29%2F%28sqrt%281%2B%28%28x-100%29%2F20%29^2%29%29%2C+x%3D0..200
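A quick numeric sanity check of the sigmoid above, as a minimal Java sketch (the class name is mine, the formula is Kent's):

```java
// Sketch of Kent's sigmoid boost curve:
// y = 110 - 50*t/sqrt(1 + t^2), where t = (x - 100)/20.
// It plateaus near 160 for small x and near 60 for large x,
// crossing y = 110 exactly at x = 100.
public class SigmoidBoost {
    public static double boost(double x) {
        double t = (x - 100.0) / 20.0;
        return 110.0 - 50.0 * t / Math.sqrt(1.0 + t * t);
    }

    public static void main(String[] args) {
        System.out.println(boost(100)); // prints 110.0
        System.out.println(boost(0));   // near the upper plateau (~159)
        System.out.println(boost(200)); // near the lower plateau (~61)
    }
}
```

Because the curve never quite reaches its plateaus, the 150/60 endpoints of the original request become ~160/~60 here; the width divisor (20) controls how steep the middle decline is.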

Regards

Kent Fitch

On Wed, Feb 15, 2012 at 6:17 AM, Walter Underwood wun...@wunderwood.orgwrote:

 In practice, I expect a linear piecewise function (with sharp corners)
 would be indistinguishable from the smoothed function. It is also much
 easier to read, test, and debug. It might even be faster.

 Try the sharp corners one first.

 wunder
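Walter's linear-piecewise-with-sharp-corners alternative could be sketched like this (the breakpoints are illustrative, borrowed from Kent's ranges; they are not from the original poster):

```java
// Hedged sketch of a piecewise-linear boost with sharp corners:
// flat at 150 below x=50, linear decline to 60 at x=175, flat beyond.
public class PiecewiseBoost {
    public static double boost(double x) {
        if (x <= 50) return 150;
        if (x >= 175) return 60;
        // linear interpolation between (50, 150) and (175, 60)
        return 150 + (x - 50) * (60 - 150) / (175 - 50);
    }

    public static void main(String[] args) {
        System.out.println(boost(50));    // prints 150.0
        System.out.println(boost(175));   // prints 60.0
        System.out.println(boost(112.5)); // midpoint: prints 105.0
    }
}
```

This is trivial to read, test, and debug, and in practice its ranking effect is usually indistinguishable from the smoothed curve.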

 On Feb 14, 2012, at 10:56 AM, Ted Dunning wrote:

  In general this kind of function is very easy to construct using sums of
 basic sigmoidal functions. The logistic and probit functions are commonly
 used for this.
 
  Sent from my iPhone
 
  On Feb 14, 2012, at 10:05, Mark static.void@gmail.com wrote:
 
  Thanks I'll have a look at this. I should have mentioned that the
 actual values on the graph aren't important rather I was showing an example
 of how the function should behave.
 
  On 2/13/12 6:25 PM, Kent Fitch wrote:
  Hi, assuming you have x and want to generate y, then maybe
 
  - if x < 50, y = 150

  - if x > 175, y = 60
 
  - otherwise :
 
  either y = (100/(e^((x -50)/75)^2)) + 50
   http://www.wolframalpha.com/input/?i=plot++%28100%2F%28e^%28%28x+-50%29%2F75%29^2%29%29+%2B+50%2C+x%3D50..175
 
 
  - or maybe y =sin((x+5)/38)*42+105
 
 
 http://www.wolframalpha.com/input/?i=plot++sin%28%28x%2B5%29%2F38%29*42%2B105%2C+x%3D50..175
 
  Regards,
 
  Kent Fitch
 
   On Tue, Feb 14, 2012 at 12:29 PM, Mark static.void@gmail.com wrote:
 
I need some help with one of my boost functions. I would like the
function to look something like the following mockup below. Starts
off flat then there is a gradual decline, steep decline then
gradual decline and then back to flat.
 
Can some of you math guys please help :)
 
Thanks.
 







Re: Semantic autocomplete with Solr

2012-02-14 Thread Paul Libbrecht
facetting?

paul


On 14 Feb 2012, at 23:10, Octavian Covalschi wrote:

 Hey guys,
 
 Has anyone done any kind of smart autocomplete? Let's say we have a web
 store, and we'd like to autocomplete user's searches. So if I'll type in
 jacket next word that will be suggested should be something related to
 jacket (color, fabric) etc...
 
 It seems to me I have to structure this data in a particular way, but that
 way I can do without solr, so I was wondering if Solr could help us.
 
 Thank you in advance.
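Paul's faceting idea might look like the following: run the typed term as a query with rows=0 and ask for facet counts on the attribute fields, then suggest the top facet values. This is only a sketch; the field names (color_s, fabric_s) and the request shape are illustrative, not from the original posts.

```java
// Hypothetical sketch: build a Solr request that facets on attribute
// fields for the term the user just typed; the top facet values become
// the "smart" autocomplete suggestions. Field names are made up.
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FacetSuggestUrl {
    public static String suggestUrl(String userInput, String... facetFields) {
        StringBuilder url = new StringBuilder("http://localhost:8983/solr/select")
            .append("?q=").append(URLEncoder.encode(userInput, StandardCharsets.UTF_8))
            .append("&facet=true&facet.mincount=1&facet.limit=10&rows=0");
        for (String f : facetFields) {
            url.append("&facet.field=").append(f);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // e.g. after the user types "jacket", suggest top colors/fabrics
        System.out.println(suggestUrl("jacket", "color_s", "fabric_s"));
    }
}
```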



Re: Can I rebuild an index and remove some fields?

2012-02-14 Thread Robert Stewart
I was thinking if I make a wrapper class that aggregates another IndexReader 
and filter out terms I don't want anymore it might work.   And then pass that 
wrapper into SegmentMerger.  I think if I filter out terms on 
GetFieldNames(...) and Terms(...) it might work.

Something like:

HashSet<string> ignoredTerms=...;

FilteringIndexReader wrapper=new FilteringIndexReader(reader);

SegmentMerger merger=new SegmentMerger(writer);

merger.add(wrapper);

merger.Merge();





On Feb 14, 2012, at 1:49 AM, Li Li wrote:

 for method 2, delete is wrong. we can't delete terms.
   you also should hack with the tii and tis file.
 
 On Tue, Feb 14, 2012 at 2:46 PM, Li Li fancye...@gmail.com wrote:
 
 method1, dumping data
 for stored fields, you can traverse the whole index and save it to
 somewhere else.
 for indexed but not stored fields, it may be more difficult.
if the indexed and not stored field is not analyzed(fields such as
 id), it's easy to get from FieldCache.StringIndex.
But for analyzed fields, though theoretically it can be restored from
 term vector and term position, it's hard to recover from index.
 
 method 2, hack with metadata
 1. indexed fields
  delete by query, e.g. field:*
 2. stored fields
   because all fields are stored sequentially. it's not easy to delete
 some fields. this will not affect search speed. but if you want to get
 stored fields,  and the useless fields are very long, then it will slow
 down.
   also it's possible to hack with it. but need more effort to
 understand the index file format  and traverse the fdt/fdx file.
 http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
 
 this will give you some insight.
 
 
 On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.comwrote:
 
 Lets say I have a large index (100M docs, 1TB, split up between 10
 indexes).  And a bunch of the stored and indexed fields are not used in
 search at all.  In order to save memory and disk, I'd like to rebuild that
 index *without* those fields, but I don't have original documents to
 rebuild entire index with (don't have the full-text anymore, etc.).  Is
 there some way to rebuild or optimize an existing index with only a sub-set
 of the existing indexed fields?  Or alternatively is there a way to avoid
 loading some indexed fields at all ( to avoid loading term infos and terms
 index ) ?
 
 Thanks
 Bob
 
 
 



Re: Semantic autocomplete with Solr

2012-02-14 Thread Octavian Covalschi
Hm... I used it for some basic group by feature, but haven't thought of it
for autocomplete. I'll give it a shot.

Thanks!


On Tue, Feb 14, 2012 at 4:19 PM, Paul Libbrecht p...@hoplahup.net wrote:

 facetting?

 paul


  On 14 Feb 2012, at 23:10, Octavian Covalschi wrote:

  Hey guys,
 
  Has anyone done any kind of smart autocomplete? Let's say we have a web
  store, and we'd like to autocomplete user's searches. So if I'll type in
  jacket next word that will be suggested should be something related to
  jacket (color, fabric) etc...
 
  It seems to me I have to structure this data in a particular way, but
 that
  way I can do without solr, so I was wondering if Solr could help us.
 
  Thank you in advance.




Re: Semantic autocomplete with Solr

2012-02-14 Thread Roman Chyla
done something along these lines:

https://svnweb.cern.ch/trac/rcarepo/wiki/InspireAutoSuggest#Autosuggestautocompletefunctionality

but you would need MontySolr for that - https://github.com/romanchyla/montysolr

roman

On Tue, Feb 14, 2012 at 11:10 PM, Octavian Covalschi
octavian.covals...@gmail.com wrote:
 Hey guys,

 Has anyone done any kind of smart autocomplete? Let's say we have a web
 store, and we'd like to autocomplete user's searches. So if I'll type in
 jacket next word that will be suggested should be something related to
 jacket (color, fabric) etc...

 It seems to me I have to structure this data in a particular way, but that
 way I can do without solr, so I was wondering if Solr could help us.

 Thank you in advance.


payload and exact match

2012-02-14 Thread leonardo2
Is there a possibility of performing an 'exact search' on a payload field?

I have to index text with auxiliary info for each word. In particular, each
word is associated with the bounding box containing it in the original PDF
page (it is used for highlighting the search terms in the PDF). I used the
payload to store that information.

In the schema.xml, the fieldType definition is:

---
<fieldtype name="wppayloads" stored="false" indexed="true"
           class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="identity"/>
  </analyzer>
</fieldtype>
---

while the field definition is:

---
<field name="words" type="wppayloads" indexed="true" stored="true"
       required="true" multiValued="true"/>
---

When indexing, the field 'words' contains a list of word|box as in the
following example:

---
doc_id=example
words={Fonte:|307.62,948.16,324.62,954.25 Comune|326.29,948.16,349.07,954.25
di|350.74,948.16,355.62,954.25 Bologna|358.95,948.16,381.28,954.25}
---
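To make the encoding concrete, here is a toy model (plain Java, not the actual Lucene filter) of what the analysis chain does with each whitespace token: split at the '|' into the searchable term (lowercased, as LowerCaseFilterFactory would do) and the bounding-box payload.

```java
// Toy model of the "word|box" encoding: each whitespace token is split at
// the last '|' into a lowercased search term and a bounding-box payload.
// This only sketches what DelimitedPayloadTokenFilterFactory does
// conceptually; it is not the Lucene implementation.
import java.util.ArrayList;
import java.util.List;

public class PayloadTokens {
    public static final class Token {
        public final String term;
        public final String payload;
        Token(String term, String payload) { this.term = term; this.payload = payload; }
    }

    public static List<Token> analyze(String field) {
        List<Token> out = new ArrayList<>();
        for (String raw : field.trim().split("\\s+")) {
            int sep = raw.lastIndexOf('|');
            String term = (sep < 0 ? raw : raw.substring(0, sep)).toLowerCase();
            String payload = sep < 0 ? "" : raw.substring(sep + 1);
            out.add(new Token(term, payload));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : analyze("Comune|326.29,948.16,349.07,954.25 di|350.74,948.16,355.62,954.25")) {
            System.out.println(t.term + " -> " + t.payload);
        }
    }
}
```

Note that positions survive this encoding, so in principle a phrase query over the terms should still be possible; the payload only rides along per term.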

Such a solution works well except in the case of an exact search. For example,
assuming the only indexed doc is the 'example' doc shown above, the query
words:"Comune di Bologna" returns no results.

Does someone know if there is a way to perform an 'exact search' on a
payload field?

Thanks in advance,
Leonardo


--
View this message in context: 
http://lucene.472066.n3.nabble.com/payload-and-exact-match-tp3745369p3745369.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr soft commit feature

2012-02-14 Thread Dipti Srivastava
Hi All,
Is there a way to soft commit in the current released version of solr 3.5?

Regards,
Dipti Srivastava


This message is private and confidential. If you have received it in error, 
please notify the sender and remove it from your system.




Re: OR-FilterQuery

2012-02-14 Thread Erick Erickson
Ah, OK, I misread your post apparently. And yes, what you suggest
would result in some efficiencies, but at present I don't think there's any
syntax that allows one to combine filter queries as you suggest. There
was some discussion about it in the JIRA I referenced, but no action that
I could see.

That is, efficiencies in some circumstances, though I think it would be
hard to predict. For instance, imagine a set of 100 entries in an FQ. And
no, I'm not making things up, I've seen applications where this makes
sense. Splitting that out into 100 separate entries in the filterCache would
use up a lot of space. Likewise, I suspect that the actual process of
creating the heuristics that were able to analyze an incoming filter
query and do the right thing in terms of splitting it up and recombining
it would be pretty hairy. Local parameters for instance, and let's throw in
dereferencing too <G>...

So I suspect that this is one of those features that is quite easy to see
the benefits of in the simple case, but pretty quickly becomes a
nightmare to actually implement correctly, but that's mostly
a guess.

And before putting the work into it, I think modeling the actual
benefits would be wise, as well as convincing myself that there
are enough cases where this *would* be beneficial. I mean Solr
does a pretty reasonable job of caching these anyway, and with the
non-cached filters it's not clear to me that the benefits are
sufficient...

Good luck, though, if you want to tackle it!
Erick



On Tue, Feb 14, 2012 at 4:54 PM, Em mailformailingli...@yahoo.de wrote:
 Hi Erick,

 Whoa!

  fq=id:(1 OR 2)
  is not the same thing at all as
  fq=id:1&fq=id:2
 Ahm, who said they would be the same? :)
 I mean, you are completely right in what you are saying but it seems to
 me that we are talking about two different things.

 I was talking about caching each filter-criteria instead of the whole
 filter-query to recombine the cached filter-criteria based on the
 boolean-operators the client sends.

 In other words:
 currently
 fq=id:1 OR id:2
 results into ONE cached filter-entry.

 fq=id:2 OR id:1
 results into ANOTHER cached filter-entry

 fq=id:2 AND id:1
 results into (surprise, surprise) a third filter-entry (although this
 example does not make sense).

 My idea was to cache each filter-criteria, that means caching the bitset
 for id:1 and the bitset for id:2 to recombine both bitsets via AND, OR,
NOT etc. whenever this is necessary.

 This way one could save memory (and maybe computing-time as well) which
 definitely makes sense when you got a way smaller set of
 filter-criterias while having a much larger set of possible (and used)
 combinations of each filter-criteria with a small number of repetitions
 per combination (which would destroy the benefit of caching).

 Don't you agree?

 Kind regards,
 Em


  On 14.02.2012 22:33, Erick Erickson wrote:
 Whoa!

 fq=id:(1 OR 2)
 is not the same thing at all as
 fq=id:1&fq=id:2

 Assuming that any document had one and only one ID,  the second clause
 would return exactly 0 documents, each and every time.

 Multiple fq clauses are essentially set intersections. So the first query is 
 the
 set of all documents where id is 1 or 2
 the second is the intersection of two sets of documents, one set
 with an id of 1 and one with an id of 2. Not the same thing at all.

 There's no support for the concept of
 (fq=id:1 OR fq=id:2)

 Best
 Erick

 On Tue, Feb 14, 2012 at 2:13 PM, Em mailformailingli...@yahoo.de wrote:
 Hi Mikhail,

 thanks for kicking in some brainstorming-code!
 The given thread is almost a year old and I was working with Solr in my
 freetime to see where it fails to behave/perform as I expect/wish.

 I found out that if you got a lot of different access-patterns for a
 filter-query, you might end up with either a big cache to make things
 fast or with lower performance (impact depends on usecase and
 circumstances).

 Scenario:
 You got a permission-field and the client is able to filter by one to
 three permission-values.
 That is:
 fq=foo:user
 fq=foo:moderator
 fq=foo:manager

 If you can not control/guarantee the order of the fq's values, you could
 end up with a lot of mess which all returns the same.

 Example:
 fq=permission:user OR permission:moderator OR permission:manager
 fq=permission:user OR permission:manager OR permission:moderator
 fq=permission:moderator OR permission:user OR permission:manager
 ...
  They all return the same but were cached separately, which leads to the
  fact that you are wasting a lot of memory.

 Furthermore, if your access pattern will lead to a lot of different fq's
 on a small set of distinct values, it may make more sense to cache each
 filter-query for itself from a memory-consuming point of view (may cost
 a little bit performance).

 That beeing said, if you cache a filter for foo:user, foo:moderator and
 foo:manager you can combine those filters with AND, OR, NOT or whatever
 without recomputing every filter over and over again 
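Em's per-criterion caching idea can be sketched in plain Java (this is a conceptual model with java.util.BitSet, not Solr's filterCache code): cache one bitset per clause and recombine cached bitsets per request, so clause order no longer multiplies cache entries.

```java
// Sketch of per-clause filter caching: one cached bitset per criterion
// (e.g. permission:user), recombined with OR/AND at request time instead
// of caching every distinct boolean combination of clauses.
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PerClauseFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> evaluator; // runs one clause against the index

    public PerClauseFilterCache(Function<String, BitSet> evaluator) {
        this.evaluator = evaluator;
    }

    public BitSet clause(String criterion) { // one cache entry per criterion
        return cache.computeIfAbsent(criterion, evaluator);
    }

    public BitSet or(String... criteria) {   // e.g. user OR moderator OR manager
        BitSet result = new BitSet();
        for (String c : criteria) result.or(clause(c));
        return result;
    }

    public static void main(String[] args) {
        // toy "index": doc 1 and 3 match permission:user, doc 2 permission:moderator
        Map<String, BitSet> fakeIndex = new HashMap<>();
        BitSet users = new BitSet(); users.set(1); users.set(3);
        BitSet mods = new BitSet(); mods.set(2);
        fakeIndex.put("permission:user", users);
        fakeIndex.put("permission:moderator", mods);

        PerClauseFilterCache cache = new PerClauseFilterCache(fakeIndex::get);
        // clause order no longer matters: both requests hit the same two entries
        System.out.println(cache.or("permission:user", "permission:moderator")
            .equals(cache.or("permission:moderator", "permission:user"))); // prints true
    }
}
```

With this scheme, "user OR moderator" and "moderator OR user" reuse the same two cached bitsets, which is exactly the memory saving discussed above; the trade-off is the per-request cost of the combination step.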

Re: Solr soft commit feature

2012-02-14 Thread Mark Miller
This has not been ported back to the 3.X line yet - mostly because it involved 
some rather large and invasive changes that I wanted to bake on trunk for some 
time first.

Even still, the back port is not trivial, so I don't know that it's something 
I'd personally be able to get to in the short term. If I had any free time, I'd 
probably prefer pushing towards a 4 release with NRT. Some of the changes also 
broke back compat behavior in ways that are more acceptable over a major 
release.

Someone else might jump in and do the work of course.

On Feb 14, 2012, at 7:41 PM, Dipti Srivastava wrote:

 Hi All,
 Is there a way to soft commit in the current released version of solr 3.5?
 
 Regards,
 Dipti Srivastava
 
 
 
 

- Mark Miller
lucidimagination.com













Re: Can I rebuild an index and remove some fields?

2012-02-14 Thread Li Li
I have roughly read the code of the 4.0 trunk; maybe it's feasible.
SegmentMerger.add(IndexReader) will add the readers to be merged;
merge() will call
  mergeTerms(segmentWriteState);
  mergePerDoc(segmentWriteState);

   mergeTerms() will construct fields from IndexReaders
for(int readerIndex=0; readerIndex < mergeState.readers.size(); readerIndex++) {
  final MergeState.IndexReaderAndLiveDocs r =
mergeState.readers.get(readerIndex);
  final Fields f = r.reader.fields();
  final int maxDoc = r.reader.maxDoc();
  if (f != null) {
slices.add(new ReaderUtil.Slice(docBase, maxDoc, readerIndex));
fields.add(f);
  }
  docBase += maxDoc;
}
So if you wrap your IndexReader and override its fields() method,
it may work for merging terms.

For DocValues, the wrapper can also override AtomicReader.docValues() and just
return null for fields you want to remove. It should probably
traverse CompositeReader's getSequentialSubReaders() and wrap each
AtomicReader.

Other things like term vectors and norms are similar.
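The wrapper idea can be modelled in plain Java (this is a toy decorator, not Lucene's FilterIndexReader API): the wrapped reader exposes a fields() view that silently drops the unwanted fields, so a merge driven through the wrapper would write a segment without them.

```java
// Toy model of the field-dropping wrapper: a decorator whose fields()
// view omits the unwanted fields. A merge consuming this view would
// produce a segment without those fields. Not actual Lucene API.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class FieldFilteringReader {
    private final Map<String, String> fields; // fieldName -> field data (stand-in for Fields)
    private final Set<String> dropped;

    public FieldFilteringReader(Map<String, String> fields, Set<String> dropped) {
        this.fields = fields;
        this.dropped = dropped;
    }

    public Map<String, String> fields() { // what a merge would consume
        Map<String, String> view = new LinkedHashMap<>(fields);
        view.keySet().removeAll(dropped);
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("id", "...");
        stored.put("title", "...");
        stored.put("obsolete_blob", "...");
        FieldFilteringReader wrapper =
            new FieldFilteringReader(stored, Set.of("obsolete_blob"));
        System.out.println(wrapper.fields().keySet()); // prints [id, title]
    }
}
```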
On Wed, Feb 15, 2012 at 6:30 AM, Robert Stewart bstewart...@gmail.comwrote:

 I was thinking if I make a wrapper class that aggregates another
 IndexReader and filter out terms I don't want anymore it might work.   And
 then pass that wrapper into SegmentMerger.  I think if I filter out terms
 on GetFieldNames(...) and Terms(...) it might work.

 Something like:

  HashSet<string> ignoredTerms=...;

  FilteringIndexReader wrapper=new FilteringIndexReader(reader);

 SegmentMerger merger=new SegmentMerger(writer);

 merger.add(wrapper);

 merger.Merge();





 On Feb 14, 2012, at 1:49 AM, Li Li wrote:

  for method 2, delete is wrong. we can't delete terms.
you also should hack with the tii and tis file.
 
  On Tue, Feb 14, 2012 at 2:46 PM, Li Li fancye...@gmail.com wrote:
 
  method1, dumping data
  for stored fields, you can traverse the whole index and save it to
  somewhere else.
  for indexed but not stored fields, it may be more difficult.
 if the indexed and not stored field is not analyzed(fields such as
  id), it's easy to get from FieldCache.StringIndex.
 But for analyzed fields, though theoretically it can be restored from
  term vector and term position, it's hard to recover from index.
 
  method 2, hack with metadata
  1. indexed fields
   delete by query, e.g. field:*
  2. stored fields
because all fields are stored sequentially. it's not easy to
 delete
  some fields. this will not affect search speed. but if you want to get
  stored fields,  and the useless fields are very long, then it will slow
  down.
also it's possible to hack with it. but need more effort to
  understand the index file format  and traverse the fdt/fdx file.
 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
 
  this will give you some insight.
 
 
  On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.com
 wrote:
 
  Lets say I have a large index (100M docs, 1TB, split up between 10
  indexes).  And a bunch of the stored and indexed fields are not
 used in
  search at all.  In order to save memory and disk, I'd like to rebuild
 that
  index *without* those fields, but I don't have original documents to
  rebuild entire index with (don't have the full-text anymore, etc.).  Is
  there some way to rebuild or optimize an existing index with only a
 sub-set
  of the existing indexed fields?  Or alternatively is there a way to
 avoid
  loading some indexed fields at all ( to avoid loading term infos and
 terms
  index ) ?
 
  Thanks
  Bob
 
 
 




Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Doing so now, will let you know if I continue to see the same issues

On Tue, Feb 14, 2012 at 4:59 PM, Mark Miller markrmil...@gmail.com wrote:
 Doh - looks like I was just seeing a test issue. Do you mind updating and 
 trying the latest rev? At the least there should be some better logging 
 around the recovery.

 I'll keep working on tests in the meantime.

 - Mark

 On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote:

 Sounds good, if I pull the latest from trunk and rerun will that be
 useful or were you able to duplicate my issue now?

 On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller markrmil...@gmail.com wrote:
 Okay Jamie, I think I have a handle on this. It looks like an issue with 
 what config files are being used by cores created with the admin core 
 handler - I think it's just picking up default config and not the correct 
 config for the collection. This means they end up using config that has no 
 UpdateLog defined - and so recovery fails.

 I've added more logging around this so that it's easy to determine that.

 I'm investigating more and working on a test + fix. I'll file a JIRA issue 
 soon as well.

 - Mark

 On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:

 Thanks Mark, not a huge rush, just me trying to get to use the latest
 stuff on our project.

 On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller markrmil...@gmail.com 
 wrote:
 Sorry, have not gotten it yet, but will be back trying later today - 
 monday, tuesday tend to be slow for me (meetings and crap).

 - Mark

 On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:

 Has there been any success in replicating this?  I'm wondering if it
 could be something with my setup that is causing the issue...


 On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson jej2...@gmail.com wrote:
 Yes, I have the following layout on the FS

 ./bootstrap.sh
 ./example (standard example directory from distro containing jetty
 jars, solr confs, solr war, etc)
 ./slice1
  - start.sh
  -solr.xml
  - slice1_shard1
   - data
  - slice2_shard2
   -data
 ./slice2
  - start.sh
  - solr.xml
  -slice2_shard1
    -data
  -slice1_shard2
    -data

 if it matters I'm running everything from localhost, zk and the solr 
 shards

 On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren ssi...@gmail.com wrote:
 Do you have unique dataDir for each instance?
  On 13.2.2012 at 14.30, Jamie Johnson jej2...@gmail.com wrote:

 - Mark Miller
 lucidimagination.com












 - Mark Miller
 lucidimagination.com












 - Mark Miller
 lucidimagination.com













Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
All of the nodes now show as being Active.  When starting the replicas
I did receive the following message though.  Not sure if this is
expected or not.

INFO: Attempting to replicate from
http://JamiesMac.local:8501/solr/slice2_shard2/
Feb 14, 2012 10:53:34 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to
recover:org.apache.solr.common.SolrException: null
java.lang.NullPointerException  at
org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326) at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)  at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

null  java.lang.NullPointerExceptionat
org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326) at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

request: 
http://JamiesMac.local:8501/solr/admin/cores?action=PREPRECOVERYcore=slice2_shard2nodeName=JamiesMac.local:8502_solrcoreNodeName=JamiesMac.local:8502_solr_slice2_shard1wt=javabinversion=2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:208)

Feb 14, 2012 10:53:34 PM org.apache.solr.update.UpdateLog dropBufferedUpdates

feeding mahout cluster output back to solr

2012-02-14 Thread abhayd
hi 
At present we use Carrot2 for clustering and analysis of customer
feedback data. Since it runs in memory at search time, we are having issues
with performance and cluster size.

I was reading about generating clusters with Mahout from Solr index data.

But can we feed the segmentation generated by Mahout back into Solr to use as
facets? I am not even sure what the output from Mahout looks like, so I wanted
to know.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/feeding-mahout-cluster-output-back-to-solr-tp3745883p3745883.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OR-FilterQuery

2012-02-14 Thread Mikhail Khludnev
On Tue, Feb 14, 2012 at 11:13 PM, Em mailformailingli...@yahoo.de wrote:

 Hi Mikhail,

  it will use per segment bitset at contrast to Solr's fq which caches for
  top level reader.
 Could you explain why this bitset would be per-segment based, please?

I don't see a reason why this *has* to be so.

it's just how org.apache.lucene.search.CachingWrapperFilter works - the
first out-of-the-box stuff I've found.
As a top-level-reader alternative we need
org.apache.solr.search.SolrIndexSearcher.getDocSet(Query).

btw, one more top-level snippet

class FQParser extends QParser{

Query parse(...){
  return new SolrConstantScoreQuery(
  solrIndexSearcher.getDocSet(
   subQuery(localParam.get(V))
  ).getTopFilter())
}
}



 What is the benefit you are seeing?

 It seems like two different POVs: Lucene prefers per-segment caching to
have fast incremental updates, but maybe 'because it's good but not in the
worst case' (I guess I've heard it there
http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/many-facets-apache-solr)
Solr prefers top-reader caches.


 Kind regards,
 Em

  On 14.02.2012 19:33, Mikhail Khludnev wrote:
  Hi Em,
 
   I briefly read the thread. Are you talking about combining cached
 clauses
  of BooleanQuery, instead of evaluating whole BQ as a filter?
 
  I found something like that in API (but only in API)
 
 http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)
 
   Am I getting you right? Why do you need it, btw? If I'm ..
  I have idea how to do it in two mins:
 
  q=+f:text
  +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3
 _query_:{!fq}id:4)...
 
  Right leg will be a BooleanQuery with SHOULD clauses backed on cached
  queries (see below).
 
   if you are not scared by the syntax yet you can implement trivial
  fqQParserPlugin, which will be just
 
  // lazily through User/Generic Cache
  q = new FilteredQuery (new MatchAllDocsQuery(), new
  CachingWrapperFilter(new
  QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V);
  return q;
 
  it will use per segment bitset at contrast to Solr's fq which caches for
  top level reader.
 
  WDYT?
 
  On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de
 wrote:
 
  Hi,
 
  have a look at:
  http://search-lucene.com/m/Z8lWGEiKoI
 
  I think not much had changed since then.
 
  Regards,
  Em
 
   On 13.02.2012 20:17, spr...@gmx.eu wrote:
  Hi,
 
  how efficent is such an query:
 
  q=some text
  fq=id:(1 OR 2 OR 3...)
 
  Should I better use q:some text AND id:(1 OR 2 OR 3...)?
 
  Is the Filter Cache used for the OR'ed fq?
 
  Thank you
 
 
 
 
 
 




-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: OR-FilterQuery

2012-02-14 Thread Em
Hi Mikhail,

 it's just how org.apache.lucene.search.CachingWrapperFilter works. The
  first out-of-the-box stuff I've found.
Thanks for your explanation and snippets - I thought this was configurable.

Regards,
Em

On 15.02.2012 06:16, Mikhail Khludnev wrote:
 On Tue, Feb 14, 2012 at 11:13 PM, Em mailformailingli...@yahoo.de wrote:
 
 Hi Mikhail,

 it will use per segment bitset at contrast to Solr's fq which caches for
 top level reader.
 Could you explain why this bitset would be per-segment based, please?
 
  I don't see a reason why this *has* to be so.

 it's just how org.apache.lucene.search.CachingWrapperFilter works. The
  first out-of-the-box stuff I've found.
 as an top-level segment alternative we need
 org.apache.solr.search.SolrIndexSearcher.getDocSet(Query).
 
 btw, one more top-level snippet
 
 class FQParser extends QParser{
 
 Query parse(...){
   return new SolrConstantScoreQuery(
   solrIndexSearcher.getDocSet(
subQuery(localParam.get(V))
   ).getTopFilter())
 }
 }
 
 
 
 What is the benefit you are seeing?

  It seems like two different POVs: Lucene prefer per segment caching to
 have fast incremental updates, but maybe 'because it's good but not in
 worst case' (I guess I've heard it there
 http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/many-facets-apache-solr)
 Solr prefer top-reader caches.
 
 
 Kind regards,
 Em

  On 14.02.2012 19:33, Mikhail Khludnev wrote:
 Hi Em,

  I briefly read the thread. Are you talking about combining cached
 clauses
 of BooleanQuery, instead of evaluating whole BQ as a filter?

 I found something like that in API (but only in API)

 http://lucene.apache.org/solr/api/org/apache/solr/search/ExtendedQuery.html#setCacheSep(boolean)

  Am I getting you right? Why do you need it, btw? If I'm ..
 I have idea how to do it in two mins:

 q=+f:text
 +(_query_:{!fq}id:1 _query_:{!fq}id:2 _query_:{!fq}id:3
 _query_:{!fq}id:4)...

 Right leg will be a BooleanQuery with SHOULD clauses backed on cached
 queries (see below).

  if you are not scared by the syntax yet you can implement trivial
 fqQParserPlugin, which will be just

 // lazily through User/Generic Cache
  q = new FilteredQuery(new MatchAllDocsQuery(), new CachingWrapperFilter(
  new QueryWrapperFilter(subQuery(localParams.get(QueryParsing.V)))));
 return q;

 it will use per segment bitset at contrast to Solr's fq which caches for
 top level reader.

 WDYT?

 On Mon, Feb 13, 2012 at 11:34 PM, Em mailformailingli...@yahoo.de
 wrote:

 Hi,

 have a look at:
 http://search-lucene.com/m/Z8lWGEiKoI

 I think not much had changed since then.

 Regards,
 Em

  On 13.02.2012 20:17, spr...@gmx.eu wrote:
 Hi,

 how efficent is such an query:

 q=some text
 fq=id:(1 OR 2 OR 3...)

 Should I better use q:some text AND id:(1 OR 2 OR 3...)?

 Is the Filter Cache used for the OR'ed fq?

 Thank you