Re: DIH - incorrect datasource being picked up by XPathEntityProcessor

2012-07-16 Thread girishyes

Thanks Gora, I tried that but didn't help.

Regards.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-incorrect-datasource-being-picked-up-by-XPathEntityProcessor-tp3994802p3995211.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: are stopwords indexed?

2012-07-16 Thread Lance Norskog
Look at the index with the Schema Browser in the Solr UI. This pulls
the terms for each field.
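
Another quick way to check, assuming the Luke request handler that ships
enabled in the example solrconfig.xml, is to ask for the top terms of the
field directly (myFieldName is a placeholder):

http://localhost:8983/solr/admin/luke?fl=myFieldName&numTerms=20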

On Sun, Jul 15, 2012 at 8:38 PM, Giovanni Gherdovich
g.gherdov...@gmail.com wrote:
 Hi all,

 are stopwords from the stopwords.txt config file
 supposed to be indexed?

 I would say no, but this is the situation I am
 observing on my Solr instance:

 * I have a bunch of stopwords in stopwords.txt
 * my fields are of fieldType text from the example schema.xml,
   i.e. I have

 -- -- 8< -- -- 8< -- -- 8< -- -- 8<
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     [...]
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords_FR.txt"
             enablePositionIncrements="true"
             />
     [...]
   </analyzer>
   <analyzer type="query">
     [...]
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords_FR.txt"
             enablePositionIncrements="true"
             />
   </analyzer>
</fieldType>
 -- -- 8< -- -- 8< -- -- 8< -- -- 8<

 * searching for a stopword through Solr always gives zero results
 * inspecting the index with LuCLI
 http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html
   shows that all stopwords are in my index. Note that I query
   LuCLI specifying the field, i.e. with myFieldName:and,
   and not just with the stopword and.

 Is this normal?

 Are stopwords indexed?

 Cheers,
 Giovanni



-- 
Lance Norskog
goks...@gmail.com


Re: Solr facet multiple constraint

2012-07-16 Thread davidbougearel
OK, I've added the debug parameter; here is the query and response after
executing the query:

facet=true,sort=publishingdate
desc,debugQuery=true,facet.mincount=1,q=service:1 AND
publicationstatus:LIVE,facet.field=pillar,wt=javabin,fq=(((pillar:10))),version=2}},response={numFound=2,start=0,docs=[SolrDocument[{uniquenumber=UniqueNumber1,
name=Doc 1, publicationstatus=LIVE, service=1, servicename=service_1,
pillar=[10], region=EU, regionname=Europe, documenttype=TRACKER,
publishingdate=Sun Jul 15 09:03:32 CEST 2012, publishingyear=2012,
teasersummary=Seo_Description, content=answer, creator=chandan, version=1,
documentinstanceid=1}], SolrDocument[{uniquenumber=UniqueNumber2, name=Doc
2, publicationstatus=LIVE, service=1, servicename=service_1, pillar=[10],
region=EU, regionname=Europe, documenttype=TRACKER, publishingdate=Sat Jul
14 09:03:32 CEST 2012, publishingyear=2012, teasersummary=Seo_Description,
content=answer, creator=chandan, version=1,
documentinstanceid=1}]]},facet_counts={facet_queries={},facet_fields={pillar={10=2}},facet_dates={},facet_ranges={}},debug={rawquerystring=service:1
AND publicationstatus:LIVE,querystring=service:1 AND
publicationstatus:LIVE,parsedquery=+service:1
+publicationstatus:LIVE,parsedquery_toString=+service:1
+publicationstatus:LIVE,explain={UniqueNumber1=
1.2917422 = (MATCH) sum of:
  0.7741482 = (MATCH) weight(service:1 in 0), product of:
0.7741482 = queryWeight(service:1), product of:
  1.0 = idf(docFreq=4, maxDocs=5)
  0.7741482 = queryNorm
1.0 = (MATCH) fieldWeight(service:1 in 0), product of:
  1.0 = tf(termFreq(service:1)=1)
  1.0 = idf(docFreq=4, maxDocs=5)
  1.0 = fieldNorm(field=service, doc=0)
  0.517594 = (MATCH) weight(publicationstatus:LIVE in 0), product of:
0.6330043 = queryWeight(publicationstatus:LIVE), product of:
  0.81767845 = idf(docFreq=5, maxDocs=5)
  0.7741482 = queryNorm
0.81767845 = (MATCH) fieldWeight(publicationstatus:LIVE in 0), product
of:
  1.0 = tf(termFreq(publicationstatus:LIVE)=1)
  0.81767845 = idf(docFreq=5, maxDocs=5)
  1.0 = fieldNorm(field=publicationstatus, doc=0)
,UniqueNumber2=
1.2917422 = (MATCH) sum of:
  0.7741482 = (MATCH) weight(service:1 in 0), product of:
0.7741482 = queryWeight(service:1), product of:
  1.0 = idf(docFreq=4, maxDocs=5)
  0.7741482 = queryNorm
1.0 = (MATCH) fieldWeight(service:1 in 0), product of:
  1.0 = tf(termFreq(service:1)=1)
  1.0 = idf(docFreq=4, maxDocs=5)
  1.0 = fieldNorm(field=service, doc=0)
  0.517594 = (MATCH) weight(publicationstatus:LIVE in 0), product of:
0.6330043 = queryWeight(publicationstatus:LIVE), product of:
  0.81767845 = idf(docFreq=5, maxDocs=5)
  0.7741482 = queryNorm
0.81767845 = (MATCH) fieldWeight(publicationstatus:LIVE in 0), product
of:
  1.0 = tf(termFreq(publicationstatus:LIVE)=1)
  0.81767845 = idf(docFreq=5, maxDocs=5)
  1.0 = fieldNorm(field=publicationstatus, doc=0)
},QParser=LuceneQParser,filter_queries=[(((pillar:10)))

As you can see, in this request I'm filtering on pillar, not on user.

Thanks for all, David.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-facet-multiple-constraint-tp3992974p3995215.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - Spatial Search for Specif Areas on Map

2012-07-16 Thread samabhiK
David,

Thanks for such a detailed response. The data volume I mentioned is the
total set of records we have - but we would never ever need to search the
entire base in one query; we would divide the data by region or zip code.
So, in that case I assume that for a single region, we would not have more
than 200M records (this is real; we have a region with that many records).

So, I can assume that I can create shards based on regions and the requests
would get distributed among these region servers, right? You also mentioned
~20 concurrent queries per shard - do you have links to some
benchmarks? I am very interested to know about the hardware sizing details
for such a setup.

About setting up Solr for a single shard, I think I will go by your advice.
I'll see how much a single shard can handle on a decent machine :)

The reason I came up with that figure is that I have a user base of 500k
and there's a lot of activity happening on the map - every time
someone moves the tiles, zooms in/out, scrolls, we are going to send a
server side request to fetch some data ( I agree we can benefit much using
caching but I believe Solr itself has its own local cache). I might be a bit
unrealistic with my 10K rps projections but I have read about 9K rps to map
servers from some sources on the internet. 

And, NO, I don't work for Google :) But who knows we might be building
something that can get so much traffic to us in a while. :D

BTW, my question still remains - can we search on polygonal areas on the
map? If so, do you have any link where I can get more details? The
bounding-box approach won't work for me, I guess :(

Sam


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995209.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: 4.0-ALPHA for general development use?

2012-07-16 Thread John Field
OK: that is helpful, thanks!

On 13 July 2012 15:44, Mark Miller markrmil...@gmail.com wrote:

 It really comes down to you.

 Many people run a trunk version of Solr in production. Some never would.
 Generally, bugs are fixed quickly, and trunk is pretty stable. The main
 issue is index format changes and upgrades. If you use trunk you generally
 have to be willing to reindex to upgrade. That's one nice thing about this
 Alpha - we are saying that unless there is a really bad bug, you will be
 able to upgrade to future versions without reindexing.

 Most of the code itself has been in dev and use for years - so it's not so
 risky in my opinion. It's almost more about Java APIs and what not than
 code stability when we say Alpha.

 In fact, just read this
 http://www.lucidimagination.com/blog/2012/07/03/4-0-alpha-whats-in-a-name/

 That should help clarify what this release is.

 On Fri, Jul 13, 2012 at 6:51 AM, John Field jfi...@astreetpress.com
 wrote:

  Hi, we are considering a long-term project (likely lifecycle of
  several years) with an initial production release in approximately
  three months.
 
  We're intending to use Solr 3.6.0, with a view for upgrading to 4.0
  upon stable release.
 
  However, http://lucene.apache.org/solr/ now has 4.0-ALPHA as the main
  download, implying this version is for general use.
 
  But on the other hand, the release notes state "This is an alpha
  release for early adopters", and http://wiki.apache.org/solr/Solr4.0
  gives a timescale of 60 days minimum before final release.
 
  We'd like to use 4.0 features such as near real-time updates, but
  haven't identified these as must-haves for the initial release.
 
  Given that our first production release is likely to occur a month
  after that 60 days, is 4.0-ALPHA suitable for general product
  development, or is it recommended to stick with 3.6.0 and accept an
  upgrade cost when 4.0 is
  stable?
 
  (Perhaps this hinges on understanding why 4.0-ALPHA is now the main
  download option).
 
  Thanks.
 



 --
 - Mark

 http://www.lucidimagination.com




-- 

John Field, Software Architect
http://www.alexanderstreet.com - Alexander Street Press, world-leading
digital humanities publisher.


Re: Computed fields - can I put a function in fl?

2012-07-16 Thread maurizio1976
Yes,
sorry, just a typo. I meant:
q=*:*&fq=&start=0&rows=10&qt=&wt=&explainOther=&fl=product:(if(show_product:true, product, )
thanks


On Sat, Jul 14, 2012 at 11:27 PM, Erick Erickson [via Lucene]
ml-node+s472066n3995045...@n3.nabble.com wrote:
 I think in 4.0 you can, but not 3.x as I remember. Your example has
 the fl as part
 of the highlight though, is that a typo?

 Best
 Erick

 On Fri, Jul 13, 2012 at 5:21 AM, maurizio1976
 [hidden email] wrote:

 Hi,
 I have 2 fields, one containing a string (product) and another containing
 a
 boolean (show_product).

 Is there a way of returning the product field with a value of null when
 the
 show_product field is false?

 I can make another field (product_computed) and index that with null where
 I
 need but I would like to understand if there is a better approach like
 putting a function query in the fl and make a computed field.

 something like:

 q=*:*&fq=&start=0&rows=10&fl=&qt=&wt=&explainOther=&hl.fl=product:(if(show_product:true, product, )

 that obviously doesn't work.

 thanks for any help

 Maurizio

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Computed-fields-can-I-put-a-function-in-fl-tp3994799.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Computed-fields-can-I-put-a-function-in-fl-tp3994799p3995218.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Computed fields - can I put a function in fl?

2012-07-16 Thread Yonik Seeley
On Mon, Jul 16, 2012 at 4:43 AM, maurizio1976
maurizio.picc...@gmail.com wrote:
 Yes,
 sorry, just a typo. I meant:
 q=*:*&fq=&start=0&rows=10&qt=&wt=&explainOther=&fl=product:(if(show_product:true, product, )
 thanks

Functions normally derive their values from the fieldCache... there
isn't currently a function to load stored fields (e.g. your product
field), but it's not a bad idea (given this use case).

Here's an example with the exampledocs that shows IN_STOCK_PRICE only
if the item is in stock, and otherwise shows 0.
This works because price is a single-valued indexed field that the
fieldCache works on.

http://localhost:8983/solr/query?
  q=*:*
  &fl=id, inStock, IN_STOCK_PRICE:if(inStock,price,0)

-Yonik
http://lucidimagination.com


Re: DIH include Fieldset in query

2012-07-16 Thread stockii
 So you want to re-use the same SQL statement in many entities?
Yes.

Is it necessary to deploy the complete Solr and Lucene for this?
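
One way to share the SQL without custom code (a sketch with made-up names:
DIH resolves ${dataimporter.request.*} variables, so a shared column list
can be declared once as a default parameter in solrconfig.xml and
referenced from several entities):

in solrconfig.xml:
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="fieldSet">id, title, price</str>
  </lst>
</requestHandler>

in data-config.xml:
<entity name="a" query="select ${dataimporter.request.fieldSet} from table_a"/>
<entity name="b" query="select ${dataimporter.request.fieldSet} from table_b"/>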

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-include-Fieldset-in-query-tp3994798p3995228.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: are stopwords indexed?

2012-07-16 Thread Michael Belenki
Hi Giovanni,

you entered the stopwords into the stopwords.txt file, right? But in the
definition of the field type you are referencing stopwords_FR.txt...

best regards,

Michael
On Mon, 16 Jul 2012 05:38:04 +0200, Giovanni Gherdovich
g.gherdov...@gmail.com wrote:
 Hi all,
 
 are stopwords from the stopwords.txt config file
 supposed to be indexed?
 
 I would say no, but this is the situation I am
 observing on my Solr instance:
 
 * I have a bunch of stopwords in stopwords.txt
 * my fields are of fieldType text from the example schema.xml,
   i.e. I have
 
 -- -- 8< -- -- 8< -- -- 8< -- -- 8<
 <fieldType name="text" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     [...]
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords_FR.txt"
             enablePositionIncrements="true"
             />
     [...]
   </analyzer>
   <analyzer type="query">
     [...]
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords_FR.txt"
             enablePositionIncrements="true"
             />
   </analyzer>
 </fieldType>
 -- -- 8< -- -- 8< -- -- 8< -- -- 8<
 
 * searching for a stopword through Solr always gives zero results
 * inspecting the index with LuCLI
 http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html
   shows that all stopwords are in my index. Note that I query
   LuCLI specifying the field, i.e. with myFieldName:and,
   and not just with the stopword and.
 
 Is this normal?
 
 Are stopwords indexed?
 
 Cheers,
 Giovanni


Re: are stopwords indexed?

2012-07-16 Thread Giovanni Gherdovich
Hi all, thank you for your replies.

Lance:
 Look at the index with the Schema Browser in the Solr UI. This pulls
 the terms for each field.

I did it, and it was the first alarm I got.
After indexing, I went to the schema browser hoping
not to see any stopwords among the top terms, but...
they were all there.

Michael:
 Hi Giovanni,

 you have entered the stopwords into stopword.txt file, right? But in the
 definition of the field type you are referencing stopwords_FR.txt..

Good catch Michael, but that's not the problem.

In my message I referred to stopwords.txt, but my
stopwords file is actually named stopwords_FR.txt, consistent with
what I put in my schema.xml.

By the way, your answers make me think that yes,
I have a problem: stopwords should not appear in the index.

What a weird situation:

* querying Solr for a stopword (say, and) gives me zero results
  (so, somewhere in the indexing / searching pipeline my stopwords
file *is* taken into account)
* checking the index files with LuCLI for the same stopword gives me
tons of hits.

cheers,
GGhh


Grouping performance problem

2012-07-16 Thread Agnieszka Kukałowicz
Hi,

Is there any way to make grouping searches more efficient?

My queries look like:
/select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

For an index with 3 million documents, a query for all docs with group=true
takes almost 4000 ms. Because the queryResultCache is not used, subsequent
queries also take a long time.

When I remove group=true and leave only faceting, the query for all docs
takes much less time: ~700 ms the first time and only 200 ms on subsequent
runs, because the queryResultCache is used.

So with group=true the query is about 20 times slower than without it.
Is there any way to improve performance with grouping?

My application needs the grouping feature and all of its queries use it, but
their performance is too low for production use.

I use Solr 4.x from trunk

Agnieszka Kukalowicz


Re: DIH - incorrect datasource being picked up by XPathEntityProcessor

2012-07-16 Thread girishyes

Okay... found the problem after some more debugging. I was using a wrong
datasource tag in the data-config.xml; maybe Solr should validate the XML
against a schema so this kind of issue is caught upfront.

wrong:   <datasource name="fieldSource" type="FieldReaderDataSource" />
correct: <dataSource name="fieldSource" type="FieldReaderDataSource" />

This resolved the issue.
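
For reference, a minimal data-config sketch of the working setup (the
driver, table, and column names here are made up):

<dataConfig>
  <dataSource name="db" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/test" user="u" password="p"/>
  <!-- note the capital S: dataSource, not datasource -->
  <dataSource name="fieldSource" type="FieldReaderDataSource"/>
  <document>
    <entity name="row" dataSource="db" query="select id, xml_col from docs">
      <!-- XPathEntityProcessor reads the XML stored in row.xml_col -->
      <entity name="xml" dataSource="fieldSource"
              processor="XPathEntityProcessor"
              dataField="row.xml_col" forEach="/doc">
        <field column="title" xpath="/doc/title"/>
      </entity>
    </entity>
  </document>
</dataConfig>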

Thanks.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-incorrect-datasource-being-picked-up-by-XPathEntityProcessor-tp3994802p3995246.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet on all the dynamic fields with *_s feature

2012-07-16 Thread Rajani Maski
Yes, this feature would solve the problem below very neatly.

All,

 Is there any approach to achieve this for now?


--Rajani

On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky j...@basetechnology.com wrote:

 The answer appears to be No, but it's good to hear people express an
 interest in proposed features.

 -- Jack Krupansky

 -Original Message- From: Rajani Maski
 Sent: Sunday, July 15, 2012 12:02 AM
 To: solr-user@lucene.apache.org
 Subject: Facet on all the dynamic fields with *_s feature


 Hi All,

   Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic field
 with facet.field=*_s

   Link: https://issues.apache.org/jira/browse/SOLR-247



  If it is not fixed, any suggestion on how do I achieve this?


 My requirement is just same as this one :
  http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none


 Regards
 Rajani



Re: Facet on all the dynamic fields with *_s feature

2012-07-16 Thread Darren Govoni
You'll have to query the index for the fields and sift out the _s ones
and cache them or something.
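
For example (a sketch assuming the Luke request handler from the example
solrconfig.xml, with made-up *_s field names): list the fields with

http://localhost:8983/solr/admin/luke?numTerms=0&wt=json

then take every field name ending in _s and pass each one as a facet.field
parameter:

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=color_s&facet.field=brand_s&facet.mincount=1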

On Mon, 2012-07-16 at 16:52 +0530, Rajani Maski wrote:

 Yes, This feature will solve the below problem very neatly.
 
 All,
 
  Is there any approach to achieve this for now?
 
 
 --Rajani
 
 On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky 
  j...@basetechnology.com wrote:
 
  The answer appears to be No, but it's good to hear people express an
  interest in proposed features.
 
  -- Jack Krupansky
 
  -Original Message- From: Rajani Maski
  Sent: Sunday, July 15, 2012 12:02 AM
  To: solr-user@lucene.apache.org
  Subject: Facet on all the dynamic fields with *_s feature
 
 
  Hi All,
 
Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic field
  with facet.field=*_s
 
    Link: https://issues.apache.org/jira/browse/SOLR-247
 
 
 
   If it is not fixed, any suggestion on how do I achieve this?
 
 
  My requirement is just same as this one :
   http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none
 
 
  Regards
  Rajani
 




Re: Index version on slave incrementing to higher than master

2012-07-16 Thread Erick Erickson
Andrew:

I'm not entirely sure that's your problem, but it's the first thing I'd try.

As for your config files, see the section Replicating solrconfig.xml
here: http://wiki.apache.org/solr/SolrReplication. That at least
allows you to centralize separate solrconfigs for master and
slave, making promoting a slave to a master a bit easier
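
Roughly, the master-side config from that wiki section looks like this (a
sketch: you keep a solrconfig_slave.xml next to the master's solrconfig.xml,
and replication renames it to solrconfig.xml on the slave):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml</str>
  </lst>
</requestHandler>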

Best
Erick

On Sun, Jul 15, 2012 at 2:00 PM, Andrew Davidoff david...@qedmf.net wrote:
 Erick,

 Thank you. I think originally my thought was that if I had my slave
 configuration really close to my master config, it would be very easy to
 promote a slave to a master (and vice versa) if necessary. But I think you
 are correct that ripping out from the slave config anything that would
 modify an index in any way makes sense. I will give this a try very soon.

 Thanks again.
 Andy


 On Sat, Jul 14, 2012 at 5:22 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 Gotta admit it's a bit puzzling, and surely you want to move to the 3x
 versions G..

 But at a guess, things might be getting confused on the slaves given
 you have a merge policy on them. There's no reason to have any
 policies on the slaves; slaves should just be about copying the files
 from the master, all the policies,commits,optimizes should be done on
 the master. About all the slave does is copy the current state of the index
 from the master.

 So I'd try removing everything but the replication from the slaves,
 including
 any autocommit stuff and just let replication do its thing.

 And I'd replicate after the optimize if you keep the optimize going. You
 should
 end up with one segment in the index after that, on both the master and
 slave.
 You can't get any more merged than that.

 Of course you'll also copy the _entire_ index every time after you've
 optimized...

 Best
 Erick

 On Fri, Jul 13, 2012 at 12:31 AM, Andrew Davidoff david...@qedmf.net
 wrote:
  Hi,
 
  I am running solr 1.4.0+ds1-1ubuntu1. I have a master server that has a
  number of solr instances running on it (150 or so), and nightly most of
  them have documents written to them. The script that does these writes
  (adds) does a commit and an optimize on the indexes when it's entirely
  finished updating them, then initiates replication on the slave per
  instance. In this configuration, the index versions between master and
  slave remain in synch.
 
  The optimize portion, which, again, happens nightly, is taking a lot of
  time and I think it's unnecessary. I was hoping to stop doing this
 explicit
  optimize, and to let my merge policy handle that. However, if I don't do
 an
  optimize, and only do a commit before initiating slave replication, some
  hours later the slave is, for reasons that are unclear to me,
 incrementing
  its index version to 1 higher than the master.
 
  I am not really sure I understand the logs, but it looks like the
  incremented index version is the result of an optimize on the slave, but
 I
  am never issuing any commands against the slave aside from initiating
  replication, and I don't think there's anything in my solr configuration
  that would be initiating this. I do have autoCommit on with maxDocs of
  1000, but since I am initiating slave replication after doing a commit on
  the master, I don't think there would ever be any uncommitted documents
 on
  the slave. I do have a merge policy configured, but it's not clear to me
  that it has anything to do with this. And if it did, I'd expect to see
  similar behavior on the master (right?).
 
   I have included a snippet from my slave logs that shows this issue. In
 this
   snippet, index version 1286065171264 is what the master has,
  and 1286065171265 is what the slave increments itself to, which is then
 out
  of synch with the master in terms of version numbers. Nothing that I know
  of is issuing any commands to the slave at this time. If I understand
 these
  logs (I might not), it looks like something issued an optimize that took
  1023720ms? Any ideas?
 
  Thanks in advance.
 
  Andy
 
 
 
  Jul 12, 2012 12:21:14 PM org.apache.solr.update.SolrIndexWriter close
  FINE: Closing Writer DirectUpdateHandler2
  Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy onCommit
  INFO: SolrDeletionPolicy.onCommit: commits:num=2
 
 
 commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h8,version=1286065171264,generation=620,filenames=[_h6.fnm,
  _h5.nrm, segments_h8, _h4.nrm, _h5.tii, _h4
  .tii, _h5.tis, _h4.tis, _h4.fdx, _h5.fnm, _h6.tii, _h4.fdt, _h5.fdt,
  _h5.fdx, _h5.frq, _h4.fnm, _h6.frq, _h6.tis, _h4.prx, _h4.frq, _h6.nrm,
  _h5.prx, _h6.prx, _h6.fdt, _h6
  .fdx]
 
 
 commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h9,version=1286065171265,generation=621,filenames=[_h7.tis,
  _h7.fdx, _h7.fnm, _h7.fdt, _h7.prx, segment
  s_h9, _h7.nrm, _h7.tii, _h7.frq]
  Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy
  updateCommits
  INFO: newest commit = 1286065171265
  Jul 12, 2012 12:21:14 PM 

Re: Query results vs. facets results

2012-07-16 Thread Erick Erickson
Ahhh, you need to look down another few lines. When you specify fq, there
should be a section of the debug output like
<arr name="filter_queries">
  ...
</arr>

where the array is the parsed form of the filter queries. I was thinking about
comparing that with the parsed form of the q parameter in the non-filter
case to see what insight one could gain from that.

But there's already one difference, when you use *, you get
 <str name="parsedquery">ID:*</str>

Is it possible that you have some documents that do NOT have an ID field?
try *:* rather than just *. I'm guessing that your default search field is ID
and you have some documents without an ID field. Not a good guess if ID
is your uniqueKey though..

Try q=*:* -ID:* and see if you get 31 docs.

Also note that if you _have_ specified ID as your uniqueKey _but_ you didn't
re-index afterwards (actually, I'd blow away the entire
solrhome/data directory
and restart) you may have stale data in there that allowed documents to exist
that do not have uniqueKey fields.

Best
Erick

On Sun, Jul 15, 2012 at 4:49 PM, tudor tudor.zaha...@gmail.com wrote:
 Hi Erick,

 Thanks for the reply.

 The query:

 http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true&debugQuery=on

 yields this in the debug section:

 <lst name="debug">
   <str name="rawquerystring">CITY:MILTON</str>
   <str name="querystring">CITY:MILTON</str>
   <str name="parsedquery">CITY:MILTON</str>
   <str name="parsedquery_toString">CITY:MILTON</str>
   <str name="QParser">LuceneQParser</str>

 There is no information about grouping.

 Second query:

 http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&facet.missing=true&group.ngroups=true&debugQuery=on

 yields this in the debug section:

 <lst name="debug">
   <str name="rawquerystring">*</str>
   <str name="querystring">*</str>
   <str name="parsedquery">ID:*</str>
   <str name="parsedquery_toString">ID:*</str>
   <str name="QParser">LuceneQParser</str>

 To be honest, these do not tell me too much. I would like to see some
 information about the grouping, since I believe this is where I am missing
 something.

 In the meantime, I have combined the two queries above, hoping to make some
 sense of the results. The following query filters all the entries with
 the city name MILTON and groups together the ones with the same ID. Also,
 the query facets the entries on city, grouping the ones with the same ID, so
 the result numbers refer to the number of groups.

 http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

 yields the same (for me perplexing) results:

 <lst name="grouped">
   <lst name="ID">
     <int name="matches">284</int>
     <int name="ngroups">134</int>

 (i.e.: fq says: 134 groups with CITY:MILTON)
 ...

 <lst name="facet_counts">
   <lst name="facet_queries"/>
   <lst name="facet_fields">
 ...
     <int name="MILTON">103</int>

 (i.e.: faceted search says: 103 groups with CITY:MILTON)

 I really believe that these different results have something to do with the
 grouping that Solr makes, but I do not know how to dig into this.

 Thank you and best regards,
 Tudor

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079p3995156.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Grouping performance problem

2012-07-16 Thread Pavel Goncharik
Hi Agnieszka ,

if you don't need the number of groups, you can try leaving out the
group.ngroups=true param.
In this case Solr apparently skips calculating all groups and delivers
results much faster.
At least for our application the difference in performance
with/without group.ngroups=true is significant (have to say, we use
Solr 3.6).
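
I.e., starting from your query above, something like this (the same query
with group.ngroups dropped, and group.facet dropped too if you can live
without per-group facet counts):

/select?q=query&group=true&group.field=id&facet.field=category1&facet.missing=false&facet.mincount=1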

WBR,
Pavel

On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
agnieszka.kukalow...@usable.pl wrote:
 Hi,

 Is there any way to make grouping searches more efficient?

 My queries look like:
 /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

 For an index with 3 million documents, a query for all docs with group=true
 takes almost 4000 ms. Because the queryResultCache is not used, subsequent
 queries also take a long time.

 When I remove group=true and leave only faceting, the query for all docs
 takes much less time: ~700 ms the first time and only 200 ms on subsequent
 runs, because the queryResultCache is used.

 So with group=true the query is about 20 times slower than without it.
 Is there any way to improve performance with grouping?

 My application needs the grouping feature and all of its queries use it, but
 their performance is too low for production use.

 I use Solr 4.x from trunk

 Agnieszka Kukalowicz


Re: Facet on all the dynamic fields with *_s feature

2012-07-16 Thread Rajani Maski
In this URL - https://issues.apache.org/jira/browse/SOLR-247 -

there are patches, and one patch named SOLR-247-FacetAllFields.

Will that help me fix this problem?

If yes, how do I add this to Solr as a plugin?
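
For reference, applying a JIRA patch usually means building Solr from
source; something like the following (a sketch only, and given its age the
patch may not apply cleanly to the current trunk):

svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk lucene-trunk
cd lucene-trunk
patch -p0 < SOLR-247-FacetAllFields.patch
cd solr && ant dist    # builds the Solr war with the patch applied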


Thanks & Regards
Rajani




On Mon, Jul 16, 2012 at 5:04 PM, Darren Govoni dar...@ontrenet.com wrote:

 You'll have to query the index for the fields and sift out the _s ones
 and cache them or something.

 On Mon, 2012-07-16 at 16:52 +0530, Rajani Maski wrote:

  Yes, This feature will solve the below problem very neatly.
 
  All,
 
   Is there any approach to achieve this for now?
 
 
  --Rajani
 
  On Sun, Jul 15, 2012 at 6:02 PM, Jack Krupansky j...@basetechnology.com
 wrote:
 
   The answer appears to be No, but it's good to hear people express an
   interest in proposed features.
  
   -- Jack Krupansky
  
   -Original Message- From: Rajani Maski
   Sent: Sunday, July 15, 2012 12:02 AM
   To: solr-user@lucene.apache.org
   Subject: Facet on all the dynamic fields with *_s feature
  
  
   Hi All,
  
 Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic
 field
   with facet.field=*_s
  
  Link: https://issues.apache.org/jira/browse/SOLR-247
  
  
  
If it is not fixed, any suggestion on how do I achieve this?
  
  
   My requirement is just same as this one :
    http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none
 
  
  
   Regards
   Rajani
  





Re: Grouping performance problem

2012-07-16 Thread Agnieszka Kukałowicz
Hi Pavel,

I tried with group.ngroups=false but didn't notice a big improvement.
The times were still about 4000 ms. It doesn't solve my problem.
Maybe this is because of my index type. I have millions of documents but
only about 20 000 groups.

 Cheers
 Agnieszka

2012/7/16 Pavel Goncharik pavel.goncha...@gmail.com

 Hi Agnieszka ,

 if you don't need number of groups, you can try leaving out
 group.ngroups=true param.
 In this case Solr apparently skips calculating all groups and delivers
 results much faster.
 At least for our application the difference in performance
 with/without group.ngroups=true is significant (have to say, we use
 Solr 3.6).

 WBR,
 Pavel

 On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
 agnieszka.kukalow...@usable.pl wrote:
  Hi,
 
  Is there any way to make grouping searches more efficient?
 
  My queries look like:
 
 /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
 
  For an index with 3 million documents, a query for all docs with group=true
  takes almost 4000 ms. Because the queryResultCache is not used, subsequent
  queries also take a long time.
 
  When I remove group=true and leave only faceting, the query for all docs
  takes much less time: ~700 ms the first time and only 200 ms on subsequent
  runs, because the queryResultCache is used.
 
  So with group=true the query is about 20 times slower than without it.
  Is there any way to improve performance with grouping?
 
  My application needs the grouping feature and all of its queries use it,
  but their performance is too low for production use.
 
  I use Solr 4.x from trunk
 
  Agnieszka Kukalowicz



Re: JRockit with SOLR3.4/3.5

2012-07-16 Thread Salman Akram
Michael,

Thanks for the response. Below is the stack trace.

Note: Our environment is 64-bit, the initial pool size is set to 4GB, and the
max pool size is 12GB, so it doesn't make sense that it tries to allocate
24GB (even though that much is available, as the total RAM is 64GB).

This issue doesn't occur with Solr 1.4.

-

SEVERE: Error waiting for multi-thread deployment of directories to
completehostConfig.deployWar=Deploying web application archive {0}
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError:
classblock allocation, 1535880 loaded, 1536K footprint, in check_alloc
(src/jvm/model/classload/classalloc.c:215).

Attempting to allocate 24000M bytes

There is insufficient native memory for the Java
Runtime Environment to continue.

Possible reasons:
  The system is out of physical RAM or swap space
  In 32 bit mode, the process size limit was hit

Possible solutions:
  Reduce memory load on the system
  Increase physical memory or swap space
  Check if swap backing store is full
  Use 64 bit Java on a 64 bit OS
  Decrease Java heap size (-Xmx/-Xms)
  Decrease number of Java threads
  Decrease Java thread stack sizes (-Xss)
  Disable compressed references (-XXcompressedRefs=false)

at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.catalina.startup.HostConfig.deployDirectories(
HostConfig.java:1018)
at org.apache.catalina.startup.HostConfig.deployApps(
HostConfig.java:475)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1412)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(
HostConfig.java:312)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(
LifecycleSupport.java:119)
at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(
LifecycleBase.java:91)
at org.apache.catalina.util.LifecycleBase.setStateInternal(
LifecycleBase.java:401)
at org.apache.catalina.util.LifecycleBase.setState(
LifecycleBase.java:346)
at org.apache.catalina.core.ContainerBase.startInternal(
ContainerBase.java:1117)
at org.apache.catalina.core.StandardHost.startInternal(
StandardHost.java:782)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(
ContainerBase.java:1526)
at org.apache.catalina.core.ContainerBase$StartChild.call(
ContainerBase.java:1515)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:139)
at java.util.concurrent.ThreadPoolExecutor$Worker.
runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:909)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.OutOfMemoryError: classblock allocation, 1535880
loaded, 1536K footprint, in check_alloc (src/jvm/model/classload/
classalloc.c:215).

Attempting to allocate 24000M bytes

There is insufficient native memory for the Java
Runtime Environment to continue.

Possible reasons:
  The system is out of physical RAM or swap space
  In 32 bit mode, the process size limit was hit

Possible solutions:
  Reduce memory load on the system
  Increase physical memory or swap space
  Check if swap backing store is full
  Use 64 bit Java on a 64 bit OS
  Decrease Java heap size (-Xmx/-Xms)
  Decrease number of Java threads
  Decrease Java thread stack sizes (-Xss)
  Disable compressed references (-XXcompressedRefs=false)

at sun.misc.Unsafe.defineClass(Native Method)
at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:45)
at sun.reflect.MethodAccessorGenerator$1.run(
MethodAccessorGenerator.java:381)
at sun.reflect.MethodAccessorGenerator.generate(
MethodAccessorGenerator.java:377)
at sun.reflect.MethodAccessorGenerator.generateConstructor(
MethodAccessorGenerator.java:76)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(
NativeConstructorAccessorImpl.java:30)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at java.lang.Class.newInstance0(Class.java:355)
at java.lang.Class.newInstance(Class.java:308)
at javax.xml.parsers.FactoryFinder.newInstance(FactoryFinder.java:147)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:233)
at javax.xml.parsers.SAXParserFactory.newInstance(
SAXParserFactory.java:128)
at org.apache.tomcat.util.digester.Digester.getFactory(
Digester.java:470)
at org.apache.tomcat.util.digester.Digester.getParser(Digester.java:677)
at org.apache.catalina.startup.ContextConfig.init(
ContextConfig.java:780)
at org.apache.catalina.startup.ContextConfig.lifecycleEvent(
ContextConfig.java:320)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(
LifecycleSupport.java:119)
at 

Re: Lost answers?

2012-07-16 Thread Michael Della Bitta
Hello, Bruno,

No, 4 simultaneous requests should not be a problem.

Have you checked the Tomcat logs or logged the data in the query
response object to see if there are any clues to what the problem
might be?

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Sun, Jul 15, 2012 at 2:10 PM, Bruno Mannina bmann...@free.fr wrote:
 I forgot:

 I do the request on the uniqueKey field, so each request gets one document

 Le 15/07/2012 14:11, Bruno Mannina a écrit :

 Dear Solr Users,

 I have a Solr 3.6 + Tomcat setup and a program that sends 4 HTTP
 requests at the same time.
 I must do 1902 requests.

 I ran several tests, but each time it loses some requests:
 - sometimes I get 1856 docs, 1895 docs, 1900 docs, but never 1902 docs.

 With Jetty, I get always 1902 docs.

 As it's a dev' environment, I'm alone to test it.

 Is it a problem for Tomcat 6 to handle 4 requests at the same time?

 thanks for your info,

 Bruno






Re: Solr - Spatial Search for Specif Areas on Map

2012-07-16 Thread David Smiley (@MITRE.org)

samabhiK wrote
 
 David,
 
 Thanks for such a detailed response. The data volume I mentioned is the
 total set of records we have - but we would never ever need to search the
 entire base in one query; we would divide the data by region or zip code.
 So, in that case I assume that for a single region, we would not have more
 than 200M records (this is real , we have a region with that many
 records).
 
 So, I can assume that I can create shards based on regions and the
 requests would get distributed among these region servers, right?
 

The fact that your searches are always per region (or almost always) helps
things a lot.  Instead of doing a distributed search to all shards, you
would search the specific shard, or worst case 2 shards, and not burden the
other shards with queries you know won't be satisfied.  This new information
suggests that the total 10k queries per second volume would be divided
amongst your shards, so 10k / 40 shards = 250 queries per second.  Now we
are approaching something reasonable.  If any of your regions need to scale
up (more query volume) or out (big region) then you can do that on a case by
case basis.  I can think of ways to optimize that for spatial.

Thinking in terms of pure queries per second on a machine, say one with 16
CPU cores, then 250/16 = ~16 queries per second per CPU core of a
shard.  I think that's plausible but you would really need to determine how
many exactly you could do.  I assume the spatial index is going to fit in
RAM.  If successful, this means ~40 machines (one per region). 



  You also mentioned about ~20 concurrent queries per shard - do you have
 links to some benchmarks? I am very interested to know about the hardware
 sizing details for such a setup.
 

The best I can offer is on the geospatial side: 
https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=12988316page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12988316

But this was an index of only 2M distinct points.  It may be that these
figures still hold if the overhead of the spatial query with data is so low
that other constant elements comprise the times, but I really don't know. 
To be clear, this is older code that is not the same as the latest, but they
are algorithmically the same.  The current code has an error epsilon to the
query shape which helps scale further.  There is plenty more optimization
that could be done, like a more efficient binary grid scheme, using Hilbert
Curves, and using an optimizer to find the hotspots to try and optimize
them.



 About setting up Solr for a single shard, I think I will go by your
 advice.  Will see how much a single shard can handle in a decent machine
 :)
 
 The reason why I came up with that figure was, I have a user base of 500k
 and theres a lot of activity which would happen on the map - every time
 someone moves the tiles, zooms in/out, scrolls, we are going to send a
 server side request to fetch some data ( I agree we can benefit much using
 caching but I believe Solr itself has its own local cache). I might be a
 bit unrealistic with my 10K rps projections but I have read about 9K rps
 to map servers from some sources on the internet. 
 
 And, NO, I don't work for Google :) But who knows we might be building
 something that can get so much traffic to us in a while. :D
 
 BTW, my question still remains - can we do search on polygonal areas on
 the map? If so, do you have any link where i can get more details?
 Bounding Box thing wont work for me I guess :(
 
 Sam
 

Polygons are supported; I've been doing them for years now.  But it requires
some extensions.  Today, you need the latest Solr trunk, you need to
apply the Solr adapters to Lucene 4 spatial (SOLR-3304), and you need to have
the JTS jar on your classpath, something you download separately.  BTW, here
are some basic docs:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
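
Once that is set up, a polygon query looks roughly like this (illustrative,
assuming a spatial field named geo; the shape is WKT, parsed by JTS):

fq=geo:"Intersects(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30)))"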



-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995333.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-16 Thread Mark Miller

On Jul 15, 2012, at 2:45 PM, Nick Koton wrote:

 I converted my program to use
 the SolrServer::add(Collection<SolrInputDocument> docs) method with 100
 documents in each add batch.  Unfortunately, the out of memory errors still
 occur without client side commits.

This won't change much unfortunately - currently, each host has 10 adds and 10
deletes buffered for it before it will flush. There are some recovery
implications that have kept that buffer size low so far - but what it ends up 
meaning is that when you stream docs, every 10 docs is sent off on a thread. 
Generally, you might be able to keep up with this - but the commit cost appears 
to perhaps cause a small resource drop that backs things up a bit - and some of 
those threads take a little longer to finish while new threads fire off to keep 
servicing the constantly arriving new documents. What appears to happen is 
large momentary spikes in the number of threads. Each thread needs a bit of 
space on the heap, and it would seem with a high enough spike you could get an 
OOM. In my testing, I have not triggered that yet, but I have seen large thread 
count spikes.

Raising the add doc buffer to 100 docs makes those thread bursts much, much 
less severe. I can't remember all of the implications of that buffer size 
though - need to talk to Yonik about it.

We could limit the number of threads for that executor, but I think that comes 
with some negatives as well.

You could try lowering -Xss so that each thread uses less RAM (if possible) as 
a shorter term (possible) workaround.
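
For example, with the example Jetty setup, something like this (illustrative;
the workable minimum stack size depends on your JVM and platform):

java -Xss256k -jar start.jar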

You could also use multiple threads with the std HttpSolrServer - it won't be 
quite as fast probably, but it can get close(ish).

My guess is that your client commits help because a commit will cause a wait on 
all outstanding requests - so that the commit is in logical order - this 
probably is like releasing a pressure valve - the system has a chance to catch 
up and reclaim lots of threads.

We will keep looking into what the best improvement is.

- Mark Miller
lucidimagination.com



Re: Grouping performance problem

2012-07-16 Thread alxsss

What are the RAM of your server and size of the data folder?


Re: Solr - Spatial Search for Specif Areas on Map

2012-07-16 Thread David Smiley (@MITRE.org)
Thinking more about this, the way to get a Lucene based system to scale to
the maximum extent possible for geospatial queries would be to get a
geospatial query to be satisfied by just one (usually) Lucene index segment. 
It would take quite a bit of customization and work to make this happen.  I
suppose you could always optimize a Solr index and thus get one Lucene
segment, but deploy 10-20x the number of Solr shards (aka Solr cores) that
one would normally do, and that wouldn't be that hard.  There would be some
work in determining which Solr core (== Lucene segment) a given document
should belong to and which ones to query.

~ David

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995357.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Grouping performance problem

2012-07-16 Thread Agnieszka Kukałowicz
I have a server with 24GB RAM. I have 4 shards on it, each with 4GB
RAM for Java:
JAVA_OPTIONS="-server -Xms4096M -Xmx4096M"
The index size is about 15GB for one shard (I use an SSD disk for the index data).

Agnieszka


2012/7/16 alx...@aim.com

 What are the RAM of your server and size of the data folder?



 -Original Message-
 From: Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl
 To: solr-user solr-user@lucene.apache.org
 Sent: Mon, Jul 16, 2012 6:16 am
 Subject: Re: Grouping performance problem


 Hi Pavel,

 I tried with group.ngroups=false but didn't notice a big improvement.
 The times were still about 4000 ms. It doesn't solve my problem.
 Maybe this is because of my index type. I have millions of documents but
 only about 20 000 groups.

  Cheers
  Agnieszka

 2012/7/16 Pavel Goncharik pavel.goncha...@gmail.com

  Hi Agnieszka ,
 
  if you don't need number of groups, you can try leaving out
  group.ngroups=true param.
  In this case Solr apparently skips calculating all groups and delivers
  results much faster.
  At least for our application the difference in performance
  with/without group.ngroups=true is significant (have to say, we use
  Solr 3.6).
 
  WBR,
  Pavel
 
  On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
  agnieszka.kukalow...@usable.pl wrote:
   Hi,
  
   Is there any way to make grouping searches more efficient?
  
   My queries look like:
  
 /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
  
   For an index with 3 million documents, a query for all docs with group=true
   takes almost 4000 ms. Because the queryResultCache is not used, subsequent
   queries also take a long time.
  
   When I remove group=true and leave only faceting, the query for all docs
   takes much less time: ~700 ms the first time and only 200 ms on subsequent
   runs, because the queryResultCache is used.
  
   So with group=true the query is about 20 times slower than without it.
   Is there any way to improve performance with grouping?
  
   My application needs the grouping feature and all of its queries use it,
   but their performance is too low for production use.
  
   I use Solr 4.x from trunk
  
   Agnieszka Kukalowicz
 





Re: Index version on slave incrementing to higher than master

2012-07-16 Thread Andrew Davidoff
Thanks Erick,

I will look harder at our current configuration and how we're handling
config replication, but I just realized that a backup script was doing a
commit and an optimize on the slave prior to taking the backup. This
happens daily, after updates and replication from the master. This is
something I put in place many ages ago and didn't think to look at until
now :/

Based on the times in the logs and the conditions under which my problem
was occurring (when I wasn't optimizing on the master before initiating
replication) it seems clear that this backup script is my problem. Sorry
for taking your time with something that was clearly my own dang fault. I
appreciate your suggestions and responses regardless!

Andy

On Mon, Jul 16, 2012 at 7:35 AM, Erick Erickson erickerick...@gmail.com wrote:

 Andrew:

 I'm not entirely sure that's your problem, but it's the first thing I'd
 try.

 As for your config files, see the section Replicating solrconfig.xml
 here: http://wiki.apache.org/solr/SolrReplication. That at least
 allows you to centralize separate solrconfigs for master and
 slave, making promoting a slave to a master a bit easier

 Best
 Erick

 On Sun, Jul 15, 2012 at 2:00 PM, Andrew Davidoff david...@qedmf.net
 wrote:
  Erick,
 
  Thank you. I think originally my thought was that if I had my slave
  configuration really close to my master config, it would be very easy to
  promote a slave to a master (and vice versa) if necessary. But I think
 you
  are correct that ripping out from the slave config anything that would
  modify an index in any way makes sense. I will give this a try very soon.
 
  Thanks again.
  Andy
 
 
  On Sat, Jul 14, 2012 at 5:22 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Gotta admit it's a bit puzzling, and surely you want to move to the 3x
  versions G..
 
  But at a guess, things might be getting confused on the slaves given
  you have a merge policy on them. There's no reason to have any
  policies on the slaves; slaves should just be about copying the files
  from the master, all the policies,commits,optimizes should be done on
  the master. About all the slave does is copy the current state of the
 index
  from the master.
 
  So I'd try removing everything but the replication from the slaves,
  including
   any autocommit stuff and just let replication do its thing.
 
  And I'd replicate after the optimize if you keep the optimize going. You
  should
  end up with one segment in the index after that, on both the master and
  slave.
  You can't get any more merged than that.
 
  Of course you'll also copy the _entire_ index every time after you've
  optimized...
 
  Best
  Erick
 
  On Fri, Jul 13, 2012 at 12:31 AM, Andrew Davidoff david...@qedmf.net
  wrote:
   Hi,
  
   I am running solr 1.4.0+ds1-1ubuntu1. I have a master server that has
 a
   number of solr instances running on it (150 or so), and nightly most
 of
   them have documents written to them. The script that does these writes
   (adds) does a commit and an optimize on the indexes when it's entirely
   finished updating them, then initiates replication on the slave per
   instance. In this configuration, the index versions between master and
   slave remain in synch.
  
   The optimize portion, which, again, happens nightly, is taking a lot
 of
   time and I think it's unnecessary. I was hoping to stop doing this
  explicit
   optimize, and to let my merge policy handle that. However, if I don't
 do
  an
   optimize, and only do a commit before initiating slave replication,
 some
   hours later the slave is, for reasons that are unclear to me,
  incrementing
   its index version to 1 higher than the master.
  
   I am not really sure I understand the logs, but it looks like the
   incremented index version is the result of an optimize on the slave,
 but
  I
   am never issuing any commands against the slave aside from initiating
   replication, and I don't think there's anything in my solr
 configuration
   that would be initiating this. I do have autoCommit on with maxDocs of
   1000, but since I am initiating slave replication after doing a
 commit on
   the master, I don't think there would ever be any uncommitted
 documents
  on
   the slave. I do have a merge policy configured, but it's not clear to
 me
   that it has anything to do with this. And if it did, I'd expect to see
   similar behavior on the master (right?).
  
   I have included a snippet from my slave logs that shows this issue. In
  this
   snippet, index version 1286065171264 is what the master has,
   and 1286065171265 is what the slave increments itself to, which is
 then
  out
   of synch with the master in terms of version numbers. Nothing that I
 know
   of is issuing any commands to the slave at this time. If I understand
  these
   logs (I might not), it looks like something issued an optimize that
 took
   1023720ms? Any ideas?
  
   Thanks in advance.
  
   Andy
  
  
  
   Jul 12, 2012 12:21:14 PM 

Re: Grouping performance problem

2012-07-16 Thread alxsss
This is strange. We have a data folder size of 24GB and 2GB RAM for Java. We
query with grouping, ngroups, and highlighting, do not request all fields, and
query time is mostly less than 1 sec; it rarely goes up to 2 sec. We use Solr
3.6 and turned off all kinds of caching.
Maybe your problem is with caching and returning all fields?

Hope this may help.

Alex.



-Original Message-
From: Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Jul 16, 2012 10:04 am
Subject: Re: Grouping performance problem


I have a server with 24GB RAM. I have 4 shards on it, each with 4GB
RAM for Java:
JAVA_OPTIONS="-server -Xms4096M -Xmx4096M"
The index size is about 15GB for one shard (I use an SSD disk for the index data).

Agnieszka


2012/7/16 alx...@aim.com

 What are the RAM of your server and size of the data folder?



 -Original Message-
 From: Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl
 To: solr-user solr-user@lucene.apache.org
 Sent: Mon, Jul 16, 2012 6:16 am
 Subject: Re: Grouping performance problem


 Hi Pavel,

 I tried with group.ngroups=false but didn't notice a big improvement.
 The times were still about 4000 ms. It doesn't solve my problem.
 Maybe this is because of my index type. I have millions of documents but
 only about 20 000 groups.

  Cheers
  Agnieszka

 2012/7/16 Pavel Goncharik pavel.goncha...@gmail.com

  Hi Agnieszka ,
 
  if you don't need number of groups, you can try leaving out
  group.ngroups=true param.
  In this case Solr apparently skips calculating all groups and delivers
  results much faster.
  At least for our application the difference in performance
  with/without group.ngroups=true is significant (have to say, we use
  Solr 3.6).
 
  WBR,
  Pavel
 
  On Mon, Jul 16, 2012 at 1:00 PM, Agnieszka Kukałowicz
  agnieszka.kukalow...@usable.pl wrote:
   Hi,
  
   Is there any way to make grouping searches more efficient?
  
   My queries look like:
  
 /select?q=query&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
  
   For an index with 3 million documents, a query for all docs with group=true
   takes almost 4000 ms. Because the queryResultCache is not used, subsequent
   queries also take a long time.
  
   When I remove group=true and leave only faceting, the query for all docs
   takes much less time: ~700 ms the first time and only 200 ms on subsequent
   runs, because the queryResultCache is used.
  
   So with group=true the query is about 20 times slower than without it.
   Is there any way to improve performance with grouping?
  
   My application needs the grouping feature and all of its queries use it,
   but their performance is too low for production use.
  
   I use Solr 4.x from trunk
  
   Agnieszka Kukalowicz
 




 


Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-16 Thread Pawel Rog
Maybe try EdgeNGramFilterFactory:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/#solr.EdgeNGramFilterFactory
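
A sketch of what such a fieldType might look like (the name is made up;
n-grams at index time only, so a query token like dam matches the indexed
prefixes of damned):

<fieldType name="text_autocomplete" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- indexes dam, damn, damne, damned for each token -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>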


On Mon, Jul 16, 2012 at 6:57 AM, santamaria2 aravinda@contify.com wrote:

 I'm about to implement an autocomplete mechanism for my search box. I've
 read
 about some of the common approaches, but I have a question about wildcard
 query vs facet.prefix.

 Say I want autocomplete for a title: 'Shadows of the Damned'. I want this
 to
 appear as a suggestion if I type 'sha' or 'dam' or 'the'. I don't care that
 it won't appear if I type 'hadows'.

 While indexing, I'd use a whitespace tokenizer and a lowercase filter to
 store that title in the index.
 Now I'm thinking two approaches for 'dam' typed in the search box:

 1) q=title:dam*

 2) q=*:*&facet=on&facet.field=title&facet.prefix=dam


 So any reason that I should favour one over the other? Speed a factor? The
 index has around 200,000 items.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Metadata and FullText, indexed at different times - looking for best approach

2012-07-16 Thread Alexandre Rafalovitch
Thank you,

I am already on 4alpha. The patch feels a little too unstable for my
needs/familiarity with the code.

What about something around multiple cores? Could I have the full-text
fields stored in a separate core and somehow (again, with minimum
hand-coding) search against all those cores and get back a combined
list of document IDs? Or would that make comparative ranking/sorting
impossible?

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Sun, Jul 15, 2012 at 12:08 PM, Erick Erickson
erickerick...@gmail.com wrote:
 You've got a couple of choices. There's a new patch in town
 https://issues.apache.org/jira/browse/SOLR-139
 that allows you to update individual fields in a doc if (and only if)
 all the fields in the original document were stored (actually, all the
 non-copy fields).

 So if you're storing (stored=true) all your metadata information, you can
 just update the document when the  text becomes available assuming you
 know the uniqueKey when you update.

 Under the covers, this will find the old document, get all the fields, add the
 new fields to it, and re-index the whole thing.
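
 As a sketch of what such an update might look like once the patch is applied
 (the field names are invented here, and the update="set" syntax is the one
 from the SOLR-139 discussion, so verify it against the patch you end up with):

 <add>
   <doc>
     <field name="id">doc-42</field>
     <field name="fulltext" update="set">the extracted body text...</field>
   </doc>
 </add>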

 Otherwise, your fallback idea is a good one.

 Best
 Erick

 On Sat, Jul 14, 2012 at 11:05 PM, Alexandre Rafalovitch
 arafa...@gmail.com wrote:
 Hello,

 I have a database of metadata and I can inject it into SOLR with DIH
 just fine. But then, I also have the documents to extract full text
 from that I want to add to the same records as additional fields. I
 think DIH allows running Tika at ingestion time, but I may not have
 the full-text files at that point (they could arrive days later). I
 can match the file to the metadata by a file name matching a field
 name.

 What is the best approach to do that staggered indexing with minimum
 custom code? I guess my fallback position is a custom full-text
 indexer agent that re-adds the metadata fields when the file is being
 indexed. Is there anything better?

 I am a newbie using v4.0alpha of SOLR (and loving it).

 Thank you,
 Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


Solr 3.5 DIH delta-import replicating full index or Admin UI problem?

2012-07-16 Thread Arcadius Ahouansou
Hello.

We are running Solr 3.5 multicore in master-slave mode.


-Our delta-import looks like:
/solr/core01/dataimport?command=delta-import&optimize=false

The size of the index is 1.18GB

When delta-import is going on, on the slave admin UI
 8983/solr/core01/admin/replication/index.jsp
I can see the following output:

Master http://solrmaster01.somedomain.com:8983/solr/core01/replication
Latest Index Version:null, Generation: null
Replicatable Index Version:1342183977587, Generation: 33
Poll Interval 00:00:60
Local Index Index Version: 1342183977585, Generation: 32
Location: /var/somedomain/solr/solrhome/core01/data/index
Size: 1.18 GB
Times Replicated Since Startup: 32
Previous Replication Done At: Mon Jul 16 17:08:58 GMT 2012
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Mon Jul 16 17:09:58 GMT 2012
Current Replication Status Start Time: Mon Jul 16 17:08:58 GMT 2012
Files Downloaded: 12 / 95
Downloaded: 4.33 KB / 1.18 GB [0.0%]
Downloading File: _1o.fdt, Downloaded: 510 bytes / 510 bytes [100.0%]
Time Elapsed: 22s, Estimated Time Remaining: 6266208s, Speed: 201 bytes/s
-


- Does "Downloaded: 4.33 KB / 1.18 GB [0.0%]" mean that the Solr slave
is going to download the whole 1.18GB?

-I have been monitoring this, and the replication takes less than a minute.
And checking the files in the index directory on the slave, the timestamps
are quite different, so apparently, the slave is not downloading the full
index all the time.

-Please, has anyone else seen the whole index size shown as the
denominator of the Downloaded fraction?

-Anything I may be doing wrong?

-Also notice the Files Downloaded: 12 / 95. That bit never increases
to 95 / 95.


Our solrconfig looks like this:

--
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">solrconfig.xml,synonyms.txt,schema.xml,stopwords.txt,data-config.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">some-master-full-url</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
--


Thanks.

Arcadius.



Re: Wildcard query vs facet.prefix for autocomplete?

2012-07-16 Thread solrman
term component will be faster.
like below:
http://host:port/solr/terms?terms.fl=content&terms.prefix=sol
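
You can also cap and order the suggestions with terms.limit and terms.sort
(both standard TermsComponent parameters), e.g.:
http://host:port/solr/terms?terms.fl=content&terms.prefix=sol&terms.limit=10&terms.sort=count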

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199p3995378.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Lost answers?

2012-07-16 Thread Bruno Mannina

Hello Michael,

I will check the log, but today I am thinking of something else: maybe it's
my program that loses some requests.

It's the first time the download has been so fast.

With Jetty, it's a little bit slower, so maybe that is why my program
works fine.


Do you think I can use Jetty for my prod' environment?
I will have around 500 users per year with 10,000 requests per day max.

On 16/07/2012 16:40, Michael Della Bitta wrote:

Hello, Bruno,

No, 4 simultaneous requests should not be a problem.

Have you checked the Tomcat logs or logged the data in the query
response object to see if there are any clues to what the problem
might be?

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Sun, Jul 15, 2012 at 2:10 PM, Bruno Mannina bmann...@free.fr wrote:

I forgot:

I do the request on the uniqueKey field, so each request gets one document

On 15/07/2012 14:11, Bruno Mannina wrote:


Dear Solr Users,

I have Solr 3.6 + Tomcat, and I have a program that sends 4 HTTP
requests at the same time.
I must do 1902 requests.

I ran several tests, but each time it loses some requests:
- sometimes I get 1856 docs, 1895 docs, 1900 docs, but never 1902 docs.

With Jetty, I always get 1902 docs.

As it's a dev' environment, I'm the only one testing it.

Is it a problem to do 4 requests at the same time for Tomcat 6?

thanks for your info,

Bruno











Re: Lost answers?

2012-07-16 Thread Michael Della Bitta
Hello Bruno,

Jetty is a legitimate choice. I do, however, worry that you might be
masking an underlying problem by making that choice, without a
guarantee that it won't someday hurt you even if you use Jetty.

A question: are you using a client to connect to Solr and issue your
queries? Something like SolrJ, solr-php-client, rsolr, etc.? If not,
you might find that someone has already done the work for you of
making a durable client-side API for Solr, and achieve better results.
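
For instance, with SolrJ each request is a couple of lines, and connection
handling and response parsing are taken care of for you (a minimal sketch --
the URL and the query field are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QuickQuery {
    public static void main(String[] args) throws Exception {
        // point this at your Solr core; the field name is just an example
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr");
        SolrQuery query = new SolrQuery("uniquekey:12345");
        QueryResponse rsp = solr.query(query);
        System.out.println("found: " + rsp.getResults().getNumFound());
    }
}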


Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Mon, Jul 16, 2012 at 3:16 PM, Bruno Mannina bmann...@free.fr wrote:
 Hello Michael,

 I will check the log, but today I am thinking of something else: maybe it's
 my program that loses some requests.
 It's the first time the download has been so fast.

 With Jetty, it's a little bit slower, so maybe that is why my program
 works fine.

 Do you think I can use Jetty for my prod' environment?
 I will have around 500 users per year with 10,000 requests per day max.

 On 16/07/2012 16:40, Michael Della Bitta wrote:

 Hello, Bruno,

 No, 4 simultaneous requests should not be a problem.

 Have you checked the Tomcat logs or logged the data in the query
 response object to see if there are any clues to what the problem
 might be?

 Michael Della Bitta

 
 Appinions, Inc. -- Where Influence Isn’t a Game.
 http://www.appinions.com


 On Sun, Jul 15, 2012 at 2:10 PM, Bruno Mannina bmann...@free.fr wrote:

 I forgot:

 I do the request on the uniqueKey field, so each request gets one
 document

 On 15/07/2012 14:11, Bruno Mannina wrote:

 Dear Solr Users,

 I have Solr 3.6 + Tomcat, and I have a program that sends 4 HTTP
 requests at the same time.
 I must do 1902 requests.

 I ran several tests, but each time it loses some requests:
 - sometimes I get 1856 docs, 1895 docs, 1900 docs, but never 1902 docs.

 With Jetty, I always get 1902 docs.

 As it's a dev' environment, I'm the only one testing it.

 Is it a problem to do 4 requests at the same time for Tomcat 6?

 thanks for your info,

 Bruno








Re: Query results vs. facets results

2012-07-16 Thread tudor

Erick Erickson wrote
 
 Ahhh, you need to look down another few lines. When you specify fq, there
 should be a section of the debug output like
  <arr name="filter_queries">
    .
    .
    .
  </arr>
 
 where the array is the parsed form of the filter queries. I was thinking
 about
 comparing that with the parsed form of the q parameter in the non-filter
 case to see what insight one could gain from that.
 
 

There is no filter_queries section because I do not use an fq in the first
two queries. I use one in the combined query, for which you can see the
output further below.


Erick Erickson wrote
 
 
 But there's already one difference: when you use *, you get
  <str name="parsedquery">ID:*</str>
 
 Is it possible that you have some documents that do NOT have an ID field?
 try *:* rather than just *. I'm guessing that your default search field is
 ID
 and you have some documents without an ID field. Not a good guess if ID
 is your uniqueKey though..
 
 Try q=*:* -ID:* and see if you get 31 docs.
 
 

All the entries have an ID, so q=*:* -ID:* yielded 0 results.
The same ID can appear multiple times; that is the reason for grouping the
results. Indeed, ID is the default search field.


Erick Erickson wrote
 
 
 Also note that if you _have_ specified ID as your uniqueKey _but_ you
 didn't
 re-index afterwards (actually, I'd blow away the entire
 solrhome/data directory
 and restart) you may have stale data in there that allowed documents to
 exist
 that do not have uniqueKey fields.
 
 

For Solr's unique id I use a <fieldType name="uuid" class="solr.UUIDField"
indexed="true"/> field (which, of course, has a different name than the
default search ID), so it should not be a problem.

I have re-indexed the data, and I get somewhat different results. This is
the query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*:*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=STR_ENTERPRISE_ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

And the results as well as the debug information:

<lst name="grouped">
  <lst name="ID">
    <int name="matches">284</int>
    <int name="ngroups">134</int>
    <arr name="groups">
    ...

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="CITY">
      ...
      <int name="MILTON">89</int>
      ...

<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
  <str name="parsedquery_toString">*:*</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
  <arr name="filter_queries">
    <str>{!tag=dt}CITY:MILTON</str>
  </arr>
  <arr name="parsed_filter_queries">
    <str>CITY:MILTON</str>
  </arr>
  <lst name="timing"/>
</lst>

So now fq says: 134 groups with CITY:MILTON, while faceted search says: 89
with CITY:MILTON.

How can I see some information about the grouping in Solr?

Thanks Erick!

Regards,
Tudor


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079p3995388.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOLR 4 Alpha Out Of Mem Err

2012-07-16 Thread Nick Koton
 That suggests you're running out of threads
Michael,
Thanks for this useful observation. What I found just prior to the problem
situation was literally thousands of threads in the server JVM. I have
pasted a few samples below, obtained from the admin GUI. I spent some time
today using this barometer, but I don't have enough to share right now. I'm
looking at the difference between ConcurrentUpdateSolrServer and
HttpSolrServer and how my client may be misusing them. I'll assume my
client is misbehaving and driving the server crazy for now. If I figure out
how, I will share it so perhaps a safeguard can be put in place.
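
On the client side, one thing I am trying is constructing the concurrent
server with an explicit queue size and thread count, so add() blocks when
the queue fills up instead of letting work pile up without bound (just a
sketch -- the URL and the sizes are made up):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BoundedFeeder {
    public static void main(String[] args) throws Exception {
        // buffer at most 100 docs and send them with 4 worker threads
        ConcurrentUpdateSolrServer server =
                new ConcurrentUpdateSolrServer("http://localhost:8983/solr", 100, 4);
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        server.add(doc);
        server.blockUntilFinished(); // drain the queue before shutting down
        server.shutdown();
    }
}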

Nick


Server threads - very roughly 0.1 %:
cmdDistribExecutor-9-thread-7161 (10096)
java.util.concurrent.SynchronousQueue$TransferStack@17b90c55
.   sun.misc.Unsafe.park(Native Method)
.   java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
.   java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
.   java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
.   java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:874)
.   java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:945)
.   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
.   java.lang.Thread.run(Thread.java:662)
-0.ms

-0.ms cmdDistribExecutor-9-thread-7160 (10086)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5509b56
.   sun.misc.Unsafe.park(Native Method)
.   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
.   java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
.   org.apache.http.impl.conn.tsccm.WaitingThread.await(WaitingThread.java:158)
.   org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:403)
.   org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
.   org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
.   org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:401)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
.   org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:351)
.   org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
.   org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:325)
.   org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
.   java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
.   java.util.concurrent.FutureTask.run(FutureTask.java:138)
.   java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
.   java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
.   java.util.concurrent.FutureTask.run(FutureTask.java:138)
.   java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
.   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
.   java.lang.Thread.run(Thread.java:662)
20.ms

20.ms cmdDistribExecutor-9-thread-7159 (10085)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6f062dd3
.   sun.misc.Unsafe.park(Native Method)
.   java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
.   java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
.   org.apache.http.impl.conn.tsccm.WaitingThread.await(WaitingThread.java:158)
.   org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:403)
.   org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
.   org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
.   org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:401)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
.   org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
.   org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:351)
.   org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
.   org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:325)
.   org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
.   java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
.   java.util.concurrent.FutureTask.run(FutureTask.java:138)
.

Re: Mmap

2012-07-16 Thread Bill Bell
Any thoughts on this? Is the default MMap?



Sent from my mobile device
720-256-8076

On Feb 14, 2012, at 7:16 AM, Bill Bell billnb...@gmail.com wrote:

 Does someone have an example of using unmap in 3.5 and chunksize?
 
 I am using Solr 3.5.
 
 I noticed in solrconfig.xml:
 
 <directoryFactory name="DirectoryFactory"
  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
 
 I don't see this parameter taking effect when I set
 -Dsolr.directoryFactory=solr.MMapDirectoryFactory
 
 How do I see the setting in the log or in stats.jsp? I cannot find a place
 that indicates whether it is set or not.

 I would assume StandardDirectoryFactory is being used, but I see no
 difference whether I set it or NOT.
 
 Bill Bell
 Sent from mobile
 


Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Mou
Hi,

Our index is divided into two shards, and each of them has 120M docs, with a
total size of 75G in each core.
The server is a pretty good one; the JVM is given 70G of memory and about the
same is left for the OS (SLES 11).

We use all dynamic fields except the unique id, and we use long queries,
but almost all of them are filter queries. Each query may have 10-30 fq
parameters.

When I tested the index (same size) but with a max heap size of 40G, queries
were blazing fast. I used solrmeter to load test, and it was happily serving
12000 queries or more per minute with an avg 65 ms qtime. We had an excellent
filterCache hit ratio.

This index is only used for searching and is replicated every 7 sec from
the master.

But now on the production server it is horribly slow, taking 5 mins (qtime)
to return the same query.
What could go wrong?

Really appreciate your suggestions on debugging this thing..



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to setup SimpleFSDirectoryFactory

2012-07-16 Thread William Bell
We all know that MMapDirectory is fastest. However, we cannot always
use it, since you might run out of memory on large indexes, right?

Here is how I got SimpleFSDirectoryFactory to work. Just set
-Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.

Your solrconfig.xml:

<directoryFactory name="DirectoryFactory"
 class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

You can check it with http://localhost:8983/solr/admin/stats.jsp

Notice that the default for 64-bit Windows is MMapDirectory, SimpleFSDirectory
for other Windows setups, and NIOFSDirectory everywhere else. It would be nicer
if we could just set it all up with a helper in solrconfig.xml...

if (Constants.WINDOWS) {
  if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
    return new MMapDirectory(path, lockFactory);
  else
    return new SimpleFSDirectory(path, lockFactory);
} else {
  return new NIOFSDirectory(path, lockFactory);
}



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Mmap

2012-07-16 Thread William Bell
Yep.

-Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory

or

-Dsolr.directoryFactory=solr.MMapDirectoryFactory

works great.


On Mon, Jul 16, 2012 at 7:55 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Hi Bill,

 Standard picks one for you. Otherwise, you can hardcode the
 DirectoryFactory in your config file, or I believe if you specify

 -Dsolr.directoryFactory=solr.MMapDirectoryFactory

 That will get you what you want.
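
 Hardcoding just means replacing the substitution variable in solrconfig.xml
 with the concrete class, e.g. (a sketch):

 <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>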

 Michael Della Bitta

 
 Appinions, Inc. -- Where Influence Isn’t a Game.
 http://www.appinions.com


 On Mon, Jul 16, 2012 at 9:32 PM, Bill Bell billnb...@gmail.com wrote:
 Any thoughts on this? Is the default MMap?



 Sent from my mobile device
 720-256-8076

 On Feb 14, 2012, at 7:16 AM, Bill Bell billnb...@gmail.com wrote:

 Does someone have an example of using unmap in 3.5 and chunksize?

 I am using Solr 3.5.

 I noticed in solrconfig.xml:

  <directoryFactory name="DirectoryFactory"
   class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

  I don't see this parameter taking effect when I set
  -Dsolr.directoryFactory=solr.MMapDirectoryFactory

  How do I see the setting in the log or in stats.jsp? I cannot find a place
  that indicates whether it is set or not.

  I would assume StandardDirectoryFactory is being used, but I see no
  difference whether I set it or NOT.

 Bill Bell
 Sent from mobile




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Mou
Thanks Bryan. Excellent suggestion.

I haven't used VisualVM before, but I am going to use it to see where the CPU
is going. I saw that the CPU is heavily used; I hadn't seen so much CPU use in
testing.
Although I think GC is not the problem, splitting the JVM per shard would be
a good idea.


On Mon, Jul 16, 2012 at 9:44 PM, Bryan Loofbourrow [via Lucene] 
ml-node+s472066n3995446...@n3.nabble.com wrote:

 5 min is ridiculously long for a query that used to take 65ms. That ought
 to be a great clue. The only two things I've seen that could cause that
 are thrashing, or GC. Hard to see how it could be thrashing, given your
 hardware, so I'd initially suspect GC.

 Aim VisualVM at the JVM. It shows how much CPU goes to GC over time, in a
 nice blue line. And if it's not GC, try out its Sampler tab, and see where
 the CPU is spending its time.

 FWIW, when asked at what point one would want to split JVMs and shard, on
 the same machine, Grant Ingersoll mentioned 16GB, and precisely for GC
 cost reasons. You're way above that. Maybe multiple JVMs and sharding,
 even on the same machine, would serve you better than a monster 70GB JVM.

 -- Bryan

  -Original Message-
  From: Mou [mailto:[hidden email]]

  Sent: Monday, July 16, 2012 7:43 PM
  To: [hidden email]
  Subject: Using Solr 3.4 running on tomcat7 - very slow search
 
  Hi,
 
  Our index is divided into two shards and each of them has 120M docs ,
  total
  size 75G in each core.
  The server is a pretty good one , jvm is given memory of 70G and about
  same
  is left for OS (SLES 11) .
 
   We use all dynamic fields except the unique id, and we use long queries,
   but almost all of them are filter queries. Each query may have 10-30 fq
   parameters.
 
  When I tested the index ( same size) but with max heap size 40 G,
 queries

  were blazing fast. I used solrmeter to load test and it was happily
  serving
  12000 queries or more per min with avg 65 ms qtime.We had an excellent
  filtercache hit ratio.
 
  This index is only used for searching and being replicated every 7 sec
  from
  the master.
 
  But now in production server it is horribly slow and taking 5
 mins(qtime)

  to
  return a query ( same query).
  What could go wrong?
 
  Really appreciate your suggestions on debugging this thing..
 
 
 
  --
  View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436.html
  Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436p3995449.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-16 Thread Bryan Loofbourrow
Another thing you may wish to ponder is this blog entry from Mike
McCandless:
http://blog.mikemccandless.com/2011/04/just-say-no-to-swapping.html

In it, he discusses the poor interaction between OS swapping and
long-neglected allocations in a JVM. You're on Linux, which has decent
control over swapping decisions, so you may find that a tweak is in order,
especially if you can discover evidence that the hard drive is being
worked hard during GC. If the problem exists, it might be especially
pronounced in your large JVM.
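
The usual knob on Linux is vm.swappiness; something along these lines (the
value 0 is what Mike's post suggests experimenting with) makes the kernel
much less eager to swap:

  sysctl -w vm.swappiness=0                      # apply immediately (as root)
  echo 'vm.swappiness = 0' >> /etc/sysctl.conf   # persist across reboots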

I have no direct evidence of thrashing during GC (I am not sure how to go
about gathering such evidence), but I have seen, on a Windows machine, a
Tomcat running Solr refuse to shut down for many minutes, while a Resource
Monitor session reports that that same Tomcat process is frantically
reading from the page file the whole time. So there is something besides
plausibility to the idea.

-- Bryan

 -Original Message-
 From: Mou [mailto:mouna...@gmail.com]
 Sent: Monday, July 16, 2012 9:09 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Using Solr 3.4 running on tomcat7 - very slow search

 Thanks Bryan. Excellent suggestion.

 I haven't used VisualVM before, but I am going to use it to see where the CPU
 is going. I saw that the CPU is heavily used; I hadn't seen so much CPU use in
 testing.
 Although I think GC is not the problem, splitting the JVM per shard would be
 a good idea.


 On Mon, Jul 16, 2012 at 9:44 PM, Bryan Loofbourrow [via Lucene] 
 ml-node+s472066n3995446...@n3.nabble.com wrote:

  5 min is ridiculously long for a query that used to take 65ms. That
 ought
  to be a great clue. The only two things I've seen that could cause
that
  are thrashing, or GC. Hard to see how it could be thrashing, given
your
  hardware, so I'd initially suspect GC.
 
  Aim VisualVM at the JVM. It shows how much CPU goes to GC over time,
in
 a
  nice blue line. And if it's not GC, try out its Sampler tab, and see
 where
  the CPU is spending its time.
 
  FWIW, when asked at what point one would want to split JVMs and shard,
 on
  the same machine, Grant Ingersoll mentioned 16GB, and precisely for GC
  cost reasons. You're way above that. Maybe multiple JVMs and sharding,
  even on the same machine, would serve you better than a monster 70GB
 JVM.
 
  -- Bryan
 
   -Original Message-
   From: Mou [mailto:[hidden email]]
 
   Sent: Monday, July 16, 2012 7:43 PM
   To: [hidden email]
   Subject: Using Solr 3.4 running on tomcat7 - very slow search
  
   Hi,
  
   Our index is divided into two shards and each of them has 120M docs
,
   total
   size 75G in each core.
   The server is a pretty good one , jvm is given memory of 70G and
about
   same
   is left for OS (SLES 11) .
  
   We use all dynamic fields except the unique id, and we use long queries,
   but almost all of them are filter queries. Each query may have 10-30 fq
   parameters.
  
   When I tested the index ( same size) but with max heap size 40 G,
  queries
 
   were blazing fast. I used solrmeter to load test and it was happily
   serving
   12000 queries or more per min with avg 65 ms qtime.We had an
excellent
   filtercache hit ratio.
  
   This index is only used for searching and being replicated every 7
sec
   from
   the master.
  
   But now in production server it is horribly slow and taking 5
  mins(qtime)
 
   to
   return a query ( same query).
   What could go wrong?
  
   Really appreciate your suggestions on debugging this thing..
  
  
  
   --
   View this message in context:
 http://lucene.472066.n3.nabble.com/Using-
   Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436.html
   Sent from the Solr - User mailing list archive at Nabble.com.
 
 


 --
 View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-3-4-running-on-tomcat7-very-slow-search-tp3995436p3995449.html
 Sent from the Solr - User mailing list archive at Nabble.com.