Re: Lucene FieldCache - Out of memory exception
Here is one sample query that I picked up from the log file : q=*%3A*&fq=Category%3A%223__107%22&fq=S_P1540477699%3A%22MICROCIRCUIT%2C+LINE+TRANSCEIVERS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S_C1503120369&facet.field=S_P1406389942&facet.field=S_P1430116878&facet.field=S_P1430116881&facet.field=S_P1406453552&facet.field=S_P1406451296&facet.field=S_P1406452465&facet.field=S_C2968809156&facet.field=S_P1406389980&facet.field=S_P1540477699&facet.field=S_P1406389982&facet.field=S_P1406389984&facet.field=S_P1406451284&facet.field=S_P1406389926&facet.field=S_P1424886581&facet.field=S_P2017662632&facet.field=F_P1946367021&facet.field=S_P1430116884&facet.field=S_P2017662620&facet.field=F_P1406451304&facet.field=F_P1406451306&facet.field=F_P1406451308&facet.field=S_P1500901421&facet.field=S_P1507138990&facet.field=I_P1406452433&facet.field=I_P1406453565&facet.field=I_P1406452463&facet.field=I_P1406453573&facet.field=I_P1406451324&facet.field=I_P1406451288&facet.field=S_P1406451282&facet.field=S_P1406452471&facet.field=S_P1424886605&facet.field=S_P1946367015&facet.field=S_P1424886598&facet.field=S_P1946367018&facet.field=S_P1406453556&facet.field=S_P1406389932&facet.field=S_P2017662623&facet.field=S_P1406450978&facet.field=F_P1406452455&facet.field=S_P1406389972&facet.field=S_P1406389974&facet.field=S_P1406389986&facet.field=F_P1946367027&facet.field=F_P1406451294&facet.field=F_P1406451286&facet.field=F_P1406451328&facet.field=S_P1424886593&facet.field=S_P1406453567&facet.field=S_P2017662629&facet.field=S_P1406453571&facet.field=F_P1946367030&facet.field=S_P1406453569&facet.field=S_P2017662626&facet.field=S_P1406389978&facet.field=F_P1946367024 My primary question here is: can Solr handle this kind of query, with so many facet fields? I have tried using both enum and fc for facet.method and there is no improvement with either. Appreciate any help on this. Thank you.
- Rahul

On Mon, Apr 30, 2012 at 2:53 PM, Rahul R rahul.s...@gmail.com wrote:

Hello, I am using solr 1.3 with jdk 1.5.0_14 and weblogic 10MP1 application server on Solaris. I use embedded solr server. More details:
Number of docs in solr index: 1.4 million
Physical size of index: 640MB
Total number of fields in the index: 700 (99% of these are dynamic fields)
Total number of fields enabled for faceting: 440
Avg number of facet fields participating in a faceted query: 50-70
Total RAM allocated to weblogic appserver: 3GB (max possible)

In a multi-user environment with 3 users using this application for a period of around 40 minutes, the application runs out of memory. Analysis of the heap dump shows that almost 85% of the memory is retained by the FieldCache. Now I understand that the field cache is out of our control, but I would appreciate some suggestions on how to handle this issue. Some questions on this front:
- Some mail threads on this forum seem to indicate that there could be some connection between having dynamic fields and usage of FieldCache. Is this true? Most of the fields in my index are dynamic fields.
- As mentioned above, most of my faceted queries could have around 50-70 facet fields (I would do SolrQuery.addFacetField() for around 50-70 fields per query). Could this be the source of the problem? Is this too high for Solr to support?
- Initially, I had a facet.sort defined in solrconfig.xml. Since FieldCache builds up on sorting, I even removed the facet.sort and tried, but no respite. The behavior is the same as before.
- The document id that I have for each document is quite big (around 50 characters on average). Can this be a problem? I reduced this to around 15 characters and tried, but still there is no improvement.
- Can the size of the data be a problem? But on this forum, I see many users talking of more than 100 million documents in their index. I have only 1.4 million with a physical size of 640MB. The physical server on which this application is running has sufficient RAM and CPU.
- What gets stored in the FieldCache? Is it the entire document or just the document id?

Any help is much appreciated. Thank you. regards Rahul
Re: Removing old documents
With which client? paul

On 2 May 2012, at 01:29, alx...@aim.com wrote: all caching is disabled and I restarted jetty. The same results.
Re: Solr: extracting/indexing HTML via cURL
You can have two fields: one which is stripped, and another which stores the original data. You can use copyField directives and make the stripped field indexed but not stored, and the original field stored but not indexed. You only have to upload the file once, and only store the text once. If you look in the default schema, you'll find a bunch of text fields are all copied to text or text_all, which is indexed but not stored. This catch-all field is the default search field. http://lucidworks.lucidimagination.com/display/solr/Copying+Fields

On Mon, Apr 30, 2012 at 2:06 PM, okayndc bodymo...@gmail.com wrote: Great, thank you for the input. My understanding of HTMLStripCharFilter is that it strips HTML tags, which is not what I want ~ is this correct? I want to keep the HTML tags intact.

On Mon, Apr 30, 2012 at 11:55 AM, Jack Krupansky j...@basetechnology.com wrote: If by extracting HTML content via cURL you mean using SolrCell to parse html files, this seems to make sense. The sequence is that regardless of the file type, each file extraction parser will strip off all formatting and produce a raw text stream. Office, PDF, and HTML files are all treated the same in that way. Then the unformatted text stream is sent through the field type analyzers to be tokenized into terms that Lucene can index. The input string to the field type analyzer is what gets stored for the field, but this occurs after the extraction file parser has already removed formatting. There is no way for the formatting to be preserved in that case, other than to go back to the original input document before extraction parsing. If you really do want to preserve full HTML formatted text, you would need to define a field whose field type uses the HTMLStripCharFilter and then directly add documents that direct the raw HTML to that field. There may be some other way to hook into the update processing chain, but that may be too much effort compared to the HTML strip filter.
-- Jack Krupansky

-Original Message- From: okayndc Sent: Monday, April 30, 2012 10:07 AM To: solr-user@lucene.apache.org Subject: Solr: extracting/indexing HTML via cURL

Hello, over the weekend I experimented with extracting HTML content via cURL and was just wondering why the extraction/indexing process does not include the HTML tags. It seems as though the HTML tags are either being ignored or stripped somewhere in the pipeline. If this is the case, is it possible to include the HTML tags, as I would like to keep the formatted HTML intact? Any help is greatly appreciated.

-- Lance Norskog goks...@gmail.com
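The two-field copyField arrangement Lance describes above might be sketched in schema.xml roughly as follows. This is an assumption-laden illustration, not config from the thread: the field names (body_html, body_text) and the text_html_stripped type are made up, and the analyzer chain is just one plausible choice.

```xml
<!-- Stored but not indexed: keeps the original HTML intact for display -->
<field name="body_html" type="string" indexed="false" stored="true"/>

<!-- Indexed but not stored: the searchable copy, with tags stripped -->
<field name="body_text" type="text_html_stripped" indexed="true" stored="false"/>

<!-- One upload populates both fields -->
<copyField source="body_html" dest="body_text"/>

<!-- Hypothetical type whose analyzer removes HTML before tokenizing -->
<fieldType name="text_html_stripped" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Queries then search body_text while the application displays body_html, so the raw markup is stored exactly once.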
RE: Solr Merge during off peak times
Ok, thanks Otis Another question on merging What is the best way to monitor merging? Is there something in the log file that I can look for? It seems like I have to monitor the system resources - read/write IOPS etc.. and work out when a merge happened It would be great if I can do it by looking at log files or in the admin UI. Do you know if this can be done or if there is some tool for this? Thanks Prabhu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: 01 May 2012 15:12 To: solr-user@lucene.apache.org Subject: Re: Solr Merge during off peak times Hi Prabhu, I don't think such a merge policy exists, but it would be nice to have this option and I imagine it wouldn't be hard to write if you really just base the merge or no merge decision on the time of day (and maybe day of the week). Note that this should go into Lucene, not Solr, so if you decide to contribute your work, please see http://wiki.apache.org/lucene-java/HowToContribute Otis Performance Monitoring for Solr - http://sematext.com/spm From: Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, May 1, 2012 8:45 AM Subject: Solr Merge during off peak times Hi, I would like to know if there is a way to configure index merge policy in solr so that the merging happens during off peak hours. Can you please let me know if such a merge policy configuration exists? Thanks Prabhu
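One way to watch merges from the log side (my suggestion, not something mentioned in the thread): Lucene's infoStream can be enabled in solrconfig.xml, which makes IndexWriter log its low-level activity, including merge starts and finishes, to a file. In Solr 3.x the setting lives under indexDefaults; the filename here is just a placeholder.

```xml
<indexDefaults>
  <!-- Verbose IndexWriter diagnostics, including segment merges -->
  <infoStream file="INFOSTREAM.txt">true</infoStream>
</indexDefaults>
```

The output is verbose and meant for debugging, so it is usually turned on only while investigating merge behavior, not left on in production.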
Re: should slave replication be turned off / on during master clean and re-index?
Simply turn off replication during your rebuild-from-scratch. See http://wiki.apache.org/solr/SolrReplication#HTTP_API - the disablereplication command. The autocommit thing was, I think, in reference to keeping a partially rebuilt index from being replicated. Autocommit is usually a fine thing. So your full rebuild looks like this:
1. disable replication on the master
2. rebuild the index (autocommit on or off makes little difference as far as replication)
3. enable replication on the master
Best Erick

On Tue, May 1, 2012 at 8:55 AM, geeky2 gee...@hotmail.com wrote: hello shawn, thanks for the reply. ok - i did some testing and yes you are correct. autocommit is doing the commit work in chunks. yes - the slaves are also going from having everything to nothing, then slowly building back up again, lagging behind the master. ... and yes - this is probably not what we need as far as a replication strategy for the slaves. you said you don't use autocommit. if so - then why don't you use / like autocommit? since we have not done this here - there is no established reference point, from an operations perspective. i am looking to formulate some sort of operations strategy, so ANY ideas or input are really welcome. it seems to me that we have to account for two operational strategies - the first operational mode is a daily append to the solr core after the database tables have been updated. this can probably be done with a simple delta import. i would think that autocommit could remain on for the master and replication could also be left on so the slaves pick up the changes ASAP. this seems like the mode that we would / should be in most of the time. the second operational mode would be a build-from-scratch mode, where changes in the schema necessitated a full re-index of the data.
given that our site (powered by solr) must be up all of the time, and that our full index time on the master (for the moment) is hovering somewhere around 16 hours - it makes sense that some sort of parallel path - with a cut-over - must be used. in this situation, is it possible to have the indexing process going on in the background - then have one commit at the end - then turn replication on for the slaves? are there disadvantages to this approach? also - i really like your suggestion of a build core and a live core. is this the approach you use? thank you for all of the great input -- View this message in context: http://lucene.472066.n3.nabble.com/should-slave-replication-be-turned-off-on-during-master-clean-and-re-index-tp3945531p3952904.html Sent from the Solr - User mailing list archive at Nabble.com.
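Erick's steps 1 and 3 above map onto the replication handler's HTTP API (per the SolrReplication wiki page he links). A sketch of the sequence, with host, port, and core name as placeholders:

```
http://master:8983/solr/replication?command=disablereplication
    ... run the full rebuild, with one commit at the end ...
http://master:8983/solr/replication?command=enablereplication
```

Once replication is re-enabled, slaves pick up the fully rebuilt index on their next poll, so they never see a partially built state.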
Re: Solr Merge during off peak times
Why do you care? Merging is generally a background process, or are you doing heavy indexing? In a master/slave setup, it's usually not really relevant except that (with 3.x) massive merges may temporarily stop indexing. Is that the problem? Look at the merge policies; there are configurations that make this less painful. In trunk, DocumentsWriterPerThread makes merges happen in the background, which helps the long-pause-while-indexing problem. Best Erick
Re: Lucene FieldCache - Out of memory exception
The FieldCache gets populated the first time a given field is referenced as a facet and then will stay around forever. So, as additional queries get executed with different facet fields, the number of FieldCache entries will grow. If I understand what you have said, these faceted queries do work initially, but after a while they stop working with OOM, correct? The size of a single FieldCache depends on the field type. Since you are using dynamic fields, it depends on your dynamicField types - which you have not told us about. From your query I see that your fields start with S_ and F_ - presumably you have dynamic field types S_* and F_*? Are they strings, integers, floats, or what? Each FieldCache will be an array with maxdoc entries (your total number of documents - 1.4 million) times the size of the field value, or whatever a string reference is in your JVM. String fields will take more space than numeric fields for the FieldCache, since a separate table is maintained for the unique terms in that field. Roughly what is the typical or average length of one of your facet field values? And, on average, how many unique terms are there within a typical faceted field? If you can convert many of these faceted fields to simple integers the size should go down dramatically, but that depends on your application. 3 GB sounds like it might not be enough for such heavy use of faceting. It is probably not the 50-70 number, but the 440 or accumulated number across many queries that pushes the memory usage up. When you hit OOM, what does the Solr admin stats display say for FieldCache?
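A rough back-of-envelope check of Jack's sizing argument. All the per-entry sizes are illustrative assumptions (4 bytes per doc for an int cache; about an 8-byte ordinal/reference per doc plus a unique-term table for strings; 100k unique terms of ~20 bytes each), not measured numbers:

```python
# FieldCache sizing sketch for the setup in the thread: 1.4M docs,
# 440 faceted fields, 3 GB heap.
maxdoc = 1_400_000

# A numeric (int) FieldCache is one 4-byte entry per document.
int_field_bytes = maxdoc * 4
print(f"int field: {int_field_bytes / 2**20:.1f} MB")       # ~5.3 MB

# A string FieldCache holds roughly a reference per document plus the
# table of unique terms (assume 100k terms averaging 20 bytes).
string_field_bytes = maxdoc * 8 + 100_000 * 20
print(f"string field: {string_field_bytes / 2**20:.1f} MB")  # ~12.6 MB

# If most of the 440 faceted fields end up cached as strings, the total
# comfortably exceeds the 3 GB heap, matching the observed OOM.
total = 440 * string_field_bytes
print(f"440 string fields: {total / 2**30:.1f} GB")          # ~5.4 GB
```

This is why converting string facet fields to integers, or faceting on fewer distinct fields, shrinks the footprint so dramatically.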
-- Jack Krupansky -Original Message- From: Rahul R Sent: Wednesday, May 02, 2012 2:22 AM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache - Out of memory exception
RE: Solr Merge during off peak times
We have a fairly large scale system - about 200 million docs and fairly high indexing activity - about 300k docs per day with peak ingestion rates of about 20 docs per sec. I want to work out what a good mergeFactor setting would be by testing with different mergeFactor settings. I think the default of 10 might be high; I want to try with 5 and compare. Unless I know when a merge starts and finishes, it would be quite difficult to work out the impact of changing mergeFactor. I want to be able to measure how long merges take, run queries during the merge activity and see what the response times are, etc. Thanks Prabhu
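For reference, the knob Prabhu wants to experiment with lives in solrconfig.xml. A minimal sketch for Solr 3.x (the value 5 is the trial setting he mentions; indexDefaults is where this sits in the stock 3.x config):

```xml
<indexDefaults>
  <!-- How many segments accumulate at each level before being merged;
       lower values mean fewer segments but more frequent merge work -->
  <mergeFactor>5</mergeFactor>
</indexDefaults>
```

Comparing query latency and indexing throughput under mergeFactor 5 versus the default 10 is then an A/B test against otherwise identical configs.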
Re: Solr Merge during off peak times
But again, with a master/slave setup merging should be relatively benign. And at 200M docs, having an M/S setup is probably indicated. Here's a good writeup of merge policy internals: http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/ If you're indexing and searching on a single machine, merging is much less important than how often you commit. In an M/S situation, your polling interval on the slave is important. I'd look at commit frequency long before I worried about merging; that's usually where people shoot themselves in the foot - by committing too often. Overall, your mergeFactor is probably less important than other parts of how you perform indexing/searching, but it does have some effect for sure... Best Erick
Null Pointer Exception in SOLR
Hi, When I tried to remove data from the UI (which will in turn hit SOLR), the whole application got stuck. When we took the log files of the UI, we could see that this set of requests did not reach SOLR itself. In the SOLR log file, we were able to find the following exception occurring at the same time:

SEVERE: org.apache.solr.common.SolrException: java.lang.NullPointerException
java.lang.NullPointerException
request: http://solr/coreX/select
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request
at org.apache.solr.handler.component.HttpCommComponent$1.call
at org.apache.solr.handler.component.HttpCommComponent$1.call
at java.util.concurrent.FutureTask$Sync.innerRun
at java.util.concurrent.FutureTask.run
at java.util.concurrent.Executors$RunnableAdapter.call
at java.util.concurrent.FutureTask$Sync.innerRun
at java.util.concurrent.FutureTask.run
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
at java.util.concurrent.ThreadPoolExecutor$Worker.run
at java.lang.Thread.run

This situation persisted for another few hours. No one was able to perform any operation with the application, and if anyone tried to perform any action, it resulted in the above exception during that period. But this situation resolved by itself after a few hours and it started working normally again. Can you tell me if this situation was due to a deadlock condition, or was it due to the CPU utilization going beyond 100%? If it was due to a deadlock, then why did we not get any such messages in the log files? Or is it due to some other problem? Am I missing anything? Can you guide me on this? -- View this message in context: http://lucene.472066.n3.nabble.com/Null-Pointer-Exception-in-SOLR-tp3954952.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie question on sorting
Erick, I'll do that. Thank you very much. Regards, Jacek

On Tue, May 1, 2012 at 7:19 AM, Erick Erickson erickerick...@gmail.com wrote: The easiest way is to do that in the app. That is, return the top 10 to the app (by score), then re-order them there. There's nothing in Solr that I know of that does what you want out of the box. Best Erick

On Mon, Apr 30, 2012 at 11:10 AM, Jacek pjac...@gmail.com wrote: Hello all, I'm facing this simple problem, yet impossible to resolve for me (I'm a newbie in Solr). I need to sort the results by score (which is simple, of course), but then what I need is to take the top 10 results and re-order them (only those top 10 results) by a date field. It's not the same as sort=score,creationdate. Any suggestions will be greatly appreciated!
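Erick's in-app re-ordering can be sketched as follows. The dicts stand in for whatever documents your Solr client returns, and the field names (score, creationdate) are assumptions from Jacek's example, not a real response schema:

```python
from datetime import date

# Documents as returned by Solr, already sorted by score (highest first).
docs = [
    {"id": "a", "score": 9.1, "creationdate": date(2012, 3, 1)},
    {"id": "b", "score": 8.7, "creationdate": date(2012, 4, 15)},
    {"id": "c", "score": 8.2, "creationdate": date(2012, 1, 20)},
]

# Take the top 10 by score, then re-order just those by date, newest first.
top_n = docs[:10]
reordered = sorted(top_n, key=lambda d: d["creationdate"], reverse=True)
print([d["id"] for d in reordered])  # ['b', 'a', 'c']
```

Note how this differs from sort=score,creationdate: the date only breaks ties there, whereas here it fully re-ranks the already-selected top 10.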
RE: Solr Merge during off peak times
Actually we are not thinking of an M/S setup. We are planning to have x number of shards on N number of servers, with each shard handling both indexing and searching. The expected query volume is not that high, so we don't think we would need to replicate to slaves. We think each shard will be able to handle its share of the indexing and searching. If we need to scale query capacity in the future, yeah, we'd probably need to do it by replicating each shard to its slaves. I agree autoCommit settings would be good to set up appropriately. Another question I had is the pros/cons of optimising the index. We would be purging old content every week, and I am thinking whether to run an index optimise at the weekend after purging old data. Because we are going to be continuously indexing data - a mix of adds, updates, and deletes - I am not sure if the benefit of optimising would last long enough to be worth doing. Maybe setting a low mergeFactor would be good enough. Optimising makes sense if the index is more static, perhaps? Thoughts? Thanks Prabhu
ExtractRH: How to strip metadata
Greetings Solr folk, How can I instruct the extract request handler to ignore metadata/headers etc. when it constructs the content of the document I send to it? For example, I created an MS Word document containing just the word SEARCHWORD and nothing else. However, when I ship this doc to my solr server, here's what's thrown in the index: <str name="meta">Last-Printed 2009-02-05T15:02:00Z Revision-Number 22 Comments stream_source_info myfile Last-Author Inigo Montoya Template Normal.dotm Page-Count 1 subject Application-Name Microsoft Macintosh Word Author Jesus Baggins Word-Count 2 xmpTPg:NPages 1 Edit-Time 1086 Creation-Date 2008-11-05T20:19:00Z stream_content_type application/octet-stream Character Count 14 stream_size 31232 stream_name /Applications/MAMP/tmp/php/phpHCIg7y Company Parkman Elastomers Pvt Ltd Content-Type application/msword Keywords Last-Save-Date 2012-05-01T18:55:00Z SEARCHWORD</str> All I want is the body of the document, in this case the word SEARCHWORD. For further reference, here's my extraction handler: <requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler"> <lst name="defaults"> <!-- All the main content goes into "text"... if you need to return the extracted text or do highlighting, use a stored field. --> <str name="fmap.content">meta</str> <str name="lowernames">true</str> <str name="uprefix">ignored_</str> </lst> </requestHandler> (Ironically, meta is the field in the solr schema to which I'm attempting to extract the body of the document. Don't ask). Thanks in advance for any pointers you can provide me. -- - Joe
Re: Solr Merge during off peak times
Optimizing is much less important query-speed wise than historically, essentially it's not recommended much any more. A significant effect of optimize _used_ to be purging obsolete data (i.e. that from deleted docs) from the index, but that is now done on merge. There's no harm in optimizing on off-peak hours, and combined with an appropriate merge policy that may make indexing a little better (I'm thinking of not doing as many massive merges here). BTW, in 4.0, there's DocumentWriterPerThread that merges in the background and pretty much removes even this as a motivation for optimizing. All that said, optimizing isn't _bad_, it's just often unnecessary. Best Erick On Wed, May 2, 2012 at 9:29 AM, Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com wrote: Actually we are not thinking of a M/S setup We are planning to have x number of shards on N number of servers, each of the shard handling both indexing and searching The expected query volume is not that high, so don't think we would need to replicate to slaves. We think each shard will be able to handle its share of the indexing and searching. If we need to scale query capacity in future, yeah probably need to do it by replicating each shard to its slaves I agree autoCommit settings would be good to set up appropriately Another question I had is pros/cons of optimising the index. We would be purging old content every week and am thinking whether to run an index optimise in the weekend after purging old data. Because we are going to be continuously indexing data which would be mix of adds, updates, deletes, not sure if the benefit of optimising would last long enough to be worth doing it. Maybe setting a low mergeFactor would be good enough. Optimising makes sense if the index is more static, perhaps? Thoughts? 
Thanks Prabhu -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 May 2012 13:15 To: solr-user@lucene.apache.org Subject: Re: Solr Merge during off peak times But again, with a master/slave setup merging should be relatively benign. And at 200M docs, having a M/S setup is probably indicated. Here's a good writeup of mergepolicy http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/ If you're indexing and searching on a single machine, merging is much less important than how often you commit. If a M/S situation, then your polling interval on the slave is important. I'd look at commit frequency long before I worried about merging, that's usually where people shoot themselves in the foot - by committing too often. Overall, your mergeFactor is probably less important than other parts of how you perform indexing/searching, but it does have some effect for sure... Best Erick
Dumb question: Streaming collector /query results
I doubt that SOLR has this capability, given that it is based on a RESTful architecture, but I wanted to ask in case I'm mistaken. In Lucene, it is easier to gain a direct handle to the collector / scorer and access all the results as they're collected (as opposed to the SOLR query call that performs the same internally but returns only a subset of results based on the spec'd number of results and offset from the first result). What are my options if I want to access results as they're generated? My first thought would be to write a custom collector to handle the hits as they're scored. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Dumb-question-Streaming-collector-query-results-tp3955175.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dumb question: Streaming collector /query results
In other words, .. as an alternative , what's the most efficient way to gain access to all of the document ids that match a query -- View this message in context: http://lucene.472066.n3.nabble.com/Dumb-question-Streaming-collector-query-results-tp3955175p3955194.html Sent from the Solr - User mailing list archive at Nabble.com.
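At the Lucene level, the custom-collector idea mentioned above might look like the following minimal sketch against the Lucene 3.x Collector API (the class name is illustrative; requires lucene-core on the classpath):

```java
// Sketch: a Lucene 3.x Collector that gathers every matching doc id.
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public class AllDocIdsCollector extends Collector {
    private final List<Integer> docIds = new ArrayList<Integer>();
    private int docBase;

    @Override
    public void setScorer(Scorer scorer) { /* scores not needed here */ }

    @Override
    public void collect(int doc) {
        docIds.add(docBase + doc); // called once per hit, as it is collected
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) {
        this.docBase = docBase;    // offset per-segment ids to index-wide ids
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true;               // we only accumulate ids; order is irrelevant
    }

    public List<Integer> getDocIds() { return docIds; }
}
// usage: searcher.search(query, new AllDocIdsCollector());
```

Because collect() fires per hit as matching proceeds, this avoids the start/rows paging that a normal Solr query response imposes.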
Re: ExtractRH: How to strip metadata
Check to see if you have a copyField for a wildcard pattern that copies to meta, which would copy all of the Tika-generated fields to meta. -- Jack Krupansky -Original Message- From: Joseph Hagerty Sent: Wednesday, May 02, 2012 9:56 AM To: solr-user@lucene.apache.org Subject: ExtractRH: How to strip metadata
Re: ExtractRH: How to strip metadata
I do not. I commented out all of the copyFields provided in the default schema.xml that ships with 3.5. My schema is rather minimal. Here is my fields block, if this helps: <fields> <field name="cust" type="string" indexed="true" stored="true" required="true" /> <field name="asset" type="string" indexed="true" stored="true" required="true" /> <field name="ent" type="string" indexed="true" stored="true" required="true" /> <field name="meta" type="text_en" indexed="true" stored="true" required="true" /> <dynamicField name="ignored_*" type="ignored" multiValued="true"/> <!--field name="modified" type="dateTime" indexed="true" stored="true" required="false" /--> </fields> On Wed, May 2, 2012 at 10:59 AM, Jack Krupansky j...@basetechnology.com wrote: Check to see if you have a copyField for a wildcard pattern that copies to meta, which would copy all of the Tika-generated fields to meta. -- Jack Krupansky -- - Joe
question about dates
Hi :) I'm starting to use Solr and I'm facing a little problem with dates. My documents have a date property which is of the form 'yyyyMMdd'. To index these dates, I use the following code: String dateString = "20101230"; SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd"); Date date = sdf.parse(dateString); doc.addField("date", date); In the index, the date 20101230 is saved as 2010-12-29T23:00:00Z (because of GMT). Now I would like to query documents which have their date property equal to 20101230 but I don't know how to handle this. I tried the following code: String dateString = "20101230"; SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd"); Date date = sdf.parse(dateString); SimpleDateFormat gmtSdf = new SimpleDateFormat("yyyy-MM-dd'T'HH\\:mm\\:ss'Z'"); String gmtString = gmtSdf.format(date); The problem is that gmtString is equal to 2010-12-30T00\:00\:00Z. There is a difference between the indexed value and the parameter value of my query :/ I see that there might be something to do with the timezones during the date-to-string and string-to-date conversions but I can't find it. Thanks, Gary
Re: question about dates
The trailing Z is required in your input data to be indexed, but the Z is not actually stored. Your query must have the trailing Z though, unless you are doing a wildcard or prefix query. -- Jack Krupansky -Original Message- From: G.Long Sent: Wednesday, May 02, 2012 11:18 AM To: solr-user@lucene.apache.org Subject: question about dates
SOLRJ: Is there a way to obtain a quick count of total results for a query
I can achieve this by building a query with start and rows = 0, and using queryResponse.getResults().getNumFound(). Are there any more efficient approaches to this? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-Is-there-a-way-to-obtain-a-quick-count-of-total-results-for-a-query-tp3955322.html Sent from the Solr - User mailing list archive at Nabble.com.
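For reference, the start/rows=0 approach in SolrJ terms (server URL and query are illustrative); this is generally the idiomatic way to get a count, since with rows=0 no documents are fetched and only the response header and numFound come back:

```java
// Sketch (SolrJ 3.x): count matches without retrieving any documents
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery query = new SolrQuery("*:*");
query.setRows(0);                             // header + numFound only
QueryResponse response = server.query(query);
long total = response.getResults().getNumFound();
```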
Re: question about dates
Oops... I meant to say that Solr doesn't *index* the trailing Z, but it is stored (the stored value, not the indexed value.) The query must match the indexed value, not the stored value. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, May 02, 2012 11:55 AM To: solr-user@lucene.apache.org Subject: Re: question about dates
Re: question about dates
That wasn't right either... the query must have the trailing Z, which Solr will strip off to match the indexed value which doesn't have the Z. So, my corrected original statement is: The trailing Z is required in your input data to be indexed, but the Z is not actually indexed by Solr (it is stripped), although the stored value of the field, if any, would have the original value with the Z. Your query must have the trailing Z though (which Solr will strip off), unless you are doing a wildcard or prefix query. Sorry about that. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, May 02, 2012 11:59 AM To: solr-user@lucene.apache.org Subject: Re: question about dates
Re: Error with distributed search and Suggester component (Solr 3.4)
Hi Robert, On May 1, 2012, at 7:07pm, Robert Muir wrote: On Tue, May 1, 2012 at 6:48 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Hi list, Does anybody know if the Suggester component is designed to work with shards? I'm not really sure it is? They would probably have to override the default merge implementation specified by SpellChecker. What confuses me is that Suggester says it's based on SpellChecker, which supposedly does work with shards. But, all of the current suggesters pump out over 100,000 QPS on my machine, so I'm wondering what the usefulness of this is? And if it was useful, merging results from different machines is pretty inefficient, for suggest you would shard by term instead so that you need only contact a single host? The issue is that I've got a configuration with 8 shards already that I'm trying to leverage for auto-complete. My quick dirty work-around would be to add a custom response handler that wraps the suggester, and returns results with the fields that the SearchHandler needs to do the merge. -- Ken -- Ken Krugler http://www.scaleunlimited.com custom big data solutions training Hadoop, Cascading, Mahout Solr
Solr 3.5 - Elevate.xml causing issues when placed under /data directory
Hello, I just started using elevation for solr. I am on solr 3.5, running with Drupal 7, Linux. 1. I updated my solrconfig.xml from <dataDir>${solr.data.dir:./solr/data}</dataDir> to <dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir> 2. I placed my elevate.xml in my solr's data directory. Based on forum answers, I thought placing elevate.xml under the data directory would pick up my latest changes. I restarted tomcat. 3. When I placed my elevate.xml under the conf directory, elevation was working with the url: http://mysolr.www.com:8181/solr/elevate?q=games&wt=xml&sort=score+desc&fl=id,bundle_name But when I moved it to the data directory, I am not seeing any results. NOTE: I can see catalina.out printing solr reading the file from the data directory. I tried to give invalid entries; I noticed solr errors parsing elevate.xml from the data directory. I even tried to send some documents to index, thinking a commit might help to read the elevate config file. But nothing helped. I don't understand why the below url does not work anymore. There are no errors in the log files. http://mysolr.www.com:8181/solr/elevate?q=games&wt=xml&sort=score+desc&fl=id,bundle_name Any help on this topic is appreciated. Thanks
Re: ExtractRH: How to strip metadata
I did some testing, and evidently the meta field is treated specially from the ERH. I copied the example schema, and added both meta and metax fields and set fmap.content=metax, and lo and behold only the doc content appears in metax, but all the doc metadata appears in meta. Although, I did get 400 errors with Solr complaining that meta was not a multivalued field. This is with Solr 3.6. What release of Solr are you using? I was not aware of this undocumented feature. I haven't checked the code yet. -- Jack Krupansky -Original Message- From: Joseph Hagerty Sent: Wednesday, May 02, 2012 11:10 AM To: solr-user@lucene.apache.org Subject: Re: ExtractRH: How to strip metadata
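Following Jack's metax experiment above, a config sketch that simply maps the extracted body to a field whose name doesn't collide with the metadata behavior (the field name body is illustrative and would have to exist in schema.xml; the handler declaration otherwise mirrors the one quoted in this thread):

```xml
<!-- sketch: map the extracted document body to a non-special field name -->
<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">body</str>  <!-- "body" is illustrative -->
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>   <!-- unmapped Tika fields -> ignored_* -->
  </lst>
</requestHandler>
```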
Re: Dumb question: Streaming collector /query results
I did some small research with a fairly modest result: https://github.com/m-khl/solr-patches/tree/streaming You can start exploring it from the trivial test: https://github.com/m-khl/solr-patches/blob/17cd45ce7693284de08d39ebc8812aa6a20b8fb3/solr/core/src/test/org/apache/solr/response/ResponseStreamingTest.java Pls let me know whether it's useful for you. On Wed, May 2, 2012 at 6:48 PM, vybe3142 vybe3...@gmail.com wrote: In other words, as an alternative, what's the most efficient way to gain access to all of the document ids that match a query -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Removing old documents
I use jetty that comes with solr. I use solr's dedupe: <updateRequestProcessorChain name="dedupe"> <processor class="solr.processor.SignatureUpdateProcessorFactory"> <bool name="enabled">true</bool> <str name="signatureField">id</str> <bool name="overwriteDupes">true</bool> <str name="fields">url</str> <str name="signatureClass">solr.processor.Lookup3Signature</str> </processor> <processor class="solr.LogUpdateProcessorFactory" /> <processor class="solr.RunUpdateProcessorFactory" /> </updateRequestProcessorChain> and because of this, id is not the url itself but its encoded signature. I see solrclean uses url to delete a document. Is it possible that the issue is because of this mismatch? Thanks. Alex. -Original Message- From: Paul Libbrecht p...@hoplahup.net To: solr-user solr-user@lucene.apache.org Sent: Tue, May 1, 2012 11:43 pm Subject: Re: Removing old documents With which client? paul On 2 May 2012, at 01:29, alx...@aim.com wrote: all caching is disabled and I restarted jetty. The same results.
Re: ExtractRH: How to strip metadata
How interesting! You know, I did at one point consider that perhaps the field name meta might be treated specially, but I talked myself out of it. I reasoned that a field name in my local schema should have no bearing on how a plugin such as solr-cell/Tika behaves. I should have tested my hypothesis; even if this phenomenon turns out to be undocumented behavior, I consider myself a victim of my own assumptions. I am running version 3.5. You may have gotten the multivalue errors due to the way your test schema and/or extracting request handler is laid out (my bad). I am using the ignored fieldtype and a dynamicField called ignored_* as a catch-all for extraneous fields delivered by Tika. Thanks for your help! Please keep me posted on any further insights/revelations, and I'll do the same. On Wed, May 2, 2012 at 12:54 PM, Jack Krupansky j...@basetechnology.com wrote: I did some testing, and evidently the meta field is treated specially from the ERH. -- - Joe
Dynamic core creation works in 3.5.0 fails in 3.6.0: At least one core definition required at run-time for Solr 3.6.0?
Hi: I have been working on an integration project involving Solr 3.5.0 that dynamically registers cores as needed at run-time, but does not contain any cores by default. The current solr.xml configuration file is: <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="false" sharedLib="lib"> <cores adminPath="/admin/cores"/> </solr> This configuration does not include any cores, as those are created dynamically by each application that is using the Solr server. This is working fine with Solr 3.5.0; the server starts and running web applications can register a new core using SolrJ CoreAdminRequest and everything is working correctly. However, I tried to update to Solr 3.6.0 and this configuration fails with a SolrException due to the following code in CoreContainer.java (lines 171-173): if (cores.cores.isEmpty()) { throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "No cores were created, please check the logs for errors"); } This is a change from Solr 3.5.0, which has no such check. I have searched but cannot find any ticket or notice that this is a planned change in 3.6.0, but before I file a ticket I am asking the community in case this is an issue that has been discussed and this is a planned direction for Solr. Thanks, Matthew
Re: question about dates
: String dateString = "20101230";
: SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
: Date date = sdf.parse(dateString);
: doc.addField("date", date);
:
: In the index, the date 20101230 is saved as 2010-12-29T23:00:00Z (because
: of GMT).

"because of GMT" is misleading and vague ... what you get in your index is a value of 2010-12-29T23:00:00Z because that is the canonical string representation of the Date object you have passed to doc.addField -- the Date object you have passed in represents that time, because you constructed a SimpleDateFormat object without specifying which TimeZone that SDF object should assume is in use when it parses its string input. So when you give it the input "20101230" it treats that as Dec 30, 2010, 00:00:00.000 in whatever the local timezone of your client is. If you want it to treat that input string as a date expression in GMT, then you need to configure the parser to use GMT (SimpleDateFormat.setTimeZone).

: I tried the following code:
:
: String dateString = "20101230";
: SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
: Date date = sdf.parse(dateString);
: SimpleDateFormat gmtSdf = new
: SimpleDateFormat("yyyy-MM-dd'T'HH\\:mm\\:ss'Z'");
: String gmtString = gmtSdf.format(date);
:
: The problem is that gmtString equals "2010-12-30T00\:00\:00Z".

again, that is not a "gmtString" ... in this case, both of the SDF objects you are using have not been configured with an explicit TimeZone, so they use whatever the platform default is where this code is run -- so the variable you are calling gmtString is actually a string representation of a Date object formatted in your local TimeZone. Bottom line...
* when parsing a string into a Date, you really need to know (and be explicit to the parser) about what timezone is represented in that string (unless the format of the string includes the TimeZone) * when building a query string to pass to Solr, the DateFormat you use to format a Date object must format it using GMT -- there is a DateUtil class included in SolrJ to make this easier. If you really don't care at all about TimeZones, then just use GMT everywhere ... but if you actually care about what time of day something happened, and want to be able to query for events with hour/min/sec granularity, then you need to be precise about the TimeZone in every Formatter you use. -Hoss
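To make the advice above concrete, here is a minimal sketch (plain JDK, no Solr classes; the class and method names are illustrative, not from the thread) that pins both the parser and the formatter to GMT, so the result no longer depends on the client machine's timezone:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class GmtDates {
    // Parse "yyyyMMdd" input as a GMT date and render it in Solr's
    // canonical "yyyy-MM-dd'T'HH:mm:ss'Z'" form, also in GMT.
    public static String toSolrDate(String yyyymmdd) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat("yyyyMMdd");
        in.setTimeZone(TimeZone.getTimeZone("GMT"));   // be explicit about the input TZ
        Date d = in.parse(yyyymmdd);

        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        out.setTimeZone(TimeZone.getTimeZone("GMT")); // and about the output TZ
        return out.format(d);
    }

    public static void main(String[] args) throws ParseException {
        // Prints 2010-12-30T00:00:00Z regardless of the local timezone.
        System.out.println(toSolrDate("20101230"));
    }
}
```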
Re: Error with distributed search and Suggester component (Solr 3.4)
On Wed, May 2, 2012 at 12:16 PM, Ken Krugler kkrugler_li...@transpac.com wrote: What confuses me is that Suggester says it's based on SpellChecker, which supposedly does work with shards. It is based on spellchecker APIs, but spellchecker's ranking is based on simple comparators like string similarity, whereas suggesters use weights. When spellchecker merges from shards, it just merges all their top-N into one set and recomputes this same distance stuff over again. So the suggester can't possibly work correctly like this (forget about any technical details): how can it make assumptions about the weights you provided? If they were e.g. log() weights from your query logs, then it needs to do log-summation across the shards for the final combined weight to be correct. This is specific to how you originally computed the weights you gave it; it certainly cannot be recomputing anything like spellchecker does :) Anyway, if you really want to do it, maybe https://issues.apache.org/jira/browse/SOLR-2848 is helpful. The background is that in 3.x there is really only one spellchecker impl (AbstractLucene or something like that). I don't think distributed spellcheck works with any other SpellChecker subclasses in 3.x; I think it's wired to only work with the Abstract-Lucene ones. When we added another subclass to 4.0, DirectSpellChecker, James saw that it was broken here and cleaned up the APIs so that spellcheckers can override this merge() operation. Unfortunately I forgot to commit those refactorings James did (which let any spellchecker override merge()ing) to the 3.x branch, but the ideas might be useful. -- lucidimagination.com
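As a sketch of the weight-merging problem described above: if each shard stores log(frequency) as its suggestion weight, the merger cannot just take the max or re-rank by string distance; the weight the suggestion would have had in a single combined index is the log of the summed frequencies. This helper is hypothetical, not Solr code:

```java
public class SuggestWeights {
    // Merge per-shard log(frequency) weights into the weight the
    // suggestion would have had in one combined index:
    //   log(f1 + f2 + ...) = log(exp(w1) + exp(w2) + ...)
    public static double mergeLogWeights(double... logWeights) {
        double sum = 0.0;
        for (double w : logWeights) {
            sum += Math.exp(w);
        }
        return Math.log(sum);
    }

    public static void main(String[] args) {
        double w1 = Math.log(10);  // term seen 10 times on shard 1
        double w2 = Math.log(30);  // term seen 30 times on shard 2
        // A combined index would have seen it 40 times, so the merged
        // weight should be Math.log(40).
        System.out.println(mergeLogWeights(w1, w2));
    }
}
```

The point is that the correct merge depends entirely on how the weights were computed, which is why a generic spellchecker-style merge cannot do it.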
need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.
i've installed tomcat7 and solr 3.6.0 on linux/64. i'm trying to get a single webapp + multicore setup working. my efforts have gone off the rails :-/ i suspect i've followed too many of the wrong examples. i'd appreciate some help/direction getting this working. so far, i've configured:

grep /etc/tomcat7/server.xml -A2 -B2
Java AJP Connector: /docs/config/ajp.html APR (HTTP/AJP) Connector: /docs/apr.html Define a non-SSL HTTP/1.1 Connector on port -- Connector port= protocol=HTTP/1.1 connectionTimeout=2 redirectPort=8443 / -- !-- Connector executor=tomcatThreadPool port= protocol=HTTP/1.1 connectionTimeout=2 redirectPort=8443 /

cat /etc/tomcat7/Catalina/localhost/solr.xml
<Context docBase="/srv/tomcat7/webapps/solr.war" debug="0" privileged="true" allowLinking="true" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/srv/www/solrbase" override="true" />
</Context>

after tomcat restart, ps ax | grep tomcat
6129 pts/4 Sl 0:06 /etc/alternatives/jre/bin/java -classpath :/usr/share/tomcat7/bin/bootstrap.jar:/usr/share/tomcat7/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/tomcat7 -Dcatalina.home=/usr/share/tomcat7 -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat7/temp -Djava.util.logging.config.file=/usr/share/tomcat7/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start

if i nav to http://127.0.0.1: i see as expected: Server Information Tomcat Version JVM Version JVM Vendor OS Name OS Version OS Architecture Apache Tomcat/7.0.26 1.7.0_147-icedtea-b147 Oracle Corporation Linux 3.1.10-1.9-desktop amd64

now, i'm trying to set up multicore properly. i configured,

cat /srv/www/solrbase/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>

then

mkdir -p /srv/www/solrbase/{core0,core1}
cp -a /srv/www/solrbase/conf /srv/www/solrbase/core0/
cp -a /srv/www/solrbase/conf /srv/www/solrbase/core1/

if i nav to http://localhost:/solr/core0 i get,

HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml - org.apache.solr.common.SolrException: No cores were created, please check the logs for errors
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:172)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96)
at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:103)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
synonyms
Hello everybody, I have a question about synonyms in Solr. In our company we are looking for a solution that resolves synonyms from a database rather than from a text file, as SynonymFilterFactory does. The idea is to save all the synonyms in the database, index them, and have them ready for querying, but we haven't found a database-backed solution. Another idea is to create a plugin that extends SynonymFilterFactory, but I don't know if this is possible. I hope someone can help me. Regards, Carlos Andrés García García
Re: synonyms
I'm not sure I completely follow, but are you simply saying that you want to have a synonym filter that reads the synonym table from a database rather than the current text file? If so, sure, you could develop a replacement for the current synonym filter which loads its table from a database, but you would have to develop that code yourself (or get some assistance doing it). If that is not what you are trying to do, please explain in a little more detail. -- Jack Krupansky -Original Message- From: Carlos Andres Garcia Sent: Wednesday, May 02, 2012 4:31 PM To: solr-user@lucene.apache.org Subject: synonyms
RE: synonyms
Another solution is to write a script to read the database and create the synonyms.txt file, dump the file to Solr, and reload the core. This gives you the custom synonym solution. -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, May 02, 2012 4:54 PM To: solr-user@lucene.apache.org Subject: Re: synonyms
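A minimal sketch of that script approach in Java (the class name and the mapping format choice are illustrative; here the database rows are faked with an in-memory map rather than a JDBC query, and in a real script you would write the result into the core's conf/synonyms.txt before reloading):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SynonymsFileBuilder {
    // Render rows of (canonical term -> comma-separated synonyms) into
    // the explicit-mapping synonyms.txt format SynonymFilterFactory
    // understands:  synonym1,synonym2 => canonical
    public static String render(Map<String, String> rows) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : rows.entrySet()) {
            out.append(e.getValue()).append(" => ").append(e.getKey()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In a real script these rows would come from a JDBC ResultSet.
        Map<String, String> rows = new LinkedHashMap<String, String>();
        rows.put("barcelona", "camp nou,cataluña");
        System.out.print(render(rows));
        // After writing the file into conf/, reload the core (e.g. via
        // the CoreAdmin RELOAD action) so the new synonyms take effect.
    }
}
```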
RE: need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.
I don't know if this will help, but I usually add a dataDir element to each core's solrconfig.xml to point at a local data folder for the core, like this:

<!-- Used to specify an alternate directory to hold all index data
     other than the default ./data under the Solr home. If replication
     is in use, this should match the replication configuration. -->
<dataDir>${solr.data.dir:./solr/core0/data}</dataDir>

-Original Message- From: loc...@mm.st [mailto:loc...@mm.st] Sent: Wednesday, May 02, 2012 1:06 PM To: solr-user@lucene.apache.org Subject: need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.
Re: need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.
I chronicled exactly what I had to configure to slay this dragon at http://vinaybalamuru.wordpress.com/2012/04/12/solr4-tomcat-multicor/ Hope that helps -- View this message in context: http://lucene.472066.n3.nabble.com/need-some-help-with-a-multicore-config-of-solr3-6-0-tomcat7-mine-reports-Severe-errors-in-solr-confi-tp3957196p3957389.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Phrase Slop problem
You are missing the pf, pf2, and pf3 request parameters, which say which fields to do phrase proximity boosting on. pf boosts using the whole query as a phrase, pf2 boosts bigrams, and pf3 boosts trigrams. You can use any combination of them, but if you use none of them, ps appears to be ignored. Maybe it should default to doing some boost if none of the field lists is given, like boosting using bigrams in the qf fields, but it doesn't. -- Jack Krupansky -Original Message- From: André Maldonado Sent: Wednesday, May 02, 2012 3:29 PM To: solr-user@lucene.apache.org Subject: Phrase Slop problem Hi all. In my index I have a multivalued field that contains a lot of information; all text searches are based on it. So, when I do:

http://xxx.xx.xxx.xxx/Index/select/?start=0&rows=12&q=term1+term2+term3&qf=textoboost&fq=field1%3Aanother_term&defType=edismax&mm=100%25

I get the same result as in:

http://xxx.xx.xxx.xxx/Index/select/?start=0&rows=12&q=term1+term2+term3&ps=0&qf=textoboost&fq=field1%3Aanother_term&defType=edismax&mm=100%25

And the same result in:

http://xxx.xx.xxx.xxx/Index/select/?start=0&rows=12&q=term1+term2+term3&ps=10&qf=textoboost&fq=field1%3Aanother_term&defType=edismax&mm=100%25

What am I doing wrong? Thanks. -- E conhecereis a verdade, e a verdade vos libertará.
(João 8:32) andre.maldonado@gmail.com
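Following the reply above, a hedged sketch of the parameter set that actually turns phrase-slop on (the field name textoboost is the poster's; the helper class and the chosen values are illustrative): ps only has an effect once at least one of pf/pf2/pf3 names the fields to boost on.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class EdismaxParams {
    // Build an edismax query string with phrase-proximity boosting
    // enabled via pf/pf2, so the ps slop value is no longer ignored.
    public static String build() throws UnsupportedEncodingException {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("q", "term1 term2 term3");
        p.put("defType", "edismax");
        p.put("qf", "textoboost");
        p.put("pf", "textoboost");   // whole-query phrase boost
        p.put("pf2", "textoboost");  // bigram phrase boost
        p.put("ps", "10");           // slop, now applied to pf/pf2
        p.put("mm", "100%");
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : p.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build());
    }
}
```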
RE: synonyms
Thanks for your answers. Now I have another question: if I develop the filter to replace the current synonym filter, I understand that this processing would happen at index time, because query-time synonym expansion has a number of well-known problems. If so, how should the index be created? For example, suppose I have two synonyms for "barcelona" in the database: "Camp Nou" and "Cataluña". Option 1) At index time, create two records like this: <doc> <field>barcelona</field> <field>Camp Nou</field> ... </doc> and <doc> <field>barcelona</field> <field>Cataluña</field> ... </doc> Option 2) Or create only one record, like this: <doc> <field>barcelona</field> <field>Camp Nou,Cataluña</field> ... </doc> With option 1, I can search by "Camp Nou" and by "Cataluña", but when I search for "barcelona" Solr returns 2 records, which is an error because there is only one "barcelona". With option 2, I have to search with wildcards, for example *Camp Nou* or *Cataluña*, and Solr would return one record; likewise, searching by "barcelona" would return one record, which is good. But I want to know if this is the better option, or whether Solr has some better feature that can resolve this in a cleaner way.
Re: Solr Merge during off peak times
Hello Prabhu, Look at SPM for Solr (URL in sig below). It includes Index Statistics graphs, and from these graphs you can tell: * how many docs are in your index * how many docs are deleted * size of index on disk * number of index segments * number of index files * maybe something else I'm forgetting now So from size, # of segments, and index files you will be able to tell when merges happened and before/after size, segment and index file count. Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org; Otis Gospodnetic otis_gospodne...@yahoo.com Sent: Wednesday, May 2, 2012 7:22 AM Subject: RE: Solr Merge during off peak times Ok, thanks Otis Another question on merging What is the best way to monitor merging? Is there something in the log file that I can look for? It seems like I have to monitor the system resources - read/write IOPS etc.. and work out when a merge happened It would be great if I can do it by looking at log files or in the admin UI. Do you know if this can be done or if there is some tool for this? Thanks Prabhu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: 01 May 2012 15:12 To: solr-user@lucene.apache.org Subject: Re: Solr Merge during off peak times Hi Prabhu, I don't think such a merge policy exists, but it would be nice to have this option and I imagine it wouldn't be hard to write if you really just base the merge or no merge decision on the time of day (and maybe day of the week). 
Note that this should go into Lucene, not Solr, so if you decide to contribute your work, please see http://wiki.apache.org/lucene-java/HowToContribute Otis Performance Monitoring for Solr - http://sematext.com/spm From: Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, May 1, 2012 8:45 AM Subject: Solr Merge during off peak times Hi, I would like to know if there is a way to configure index merge policy in solr so that the merging happens during off peak hours. Can you please let me know if such a merge policy configuration exists? Thanks Prabhu
solr broke a pipe
Anyone have any clues about this exception? It happened during the course of normal indexing. This is new to me (we're running solr 3.6 on tomcat 6/redhat RHEL) and we've been running smoothly for some time now until this showed up:

Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Apache Tomcat Version 6.0.20
java.runtime.version = 1.6.0_25-b06
java.vm.name = Java HotSpot(TM) 64-Bit Server VM

May 2, 2012 4:07:48 PM org.apache.solr.handler.ReplicationHandler$FileStream write
WARNING: Exception while writing response for params: indexversion=1276893500358&file=_1uca.frq&command=filecontent&checksum=true&wt=filestream
ClientAbortException: java.net.SocketException: Broken pipe
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:358)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:354)
at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:381)
at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:370)
at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)
at org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:87)
at org.apache.solr.handler.ReplicationHandler$FileStream.write(ReplicationHandler.java:1076)
at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:936)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(Unknown Source)
at java.net.SocketOutputStream.write(Unknown Source)
at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:740)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:434)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:349)
at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:764)
at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126)
at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:573)
at org.apache.coyote.Response.doWrite(Response.java:560)
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353)
... 21 more
Re: syntax for negative query OR something
: How do I search for things that have no value or a specified value?

Things with no value...
(*:* -fieldName:[* TO *])

Things with a specific value...
fieldName:A

Things with no value or a specific value...
(*:* -fieldName:[* TO *]) fieldName:A

...or if you aren't using OR as your default op
(*:* -fieldName:[* TO *]) OR fieldName:A

: I have a few variations of:
: -fname:[* TO *] OR fname:(A B C)

that is just syntactic sugar for...

-fname:[* TO *] fname:(A B C)

which is an empty set. You need to be explicit that the "exclude docs with a value in this field" clause should be applied to the set of all documents. -Hoss
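A trivial sketch of the pattern from the reply above, for building that clause programmatically (class and method names are made up for illustration; no SolrJ dependency):

```java
public class OptionalFieldQuery {
    // Build a query matching docs where the field is either absent or
    // has one of the given values, per the pattern in the reply:
    //   (*:* -field:[* TO *]) OR field:(A B C)
    // The *:* prefix anchors the negative clause to the set of all
    // documents, which is what makes the OR work.
    public static String noValueOrIn(String field, String values) {
        return "(*:* -" + field + ":[* TO *]) OR " + field + ":(" + values + ")";
    }

    public static void main(String[] args) {
        System.out.println(noValueOrIn("fname", "A B C"));
    }
}
```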
Re: syntax for negative query OR something
Sounds good. OR in the negation of any query that matches any possible value in the field. The Solr query parser doc lists the open range as you used: -field:[* TO *] finds all documents without a value for field. See: http://wiki.apache.org/solr/SolrQuerySyntax This also includes the pure wildcard that can generate a PrefixQuery: -fname:* OR fname:(A B C) -- Jack Krupansky -Original Message- From: Ryan McKinley Sent: Wednesday, May 02, 2012 7:18 PM To: solr-user@lucene.apache.org Subject: syntax for negative query OR something How do I search for things that have no value or a specified value? Essentially I have a field that *may* exist, and I want the absence of the field to also match. I have a few variations of: -fname:[* TO *] OR fname:(A B C) Thanks for any pointers ryan
Re: syntax for negative query OR something
Oops... that is: (-fname:*) OR fname:(A B C) or (-fname:[* TO *]) OR fname:(A B C) -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, May 02, 2012 7:48 PM To: solr-user@lucene.apache.org Subject: Re: syntax for negative query OR something
Re: syntax for negative query OR something
Hmmm... I thought that worked in edismax. And I thought that pure negative queries were allowed in SolrQueryParser. Oh well. In any case, in the Lucene or Solr query parser, add *:* to select all docs before negating the docs that have any value in the field: (*:* -fname:*) OR fname:(A B C) or (*:* -fname:[* TO *]) OR fname:(A B C) -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, May 02, 2012 7:52 PM To: solr-user@lucene.apache.org Subject: Re: syntax for negative query OR something
Re: synonyms
There are lots of different strategies for dealing with synonyms, depending on what exactly is most important and what exactly you are willing to tolerate. In your latest example, you seem to be using string fields, which is somewhat different from the "text" synonyms we talk about in Solr. You can certainly have multiple string fields, or even a multi-valued string field, to store variations on selected categories of terms. That works well when you have a well-defined number of categories. So, you can have a user query go against a combination of normal text fields and these category string fields. If that is sufficient for your application, great. -- Jack Krupansky -Original Message- From: Carlos Andres Garcia Sent: Wednesday, May 02, 2012 6:57 PM To: solr-user@lucene.apache.org Subject: RE: synonyms
Re: Solr 3.5 - Elevate.xml causing issues when placed under /data directory
(12/05/03 1:39), Noordeen, Roxy wrote: Hello, I just started using elevation for Solr. I am on Solr 3.5, running with Drupal 7, Linux. 1. I updated my solrconfig.xml from <dataDir>${solr.data.dir:./solr/data}</dataDir> to <dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir> 2. I placed my elevate.xml in my Solr data directory. Based on forum answers, I thought placing elevate.xml under the data directory would pick up my latest change. I restarted tomcat. 3. When I placed my elevate.xml under the conf directory, elevation was working with the url: http://mysolr.www.com:8181/solr/elevate?q=games&wt=xml&sort=score+desc&fl=id,bundle_name But when I moved it to the data directory, I am not seeing any results. NOTE: I can see catalina.out printing that Solr read the file from the data directory. I tried to give invalid entries; I noticed Solr errors parsing elevate.xml from the data directory. I even tried to send some documents to index, thinking a commit might help to read the elevate config file. But nothing helped. I don't understand why the url above does not work anymore. There are no errors in the log files. Any help on this topic is appreciated. Hi Noordeen, What do you mean by "I am not seeing any results"? Is it no docs in the response (numFound=0)? And have you tried the original ${solr.data.dir:./solr/data} for the dataDir? Isn't that working for you either? koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: synonyms
I think a regular sync of the database table with the synonyms text file is the simplest of the solutions. It will allow you to use Solr natively without any customization, and it is not a very complicated operation to update the synonyms file with the entries in the database.
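The sync described above can be sketched in a few lines. This is a hypothetical example, not the poster's actual code: the table name `synonyms` and columns `group_id`/`term` are assumptions, and a real deployment would write the result to synonyms.txt and reload the Solr core.

```python
# Hypothetical sketch: export a database table of synonym groups into
# Solr's synonyms.txt format (one comma-separated group per line).
import sqlite3

def export_synonyms(conn):
    """Return the synonyms file content built from the synonyms table."""
    groups = {}
    for group_id, term in conn.execute(
            "SELECT group_id, term FROM synonyms ORDER BY group_id"):
        groups.setdefault(group_id, []).append(term)
    return "\n".join(",".join(terms) for terms in groups.values())

# Demo with an in-memory table standing in for the real database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synonyms (group_id INTEGER, term TEXT)")
conn.executemany("INSERT INTO synonyms VALUES (?, ?)",
                 [(1, "tv"), (1, "television"), (2, "couch"), (2, "sofa")])
print(export_synonyms(conn))
```

Run on a schedule (cron or similar), followed by a core reload so the SynonymFilterFactory picks up the new file.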
Re: syntax for negative query OR something
thanks! On Wed, May 2, 2012 at 4:43 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : How do I search for things that have no value or a specified value? Things with no value... (*:* -fieldName:[* TO *]) Things with a specific value... fieldName:A Things with no value or a specific value... (*:* -fieldName:[* TO *]) fieldName:A ...or if you aren't using OR as your default op (*:* -fieldName:[* TO *]) OR fieldName:A : I have a few variations of: : -fname:[* TO *] OR fname:(A B C) that is just syntactic sugar for... -fname:[* TO *] fname:(A B C) which is an empty set. You need to be explicit that the "exclude docs with a value in this field" clause should be applied to the set of all documents -Hoss
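The set logic behind Hoss's answer can be modeled with plain Python sets (a toy model, not Solr itself): the negative clause only subtracts from something, so without an explicit *:* the query intersects "matches A" with "has no value", which is always empty.

```python
# Toy model of the query semantics above: four docs, field value or None.
docs = {"d1": "A", "d2": "B", "d3": None, "d4": None}
all_docs = set(docs)                                        # *:*
has_value = {d for d, v in docs.items() if v is not None}   # fname:[* TO *]
matches = {d for d, v in docs.items() if v == "A"}          # fname:A

# (*:* -fname:[* TO *]) OR fname:A -> docs with no value, or the value A
correct = (all_docs - has_value) | matches

# -fname:[* TO *] fname:A -> "matches A" minus "has a value": always empty,
# because every doc matching A necessarily has a value
broken = matches - has_value

print(sorted(correct), sorted(broken))
```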
Re: Lucene FieldCache - Out of memory exception
Jack, Yes, the queries work fine till I hit the OOM. The fields that start with S_* are strings, F_* are floats, I_* are ints and so on. The dynamic field definitions from schema.xml: <dynamicField name="S_*" type="string" indexed="true" stored="true" omitNorms="true"/> <dynamicField name="I_*" type="sint" indexed="true" stored="true" omitNorms="true"/> <dynamicField name="F_*" type="sfloat" indexed="true" stored="true" omitNorms="true"/> <dynamicField name="D_*" type="date" indexed="true" stored="true" omitNorms="true"/> <dynamicField name="B_*" type="boolean" indexed="true" stored="true" omitNorms="true"/> *Each FieldCache will be an array with maxdoc entries (your total number of documents - 1.4 million) times the size of the field value or whatever a string reference is in your JVM* So if I understand correctly - every field (dynamic or normal) will have its own field cache. The size of the field cache for any field will be (maxDocs * sizeOfField)? If the field has only 100 unique values, will it occupy (100 * sizeOfField) or will it still be (maxDocs * sizeOfField)? *Roughly what is the typical or average length of one of your facet field values? And, on average, how many unique terms are there within a typical faceted field?* Each field's length may vary from 10-30 characters, average of 20 maybe. The number of unique terms within a faceted field will vary from 100-1000, average of 300. How will the number of unique terms affect performance? *3 GB sounds like it might not be enough for such heavy use of faceting. It is probably not the 50-70 number, but the 440 or accumulated number across many queries that pushes the memory usage up* I am using jdk1.5.0_14 - 32 bit. With a 32-bit JDK, I think there is a limitation that more RAM cannot be allocated. *When you hit OOM, what does the Solr admin stats display say for FieldCache?* I don't have Solr deployed as a separate web app. All Solr jar files are present in my webapp's WEB-INF\lib directory. I use EmbeddedSolrServer.
So is there a way I can get this information that the admin would show? Thank you for your time. -Rahul On Wed, May 2, 2012 at 5:19 PM, Jack Krupansky j...@basetechnology.com wrote: The FieldCache gets populated the first time a given field is referenced as a facet and then will stay around forever. So, as additional queries get executed with different facet fields, the number of FieldCache entries will grow. If I understand what you have said, these faceted queries do work initially, but after a while they stop working with OOM, correct? The size of a single FieldCache depends on the field type. Since you are using dynamic fields, it depends on your dynamicField types - which you have not told us about. From your query I see that your fields start with S_ and F_ - presumably you have dynamic field types S_* and F_*? Are they strings, integers, floats, or what? Each FieldCache will be an array with maxdoc entries (your total number of documents - 1.4 million) times the size of the field value or whatever a string reference is in your JVM. String fields will take more space than numeric fields for the FieldCache, since a separate table is maintained for the unique terms in that field. Roughly what is the typical or average length of one of your facet field values? And, on average, how many unique terms are there within a typical faceted field? If you can convert many of these faceted fields to simple integers the size should go down dramatically, but that depends on your application. 3 GB sounds like it might not be enough for such heavy use of faceting. It is probably not the 50-70 number, but the 440 or accumulated number across many queries that pushes the memory usage up. When you hit OOM, what does the Solr admin stats display say for FieldCache?
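Jack's sizing model can be turned into a back-of-envelope estimate using the numbers quoted in this thread. This is rough illustrative arithmetic under stated assumptions (4-byte per-doc references on a 32-bit JVM, 2 bytes per char for Java strings), not a measurement:

```python
# Back-of-envelope FieldCache estimate for the numbers in this thread.
max_doc = 1_400_000      # documents in the index
faceted_fields = 440     # total fields enabled for faceting
ref_bytes = 4            # per-doc reference/ord on a 32-bit JVM (assumed)
avg_term_len = 20        # chars per value, from Rahul's estimate
unique_terms = 300       # unique terms per field, from Rahul's estimate

# Each string FieldCache entry holds one reference per document (maxDoc
# entries regardless of unique-value count), plus the unique-term table.
per_field = max_doc * ref_bytes + unique_terms * avg_term_len * 2
total_gb = faceted_fields * per_field / 1024**3
print(f"~{total_gb:.1f} GB")
```

Under these assumptions the per-doc arrays alone approach 2.3 GB once all 440 fields have been faceted on, which would explain exhausting a 3 GB heap after some minutes of mixed queries.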
-- Jack Krupansky -----Original Message----- From: Rahul R Sent: Wednesday, May 02, 2012 2:22 AM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache - Out of memory exception