Re: improving search response time

2010-12-21 Thread Anurag

I am using the spellchecker in the query part. Now my search time has
increased: initially it was 1000ms, now it is 3000ms. I have an index of size
9GB.
My query:
http://localhost:8983/solr/spellCheckCompRH/?q=search&spellcheck=true&fl=spellcheck,title,url,hl&hl=true&start=0&rows=10&indent=on
 

How can I improve the search time?
I have
1) Fedora 11 as OS
2) Solr running on Jetty
3) Front page (search page) on Tomcat 6
4) Index size is 9GB
5) RAM is 1GB



-
Kumar Anurag



Explanation of the different caches.

2010-12-21 Thread Stijn Vanhoorelbeke
Hi,

I want to do some quick-and-dirty load testing - but all my results are cached.
I commented out all the Solr caches - but still everything is cached.

* Can the caching come from the 'Field Collapsing Cache'?
 -- although I don't see this element in my config file.
( The system jumps from 1GB to 7GB of RAM when I do a load
test with lots of queries. )

* Can it be a Lucene cache?

I want to lower the caches so they cache only some 100 or 1000 documents.
( Right now - when I do 50,000 unique queries - Solr will use 7 GB of
RAM and everything fits in some cache! )

Any suggestions how I could properly stress test my Solr - with a small
number of queries (some hundreds - not in the millions as some testers
have)?
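For reference, a minimal solrconfig.xml sketch of what deliberately small
caches would look like (sizes illustrative; autowarmCount="0" also disables
warming):

    <filterCache class="solr.LRUCache" size="128" initialSize="128" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="128" initialSize="128" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="128" initialSize="128" autowarmCount="0"/>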


Re: Dismax score - maximum of any one field?

2010-12-21 Thread Erick Erickson
Also take a look at the debugQuery=on output. It takes a while to
decipher what it's telling you, but it'll show you exactly how the score was computed.
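For example (field boosts illustrative):

    http://localhost:8983/solr/select?q=guitar&defType=dismax&qf=title^2+body&debugQuery=on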

Best
Erick

On Mon, Dec 20, 2010 at 5:37 AM, Jason Brown jason.br...@sjp.co.uk wrote:


 Can anyone tell me how the dismax score is computed? Is it the maximum
 score for any of the component fields that are searched? Thank You.

 If you wish to view the St. James's Place email disclaimer, please use the
 link below

 http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer



Consequences for using multivalued on all fields

2010-12-21 Thread Tim Terlegård
In our application we use dynamic fields and there can be about 50 of
them and there can be up to 100 million documents.

Are there any disadvantages to having multiValued=true on all fields in
the schema? An admin of the application can specify dynamic fields and
whether they should be indexed or stored. The question is whether we gain
anything by letting them choose multiValued as well, or if it just adds
complexity to the user interface?
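For example, a dynamic field declared multiValued might look like this in
schema.xml (pattern and type illustrative):

    <dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true"/>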

Thanks,
Tim


Re: Consequences for using multivalued on all fields

2010-12-21 Thread kenf_nc

I have about 30 million documents and with the exception of the Unique ID,
Type and a couple of date fields, every document is made of dynamic fields.
Now, I only have maybe 1 in 5 being multi-valued, but search and facet
performance doesn't look appreciably different from a fixed-schema solution.
I don't do some of the fancier things - highlighting, spell check, etc. And I
use a lot more string or lowercase field types than I do Text (so not as
many fully tokenized fields); that probably helps with performance.

The only disadvantage I know of is dealing with field names at runtime.
Depending on your architecture, you don't really know what your document
looks like until you have it in a result set. For what I'm doing, that isn't
a problem.


Re: Consequences for using multivalued on all fields

2010-12-21 Thread J.J. Larrea
Someone please correct me if I am wrong, but as far as I am aware the index
format is identical in either case.

One benefit of allowing one to specify a field as single-valued is similar to 
specifying that a field is required: it provides a safeguard that index data 
conforms to requirements.  So making all fields multivalued forgoes that 
integrity check for fields which by definition should be singular.

Also, depending on the response writer - and for the XMLResponseWriter, the 
requested response version (see http://wiki.apache.org/solr/XMLResponseFormat) - 
the multi-valued setting can determine whether the document values returned 
from a query will be scalars (e.g. <str name="year">2010</str>) or arrays of 
scalars (<arr name="year"><str>2010</str></arr>), regardless of how many values 
are actually stored.

But the most significant gotcha of not specifying the actual arity (1 or N) 
arises if any of those fields is used for field-faceting: By default the 
field-faceting logic chooses a different algorithm depending on whether the 
field is multi-valued, and the default choice for multi-valued is only 
appropriate for a small set of enumerated values since it creates a filter 
query for each value in the set. And this can have a profound effect on Solr 
memory utilization. So if you are not relying on the field arity setting to 
select the algorithm, you or your users might need to specify it explicitly 
with the f.field.facet.method argument; see 
http://wiki.apache.org/solr/SolrFacetingOverview for more info.
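For example, forcing the enum algorithm for a single field (field name
illustrative):

    http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=color&f.color.facet.method=enum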

So while all-multivalued isn't a showstopper, if it were up to me I'd want to 
give users the option to specify arity and whether the field is required.

- J.J.

At 2:13 PM +0100 12/21/10, Tim Terlegård wrote:
In our application we use dynamic fields and there can be about 50 of
them and there can be up to 100 million documents.

Are there any disadvantages to having multiValued=true on all fields in
the schema? An admin of the application can specify dynamic fields and
whether they should be indexed or stored. The question is whether we gain
anything by letting them choose multiValued as well, or if it just adds
complexity to the user interface?

Thanks,
Tim



RE: Explanation of the different caches.

2010-12-21 Thread Toke Eskildsen
Stijn Vanhoorelbeke [stijn.vanhoorelb...@gmail.com] wrote:
 I want to do some quick-and-dirty load testing - but all my results are cached.
 I commented out all the Solr caches - but still everything is cached.
 
 * Can the caching come from the 'Field Collapsing Cache'?
  -- although I don't see this element in my config file.
 ( The system jumps from 1GB to 7GB of RAM when I do a load
 test with lots of queries. )

If you allow the JVM to use a maximum of 7GB heap, it is not that surprising 
that it allocates all of it when you hammer the searcher. Whether the heap is 
used for caching or just filled with dead objects waiting for garbage collection 
is hard to say at this point. Try lowering the maximum heap to 1GB and do your 
testing again.

Also note that Lucene/Solr performance on conventional hard disks benefits a lot 
from disk caching: if you perform the same search more than once, the speed 
will increase significantly as relevant parts of the index will (probably) be 
in RAM. Remember to flush your disk cache between tests.
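For example (Linux; drop_caches needs root, and the heap flag assumes Jetty
started via start.jar):

    # cap the JVM heap at 1 GB for the test
    java -Xmx1g -jar start.jar

    # flush the OS page cache between test runs
    sync && echo 3 > /proc/sys/vm/drop_caches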

Re: Explanation of the different caches.

2010-12-21 Thread Stijn Vanhoorelbeke
I am aware of the power of the caches.
I do not want to completely remove the caches - I want them to be small,
so I can launch a stress test with a small amount of data.
( Some items may come from the cache - some need to be looked up -
right now everything comes from the cache... )

2010/12/21 Toke Eskildsen t...@statsbiblioteket.dk:
 Stijn Vanhoorelbeke [stijn.vanhoorelb...@gmail.com] wrote:
 I want to do some quick-and-dirty load testing - but all my results are cached.
 I commented out all the Solr caches - but still everything is cached.

 * Can the caching come from the 'Field Collapsing Cache'?
   -- although I don't see this element in my config file.
 ( The system jumps from 1GB to 7GB of RAM when I do a load
 test with lots of queries. )

 If you allow the JVM to use a maximum of 7GB heap, it is not that surprising 
 that it allocates all of it when you hammer the searcher. Whether the heap is 
 used for caching or just filled with dead objects waiting for garbage collection 
 is hard to say at this point. Try lowering the maximum heap to 1GB and do your 
 testing again.

 Also note that Lucene/Solr performance on conventional hard disks benefits a 
 lot from disk caching: if you perform the same search more than once, the 
 speed will increase significantly as relevant parts of the index will 
 (probably) be in RAM. Remember to flush your disk cache between tests.


backup of Index or Snapshoot ?

2010-12-21 Thread stockii

Hello.

I am working with the shell scripts for Solr to take a snapshot of the
index.

Taking a snapshot is really easy and works fine, but how can I install a
snapshot for multiple cores?

I wrote a little script which installs each snapshot for each core:
cd $HOME_DIR/solr/bin
./snapinstaller -M http://localhost:$PORT/solr/core -S $DATA_DIR/payment -d
$DATA_DIR/core

But when I run this command I get 'ssh: cannot connect to localhost'. Why
is it not possible to set the port in this script, e.g. -p 8983?

It works despite the error, but why? I want no errors when using this script ...



Re: improving search response time

2010-12-21 Thread Shawn Heisey

On 12/21/2010 3:02 AM, Anurag wrote:

I am using the spellchecker in the query part. Now my search time has
increased: initially it was 1000ms, now it is 3000ms. I have an index of size
9GB.
My query:
http://localhost:8983/solr/spellCheckCompRH/?q=search&spellcheck=true&fl=spellcheck,title,url,hl&hl=true&start=0&rows=10&indent=on

How can I improve the search time?
I have
1) Fedora 11 as OS
2) Solr running on Jetty
3) Front page (search page) on Tomcat 6
4) Index size is 9GB
5) RAM is 1GB


Install more memory.  8GB would be a good place to be; more would let 
you fit your entire index into RAM for incredible speed.  Once you get 
above 4GB RAM, it's best to run a 64-bit OS and Java, which requires 
64-bit processors.  If your index is growing, you might want even more 
memory than that.


Shawn



Re: Consequences for using multivalued on all fields

2010-12-21 Thread Geert-Jan Brits
You should be aware that the behavior of sorting on a multi-valued field is
undefined. After all, which of the multiple values should be used for
sorting?
So if you need to sort on a field, you shouldn't make it multi-valued.
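For example, a request like ...&sort=color+asc (field name illustrative)
against a multi-valued color field has no single right answer, so the result
is arbitrary at best.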

Geert-Jan

2010/12/21 J.J. Larrea j...@panix.com

 Someone please correct me if I am wrong, but as far as I am aware the index
 format is identical in either case.

 One benefit of allowing one to specify a field as single-valued is similar
 to specifying that a field is required: it provides a safeguard that index
 data conforms to requirements.  So making all fields multivalued forgoes
 that integrity check for fields which by definition should be singular.

 Also, depending on the response writer - and for the XMLResponseWriter, the
 requested response version (see
 http://wiki.apache.org/solr/XMLResponseFormat) - the multi-valued setting
 can determine whether the document values returned from a query will be
 scalars (e.g. <str name="year">2010</str>) or arrays of scalars (<arr
 name="year"><str>2010</str></arr>), regardless of how many values are
 actually stored.

 But the most significant gotcha of not specifying the actual arity (1 or N)
 arises if any of those fields is used for field-faceting: By default the
 field-faceting logic chooses a different algorithm depending on whether the
 field is multi-valued, and the default choice for multi-valued is only
 appropriate for a small set of enumerated values since it creates a filter
 query for each value in the set. And this can have a profound effect on Solr
 memory utilization. So if you are not relying on the field arity setting to
 select the algorithm, you or your users might need to specify it explicitly
 with the f.field.facet.method argument; see
 http://wiki.apache.org/solr/SolrFacetingOverview for more info.

 So while all-multivalued isn't a showstopper, if it were up to me I'd want
 to give users the option to specify arity and whether the field is required.

 - J.J.

 At 2:13 PM +0100 12/21/10, Tim Terlegård wrote:
 In our application we use dynamic fields and there can be about 50 of
 them and there can be up to 100 million documents.
 
 Are there any disadvantages to having multiValued=true on all fields in
 the schema? An admin of the application can specify dynamic fields and
 whether they should be indexed or stored. The question is whether we gain
 anything by letting them choose multiValued as well, or if it just adds
 complexity to the user interface?
 
 Thanks,
 Tim




Re: Consequences for using multivalued on all fields

2010-12-21 Thread Dennis Gearon
Thank you for the input. You might have seen my posts about doing a flexible 
schema for derived objects. Sounds like dynamic fields might be the ticket.

We'll be ready to test the idea in about a month, maybe 3 weeks. I'll post a 
comment about it when it gets there.

I don't know if I would gain anything, but I think that ALL booleans that were 
NOT in the base object but were in the derived objects could be put into one 
field as textually positioned key:value pairs, at least for search purposes. 


Since the derived object would have its own, additional methods, one of those 
methods could be to 'unserialize' the 'boolean column'. In fact, that could be 
a base object function - empty boolean column values just end up not populating 
any extra base object attributes.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a better 
idea to learn from others' mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: kenf_nc ken.fos...@realestate.com
To: solr-user@lucene.apache.org
Sent: Tue, December 21, 2010 6:07:51 AM
Subject: Re: Consequences for using multivalued on all fields


I have about 30 million documents and with the exception of the Unique ID,
Type and a couple of date fields, every document is made of dynamic fields.
Now, I only have maybe 1 in 5 being multi-valued, but search and facet
performance doesn't look appreciably different from a fixed-schema solution.
I don't do some of the fancier things - highlighting, spell check, etc. And I
use a lot more string or lowercase field types than I do Text (so not as
many fully tokenized fields); that probably helps with performance.

The only disadvantage I know of is dealing with field names at runtime.
Depending on your architecture, you don't really know what your document
looks like until you have it in a result set. For what I'm doing, that isn't
a problem.



Re: improving search response time

2010-12-21 Thread Anurag

Thanks a lot!
You mean I have to increase the resources.
1. Can distributed search improve the speed?
2. I have read in some thread that the spellchecker takes time. Is the
spellchecker one of the culprits for the increased response time?

On Tue, Dec 21, 2010 at 10:20 PM, Shawn Heisey-4 [via Lucene]
ml-node+2126869-977261384-146...@n3.nabble.com wrote:

 On 12/21/2010 3:02 AM, Anurag wrote:

  I am using the spellchecker in the query part. Now my search time has
  increased: initially it was 1000ms, now it is 3000ms. I have an index of
  size 9GB.
  My query:
  http://localhost:8983/solr/spellCheckCompRH/?q=search&spellcheck=true&fl=spellcheck,title,url,hl&hl=true&start=0&rows=10&indent=on

  How can I improve the search time?
  I have
  1) Fedora 11 as OS
  2) Solr running on Jetty
  3) Front page (search page) on Tomcat 6
  4) Index size is 9GB
  5) RAM is 1GB

 Install more memory.  8GB would be a good place to be; more would let
 you fit your entire index into RAM for incredible speed.  Once you get
 above 4GB RAM, it's best to run a 64-bit OS and Java, which requires
 64-bit processors.  If your index is growing, you might want even more
 memory than that.

 Shawn








-- 
Kumar Anurag





Re: Case Insensitive sorting while preserving case during faceted search

2010-12-21 Thread Chris Hostetter

: I am trying to do a facet search and sort the facet values too.
...
: Then I followed the sample example schema.xml, created a copyField of type
...
:   <fieldType name="alphaOnlySort" class="solr.TextField"
:     sortMissingLast="true" omitNorms="true">
...
: But the sorted facet values don't have their case preserved anymore. 
: 
: How can I get around this?

Did you look at how/why/when alphaOnlySort is used in the example?

The FAQ entry you referred to addresses almost the exact same scenario of 
wanting to search/sort on the same data...

http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F

...the simplest thing to do is to use copyField to index a second version 
of your field using the StrField class.


So have one version of your field using StrField that you facet on, and 
copyField that to another version (using TextField and 
KeywordTokenizer) that you sort on.
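A minimal schema.xml sketch of that arrangement (field names illustrative;
the alphaOnlySort type from the example schema is a TextField with
KeywordTokenizer plus lowercasing):

    <field name="product" type="string" indexed="true" stored="true"/>
    <field name="product_sort" type="alphaOnlySort" indexed="true" stored="false"/>
    <copyField source="product" dest="product_sort"/>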
 


-Hoss


Faceting memory requirements

2010-12-21 Thread Rok Rejc
Dear all,

I have created an index with approx. 1.1 billion documents (around 500GB),
running on Solr 1.4.1 (64-bit JVM).

I want to enable faceted navigation on an int field, which contains around
250 unique values.
According to the wiki there are two methods:

facet.method=fc, which uses the field cache. This method should use MaxDoc*4
bytes of memory, which is around 4.1GB.

facet.method=enum, which creates a bitset for each unique value. This method
should use NumberOfUniqueValues * SizeOfBitSet, which is around 32GB.
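( Checking the arithmetic: 1.1e9 docs * 4 bytes is about 4.4e9 bytes,
i.e. roughly 4.1 GiB; and 250 values * 1.1e9/8 bytes per bitset is about
3.4e10 bytes, i.e. roughly 32 GiB. )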

Are my calculations correct?

My memory settings in Tomcat (windows) are:
Initial memory pool: 4096 MB
Maximum memory pool: 8192 MB (total 12GB in my test machine)

I have tried to run a query
(...facet=true&facet.field=PublisherId&facet.method=fc) but I am still
getting OOM:

HTTP Status 500 - Java heap space java.lang.OutOfMemoryError: Java heap
space at
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:703)
at
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
at
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
at
org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:350)
at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:255)
at
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
at
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
at
...

Any idea what I am doing wrong, or have I miscalculated the memory
requirements?

Many thanks,
Rok


Re: Case Insensitive sorting while preserving case during faceted search

2010-12-21 Thread Jonathan Rochkind
Hoss, I think the use case being asked about is specifically facet.sort 
though: cases where you actually do want to sort the facet values with 
facet.sort, not sort records - while still presenting the facet values 
in their original case, but sorting them case-insensitively.


The solutions offered at those URLs don't address this.

I'm pretty sure there isn't really any good solution for this; Solr just 
won't do that, and that's just how it goes.


On 12/21/2010 2:33 PM, Chris Hostetter wrote:

: I am trying to do a facet search and sort the facet values too.
...
: Then I followed the sample example schema.xml, created a copyField of type
...
:   <fieldType name="alphaOnlySort" class="solr.TextField"
:     sortMissingLast="true" omitNorms="true">
...
: But the sorted facet values don't have their case preserved anymore.
:
: How can I get around this?

Did you look at how/why/when alphaOnlySort is used in the example?

The FAQ entry you referred to addresses almost the exact same scenario of
wanting to search/sort on the same data...

http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F

...the simplest thing to do is to use copyField to index a second version
of your field using the StrField class.


So have one version of your field using StrField that you facet on, and
copyField that to another version (using TextField and
KeywordTokenizer) that you sort on.



-Hoss



Re: Faceting memory requirements

2010-12-21 Thread Yonik Seeley
On Tue, Dec 21, 2010 at 4:02 PM, Rok Rejc rokrej...@gmail.com wrote:
 Dear all,

 I have created an index with approx. 1.1 billion documents (around 500GB),
 running on Solr 1.4.1 (64-bit JVM).

 I want to enable faceted navigation on an int field, which contains around
 250 unique values.
 According to the wiki there are two methods:

 facet.method=fc, which uses the field cache. This method should use MaxDoc*4
 bytes of memory, which is around 4.1GB.

facet.method=fc uses the FieldCache, but it currently uses the StringIndex
for all field types, so you need to add in space for the string
representations of all the unique values. But there are only 250 of those,
so given the large number of docs, your estimate should still be close.

 facet.method=enum, which creates a bitset for each unique value. This method
 should use NumberOfUniqueValues * SizeOfBitSet, which is around 32GB.

A more efficient representation is used for a set when the set size is
less than maxDoc/64. This set type uses an int per doc in the set, so it
should use roughly the same amount of memory as a numeric fieldcache entry.
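( With maxDoc around 1.1e9, that threshold is roughly 17 million matching
docs per value; smaller sets avoid the full bitset. )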


 Are my calculations correct?

 My memory settings in Tomcat (windows) are:
 Initial memory pool: 4096 MB
 Maximum memory pool: 8192 MB (total 12GB in my test machine)

 I have tried to run a query
 (...facet=true&facet.field=PublisherId&facet.method=fc) but I am still
 getting OOM:

 HTTP Status 500 - Java heap space java.lang.OutOfMemoryError: Java heap
 space at
 org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:703)
 at
 org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
 at
 org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
 at
 org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:350)
 at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:255)
 at
 org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283)
 at
 org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)
 at
 org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:72)
 at
 ...

 Any idea what I am doing wrong, or have I miscalculated the memory
 requirements?

Perhaps you are already sorting on another field, or faceting on another
field, that is causing a lot of memory to be used already, and this pushes
it over the edge?  Or perhaps the JVM simply can't find a contiguous area
of memory this large?
Line 703 is this - so it's failing to create the first array:
  final int[] retArray = new int[reader.maxDoc()];

Although the line after it is even more troublesome:
  String[] mterms = new String[reader.maxDoc()+1];

Although you only need an array of 250 to contain all the unique
terms, the FieldCacheImpl starts out with maxDoc.

I think trunk will be far better in this regard.  You should also try
facet.method=enum, though.
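For example, the same facet request with the enum method:

    ...facet=true&facet.field=PublisherId&facet.method=enum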

-Yonik
http://www.lucidimagination.com


Re: [Reload-Config] not working

2010-12-21 Thread Adam Estrada
I also noticed that when I run the reload-config command, the following
warning is thrown. I changed all my PK=id settings to see if that changed
anything. Anyone have any ideas why this is not working for me?

INFO: id is a required field in SolrSchema . But not found in DataConfig.

Regards,
Adam

On Mon, Dec 20, 2010 at 10:58 AM, Adam Estrada estrada.a...@gmail.com wrote:

 This is the response I get... Does it matter that the configuration file is
 called something other than data-config.xml? After I get this I still have
 to restart the service. I wonder... do I need to commit the change?

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">520</int>
   </lst>
   <lst name="initArgs">
     <lst name="defaults">
       <str name="config">./solr/conf/dataimporthandler/rss.xml</str>
     </lst>
   </lst>
   <str name="command">reload-config</str>
   <str name="status">idle</str>
   <str name="importResponse">Configuration Re-loaded sucessfully</str>
   <lst name="statusMessages"/>
   <str name="WARNING">This response format is experimental. It is
 likely to change in the future.</str>
 </response>



 On Sun, Dec 19, 2010 at 11:12 PM, Ahmet Arslan iori...@yahoo.com wrote:

  <a href="http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import">Full
  Import</a><br/>
  <a href="http://localhost:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=reload-config">Reload
  Configuration</a>
 
  All,

  The links above are meant for me to reload the configuration file after a
  change is made, and the other is to perform the full import. My problem is
  that the reload-config option does not seem to be working. Am I doing
  anything wrong? Your expertise is greatly appreciated!

 I am sorry, I hit the reply button accidentally.

 Are you receiving/checking the message
 <str name="importResponse">Configuration Re-loaded sucessfully</str>
 after the reload?

 And are you checking that data-config.xml is still valid XML after editing
 it programmatically?

 And instead of editing the data-config.xml file, can't you use a variable
 resolver? http://search-lucene.com/m/qYzPk2n86iIsubj







[Import Timeout] using /dataimport

2010-12-21 Thread Adam Estrada
All,

I've noticed that there are some RSS feeds that are slow to respond,
especially during high-usage times throughout the day. Is there a way to set
the timeout to something really high, or have it just wait until the feed is
returned? The entire import stops working when a feed doesn't respond.

Your ideas are greatly appreciated.
Adam


Re: [Import Timeout] using /dataimport

2010-12-21 Thread Koji Sekiguchi

(10/12/22 9:35), Adam Estrada wrote:

All,

I've noticed that there are some RSS feeds that are slow to respond,
especially during high-usage times throughout the day. Is there a way to set
the timeout to something really high, or have it just wait until the feed is
returned? The entire import stops working when a feed doesn't respond.

Your ideas are greatly appreciated.
Adam


readTimeout?
http://wiki.apache.org/solr/DataImportHandler#Configuration_of_URLDataSource_or_HttpDataSource
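For example, in the DIH config (timeout values illustrative, in
milliseconds):

    <dataSource type="URLDataSource" connectionTimeout="5000" readTimeout="30000"/>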

Koji
--
http://www.rondhuit.com/en/


Solr branch_3x problems

2010-12-21 Thread Alexey Kovyrin
Hello guys,

We at scribd.com have recently deployed our new search cluster based
on Dec 1st, 2010 branch_3x Solr code, and we're very happy about the
new features it brings.
But it looks like we have a weird problem here: once a day, our servers
handling sharded search queries (frontend servers that receive
requests and then fan them out to backend machines) die. Everything
looks fine for a day - memory usage is stable, GC is doing its work as
usual - and then eventually we get a weird GC activity spike that
kills the whole VM, and the only way to bring it back is to kill -9 the
tomcat6 VM and restart it. We've tried different GC tuning options,
tried to reduce caches to almost zero size, still no luck.

So I was wondering if there are any known issues with Solr branch_3x
from the last month that could have caused this kind of problem, or if
we could provide any more information that would help track down
the issue.

Thanks.

-- 
Alexey Kovyrin
http://kovyrin.net/


White space in facet values

2010-12-21 Thread Andy
How do I handle facet values that contain whitespace? Say I have a field 
Product that I want to facet on. A value for Product could be Electric 
Guitar. How should I handle the whitespace in Electric Guitar during 
indexing? What about when I apply the constraint fq=Product:Electric Guitar?


  


Duplicate values in multiValued field

2010-12-21 Thread Andy
If I put duplicate values into a multiValued field, would that cause any 
issues? 

For example, I have a multiValued field Color. Some of my documents have 
duplicate values for that field, such as: Green, Red, Blue, Green, Green. 

Would the above (having Green three times) be the same as having just the 
values: Green, Red, Blue?

Or do I need to clean my data and remove duplicate values before indexing?

Thanks.