Special characters

2009-01-06 Thread Sujatha Arun
Hi,

Can anyone point me to the thread, if it exists, on indexing special
characters in Solr?

Regards
Sujatha


Re: Special characters

2009-01-06 Thread Shalin Shekhar Mangar
You forgot to tell us what you want to do with special characters:

1. Remove them from the documents while indexing?
2. Don't remove them while indexing?
3. Query with terms containing a special character?

On Tue, Jan 6, 2009 at 2:55 PM, Sujatha Arun suja.a...@gmail.com wrote:

 Hi,

 Can anyone point me to the thread if it exists on indexing special
 characters in solr.

 Regards
 Sujatha




-- 
Regards,
Shalin Shekhar Mangar.


Re: Special characters

2009-01-06 Thread Sujatha Arun
Hi,

I would like to query terms containing special chars.

Regards
Sujatha




On Tue, Jan 6, 2009 at 2:59 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 You forgot to tell us what do you want to do with special characters?

 1. Remove them from the documents while indexing?
 2. Don't remove them while indexing?
 3. Query with terms containing a special character?




 --
 Regards,
 Shalin Shekhar Mangar.



Default Solr Query

2009-01-06 Thread Bhawani Sharma

Hi All,

I want to fetch all the data from the database.
What should my Solr query be to get all documents?
In MySQL the syntax is: SELECT * FROM table;
What is the equivalent syntax for this query in Solr?
Please reply ASAP.
Thanks in advance.

Thanks:
Bhawani Sharma
-- 
View this message in context: 
http://www.nabble.com/Default-Solr-Query-tp21307309p21307309.html
Sent from the Solr - User mailing list archive at Nabble.com.



Query about NOT (-) operator

2009-01-06 Thread Kulkarni, Ajit Kamalakar
Hi,

This query:

1. NOT(IBA60019_l:1) AND NOT(IBA60019_l:0) AND businessType:wt.doc.WTDocument

works, but the query below does not:

2. (NOT(IBA60019_l:1) AND NOT(IBA60019_l:0)) AND businessType:wt.doc.WTDocument

Query 1 returns records, but query 2 does not return any.

3. (NOT(IBA60019_l:1) OR NOT(IBA60019_l:0)) AND businessType:wt.doc.WTDocument

Query 3 also returns no records, whereas it should return all the records
whose businessType is wt.doc.WTDocument.

4. NOT(IBA60019_l:1) OR NOT(IBA60019_l:0) AND businessType:wt.doc.WTDocument

Query 4 works as if it were query 1, i.e. OR is behaving as AND.

Can someone comment on this?

 

Thanks,

Ajit



RE: Special characters

2009-01-06 Thread Jana, Kumar Raja
Filtering of special characters depends on the filters you use for the
fields in your schema.xml.

If you are using WordDelimiterFilterFactory in your analyzer, then the
special characters get removed during the processing of your field. But
WordDelimiterFilterFactory does a lot of other things besides removing
special characters. If you feel you can do without the other features the
filter provides, you can remove it from your schema.xml file. Otherwise, I
guess you will have to customize the WordDelimiterFilter.java class to suit
your purpose.
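
For reference, the filter usually sits in an analyzer chain along these lines
(a sketch based on the stock example schema; the attribute values are
illustrative, not a recommendation):

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- removing this filter leaves special characters in the tokens -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>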

-Kumar



-Original Message-
From: Sujatha Arun [mailto:suja.a...@gmail.com] 
Sent: Tuesday, January 06, 2009 3:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Special characters

Hi,

I would like to query terms containing special chars .

Regards
Sujatha







Re: Default Solr Query

2009-01-06 Thread Shalin Shekhar Mangar
On Tue, Jan 6, 2009 at 3:09 PM, Bhawani Sharma bhawanisha...@aol.com wrote:


 Hi All,

 I want to fetch all the data from the database.
 What should my Solr query be to get all documents?
 In MySQL the syntax is: SELECT * FROM table;
 What is the equivalent syntax for this query in Solr?


If you meant that you want all documents (i.e., without any queries or
filters), you should use q=*:*

If, however, you meant that you want *all* the documents inside Solr at once
(a full dump), it is probably a bad idea due to performance issues.
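
For example (assuming the standard example instance on localhost:8983; adjust
host, port, and parameters to your setup):

  http://localhost:8983/solr/select?q=*:*&rows=10&start=0

The rows and start parameters let you page through the full result set
instead of dumping it in one request.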



 --
 View this message in context:
 http://www.nabble.com/Default-Solr-Query-tp21307309p21307309.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Special characters

2009-01-06 Thread Shalin Shekhar Mangar
Kumar's advice is sound. You must make sure you are actually indexing the
special symbols.

To make a query with special characters, you must make sure you URL-encode
the parameters before sending them to Solr.

Some symbols, such as '+', '-', and ':', have a special meaning in the Lucene
query syntax; you will have to escape them by adding a backslash in front of
them.
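
For example, to search for a term containing a literal hyphen in a field
named title (a made-up field for illustration), the query is title:foo\-bar,
which, URL-encoded in the request, becomes:

  http://localhost:8983/solr/select?q=title%3Afoo%5C-bar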

On Tue, Jan 6, 2009 at 3:15 PM, Jana, Kumar Raja kj...@ptc.com wrote:

 Filtering of special characters depends on the filters you use for the
 fields in your schema.xml.

 If you are using WordDelimiterFilterFactory in your analyzer then the
 special characters get removed during the processing of your field. But
 the WordDelimiterFilterFactory does a lot of other things too than just
 removing the special characters. If you feel that you can do away with
 the other features provided by the filter then you can remove it from
 your schema.xml file. In any other case, I guess you will have to
 customize the WordDelimiterFilter.java class to suit your purpose.

 -Kumar







-- 
Regards,
Shalin Shekhar Mangar.


Re: Default Solr Query

2009-01-06 Thread Sandeep_metacube

Hi Bhawani,
Your query should be *:*. Try this and have fun!


Bhawani Sharma wrote:
 
 Hi All,
 
 I want to fetch all the data from the database.
 What should my Solr query be to get all documents?
 In MySQL the syntax is: SELECT * FROM table;
 What is the equivalent syntax for this query in Solr?
 Please reply ASAP.
 Thanks in Advance.
 
 Thanks:
 Bhawani Sharma
 

-- 
View this message in context: 
http://www.nabble.com/Default-Solr-Query-tp21307309p21308455.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImport

2009-01-06 Thread Performance

Paul,

Thanks for the feedback, and it does work.  So if I understand this correctly,
the app server (Jetty) is not reading in the environment variables for the
other libraries I need.  How do I add the JDBC jars to the path so that I
don't need to copy the files into the directory?  Does Jetty have a config
file I should look at?


Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 The driver can be put directly into the WEB-INF/lib of the solr web
 app or it can be put into ${solr.home}/lib dir.
 
 or if something is really screwed up you can try the old fashioned way
 of putting your driver jar into JAVA_HOME/lib/ext
 
 --Noble
 
 
 On Tue, Jan 6, 2009 at 7:05 AM, Performance dcr...@crossview.com wrote:

 I have been following this tutorial but I can't seem to get past an error
 related to not being able to load the DB2 driver.  The user has all the
 right config to load the JDBC driver and Squirrel works fine.  Do I need
 to update any path within Solr?



 muxa wrote:

 Looked through the tutorial on data import, section Full Import
 Example.
 1) Where is this dataimport.jar? There is no such file in the
 extracted example-solr-home.jar.
 2) Use the solr folder inside example-data-config folder as your
 solr home. What does this mean? Anyway, there is no folder
 example-data-config.
  Regards, Mihails


 --
 View this message in context:
 http://www.nabble.com/DataImport-tp17730791p21301571.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 --Noble Paul
 
 

-- 
View this message in context: 
http://www.nabble.com/DataImport-tp17730791p21309725.html
Sent from the Solr - User mailing list archive at Nabble.com.



delta index produces multiple results?

2009-01-06 Thread Christoph Notarp

Hi,

I use the DIH with an RDBMS for indexing a large MySQL database with
about 7 million entries.
Full index is working fine; in schema.xml I implemented a uniqueKey
field (which is of the type 'text').

I run queries with the dismax query handler and get my results as
a PHP array.

Now, since the database entries change every second, I use the delta
query properties to
a) delete documents from the index that have been deleted in the
database (there's a table for deleted items) and
b) update documents in the index that have changed since the last
index run (there's a last_modified column in a table for that).

From my understanding, when I start a delta-import, the DIH checks
the deletedPkQuery first and deletes the documents that should be
deleted (identified by the uniqueKey field?).
This seems to work - catalina.out says INFO: deleted from document to
Solr: 1851010, for example.
Next comes the deltaQuery. This seems to work, too - when
finished, a query returns the new database entries.

But (and here comes the problem):
The dataimport status always says Added / Changed x-hundred
documents, deleted 0 documents - no deletes?
Every time I change an item in the database and do a delta-import
after that, my next query returns that item *twice*.
After the next change and the next delta-import, Solr returns *three*
result documents, and so on.
As I mentioned before, I get my search results as an array consisting
of many arrays (= Solr documents) with the fields I set in schema.xml.
After changing some documents and delta-indexing them, I get lots of
identical arrays (even the uniqueKey field is absolutely identical).

I have read somewhere in the wiki that an update is a delete of the
old document plus an add of the new document.
I guess the problem could be that something fails in the delete
process, but I don't have a clue why.


Any ideas?

Thanks in advance
Chris


RE: Using query functions against a type field

2009-01-06 Thread Feak, Todd
:It should be fairly predictable, can you elaborate on what problems you
:have just adding boost queries for the specific types?

The boost queries are true queries, so the amount of boost can be affected
by things like term frequency for the query. The functions aren't affected
by this and are therefore more predictable over the life of the index. If I
want to boost documents via multiple factors, their interaction is very
important. If that interaction slowly changes over the life of the index, I
lose that control.

:a generic Parser/ValueSource that let you specify term=float mappings in
:its init params would certainly make a cool patch for Solr.

I do believe I will work on this (it may take me a bit). Once I nail it
down, I've got a couple of other, easier query functions I would like to
add as well, if they hold value for the community.

-Hoss



Re: Special characters

2009-01-06 Thread Sujatha Arun
Thanks.

When I set URIEncoding="UTF-8" in Tomcat's server.xml file, some of the
special chars are indexed and searchable, while others are not.

e.g.: Bernhard Schölkopf, János Kornai

These are indexed and searchable after the above change. In the browser,
however, some others display as junk chars. The browser's encoding is UTF-8.
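
For reference, that attribute goes on the HTTP connector in server.xml,
roughly like this (a sketch; keep your existing connector attributes):

  <Connector port="8080" URIEncoding="UTF-8" ... />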

Regards
Sujatha
On Tue, Jan 6, 2009 at 3:28 PM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 Kumar's advice is sound. You must make sure you are actually indexing the
 special symbols.

 To make a query with special characters you must make sure you urlencode
 the
 parameters before sending them to Solr.

 There are some symbols which have a special meaning in the lucene query
 syntax are '+', '-', ':' which you will have to escape by adding a
 backslash
 in front of it.




 --
 Regards,
 Shalin Shekhar Mangar.



Re: Date Range Search

2009-01-06 Thread Sourabh1

For date range search, use alldate:[date1T23:59:59Z TO date2T23:59:59Z],
i.e. full ISO timestamps, with no space between the date and the 'T'.
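
For the two cases in the original question (a sketch, assuming the DOB is
stored in a Solr date field named dob):

1. Older than 30 years: dob:[* TO NOW-30YEARS]
2. Born in 1975: dob:[1975-01-01T00:00:00Z TO 1975-12-31T23:59:59Z]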

Thanks,
Sourabh

Gavin-39 wrote:
 
 Hi,
   Can some one tell me how I can achieve date range searches? For
 instance if I save the DOB as a solr date field how can I do a search to
 get the people,
   1. Who are older than 30 years
   2. Who were born in 1975
 
 etc. Greatly appreciate your help.
 
 Thanks,
 -- 
 Gavin Selvaratnam,
 Project Leader
 
 hSenid Mobile Solutions
 Phone: +94-11-2446623/4 
 Fax: +94-11-2307579 
 
 Web: http://www.hSenidMobile.com 
  
 Make it happen
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Date-Range-Search-tp15305477p21314862.html
Sent from the Solr - User mailing list archive at Nabble.com.



Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia

I'm running load tests against my Solr instance. I find that it typically
takes ~10 minutes for my Solr setup to warm up while I throw my test
queries at it. Also, I have the same two warm-up queries specified for the
firstSearcher and newSearcher event listeners.

I'm now benchmarking the effect of updating an index under load. I'm finding
that after running snapinstaller, Solr takes ~1 hour to get back to the same
performance numbers I was getting 10 minutes after a restart. If I can
justify being offline for a few moments, it seems like I'll be better off
restarting Solr rather than running snapinstaller.

Any ideas why?

Thanks.
-- 
View this message in context: 
http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21315273.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Snapinstaller vs Solr Restart

2009-01-06 Thread Feak, Todd
First suspect would be Filter Cache settings and Query Cache settings.

If they are auto-warming at all, then there is a definite difference
between the first start behavior and the post-commit behavior. This
affects what's in memory, caches, etc.

-Todd Feak

-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com] 
Sent: Tuesday, January 06, 2009 9:46 AM
To: solr-user@lucene.apache.org
Subject: Snapinstaller vs Solr Restart


I'm running load tests against my Solr instance. I find that it
typically
takes ~10 minutes for my Solr setup to warm-up while I throw my test
queries at it. Also, I have the same two warm-up queries specified for
the
firstSearcher and newSearcher event listeners. 

I'm now benchmarking the effect of updating an index under load. I'm
finding
that after running snapinstaller, Solr takes ~1 hour to get back to the
same
performance numbers I was getting 10 minutes after a restart. If I can
justify being offline for a few moments, it seems like I'll be better
off
restarting Solr rather than running Snapinstaller.

Any ideas why?

Thanks.
-- 
View this message in context:
http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21315273.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Using query functions against a type field

2009-01-06 Thread Feak, Todd
I'm not sure I followed all that Yonik.

Are you saying that I can achieve this effect now with a bq setting in
my DisMax query instead of via a bf setting?

-Todd Feak

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Tuesday, January 06, 2009 9:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Using query functions against a type field

On Tue, Jan 6, 2009 at 10:41 AM, Feak, Todd todd.f...@smss.sony.com
wrote:
 The boost queries are true queries, so the amount of boost can be
affected
 by things like term frequency for the query.

Sounds like a constant score query is a general way to do this.

Possible QParser syntax:
{!const}tag:FOO OR tag:BAR

Could be implemented via
ConstantScoreQuery(QueryWrapperFilter(theQuery))

The value could be the boost, optionally set within this QParser...
{!const v=2.0}tag:FOO OR tag:BAR

-Yonik



RE: Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia

Sorry, I forgot to include that. All my autowarmCount settings are set to 0.


Feak, Todd wrote:
 
 First suspect would be Filter Cache settings and Query Cache settings.
 
 If they are auto-warming at all, then there is a definite difference
 between the first start behavior and the post-commit behavior. This
 affects what's in memory, caches, etc.
 
 -Todd Feak
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21315654.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using query functions against a type field

2009-01-06 Thread Yonik Seeley
On Tue, Jan 6, 2009 at 10:41 AM, Feak, Todd todd.f...@smss.sony.com wrote:
 The boost queries are true queries, so the amount of boost can be affected
 by things like term frequency for the query.

Sounds like a constant score query is a general way to do this.

Possible QParser syntax:
{!const}tag:FOO OR tag:BAR

Could be implemented via ConstantScoreQuery(QueryWrapperFilter(theQuery))

The value could be the boost, optionally set within this QParser...
{!const v=2.0}tag:FOO OR tag:BAR

-Yonik


DataImportHandler (reading XML w/ paging)

2009-01-06 Thread Jon Baer
Hi,

Anyone have a quick, clever way of dealing w/ paged XML for
DataImportHandler?  I have metadata like this:

<paging>
  <pageNumber>1</pageNumber>
  <totalPages>3</totalPages>
  <count>15</count>
</paging>

I unfortunately cannot get all the data in one shot, so I may need to make
a number of requests based on the paging metadata, but I can't
figure out if this is dynamically possible w/ the current DIH setup.
Any tips?

Thanks.

- Jon


Re: Using query functions against a type field

2009-01-06 Thread Yonik Seeley
On Tue, Jan 6, 2009 at 1:05 PM, Feak, Todd todd.f...@smss.sony.com wrote:
 I'm not sure I followed all that Yonik.

 Are you saying that I can achieve this effect now with a bq setting in
 my DisMax query instead of via a bf setting?

Yep, a const QParser would enable that.

bq={!const}foo:bar

-Yonik


RE: Using query functions against a type field

2009-01-06 Thread Feak, Todd
Thanks Yonik!

I still may investigate the query function stuff that was discussed, as
Hoss indicated it may hold value.

-Todd Feak

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Tuesday, January 06, 2009 10:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Using query functions against a type field

On Tue, Jan 6, 2009 at 1:05 PM, Feak, Todd todd.f...@smss.sony.com
wrote:
 I'm not sure I followed all that Yonik.

 Are you saying that I can achieve this effect now with a bq setting in
 my DisMax query instead of via a bf setting?

Yep, a const QParser would enable that.

bq={!const}foo:bar

-Yonik



Re: Snapinstaller vs Solr Restart

2009-01-06 Thread Otis Gospodnetic
Is an autowarm count of 0 a good idea, though?
If you don't want to autowarm any caches, doesn't that imply that you have a very
low hit rate and therefore don't care to autowarm?  And if you have a very low
hit rate, then perhaps caches are not needed at all?


How about this: do you optimize your index at any point?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: wojtekpia wojte...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, January 6, 2009 1:07:28 PM
 Subject: RE: Snapinstaller vs Solr Restart
 
 
 Sorry, I forgot to include that. All my autowarmcount's are set to 0.
 
 
 Feak, Todd wrote:
  
  First suspect would be Filter Cache settings and Query Cache settings.
  
  If they are auto-warming at all, then there is a definite difference
  between the first start behavior and the post-commit behavior. This
  affects what's in memory, caches, etc.
  
  -Todd Feak
  
  
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21315654.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia

I use my warm-up queries to fill the field cache (or at least that's the
idea). My filterCache hit rate is ~99% and queryResultCache is ~65%.

I update my index several times a day with no 'optimize', and performance is
seamless. I also update my index once nightly with an 'optimize', and that's
where I see the performance drop.

I'll try turning autowarming on.

Could this have to do with file caching by the OS? 


Otis Gospodnetic wrote:
 
 Is autowarm count of 0 a good idea, though?
 If you don't want to autowarm any caches, doesn't that imply that you have
 very low hit rate and therefore don't care to autowarm?  And if you have a
 very low hit rate, then perhaps caches are not needed at all?
 
 
 How about this.  Do you optimize your index at any point?
 

-- 
View this message in context: 
http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21319344.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Snapinstaller vs Solr Restart

2009-01-06 Thread Otis Gospodnetic
OK, so that question/answer seems to have hit the nail on the head.  :)

When you optimize your index, all index files get rewritten.  This means that 
everything that the OS cached up to that point goes out the window and the OS 
has to slowly re-cache the hot parts of the index.  If you don't optimize, this 
won't happen.  Do you really need to optimize?  Or maybe a more direct 
question: why are you optimizing?


Regarding autowarming, with such high fq hit rate, I'd make good use of fq 
autowarming.  The result cache rate is lower, but still decent.  I wouldn't 
turn off autowarming the way you have.
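
A minimal sketch of what that looks like in solrconfig.xml (sizes and
autowarm counts are illustrative; tune them to your index):

  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>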

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: wojtekpia wojte...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, January 6, 2009 4:20:18 PM
 Subject: Re: Snapinstaller vs Solr Restart
 
 
 I use my warm-up queries to fill the field cache (or at least that's the
 idea). My filterCache hit rate is ~99% and queryResultCache is ~65%.

 I update my index several times a day with no 'optimize', and performance is
 seamless. I also update my index once nightly with an 'optimize', and that's
 where I see the performance drop.
 
 I'll try turning autowarming on.
 
 Could this have to do with file caching by the OS? 
 
 
 Otis Gospodnetic wrote:
  
  Is autowarm count of 0 a good idea, though?
  If you don't want to autowarm any caches, doesn't that imply that you have
  very low hit rate and therefore don't care to autowarm?  And if you have a
  very low hit rate, then perhaps caches are not needed at all?
  
  
  How about this.  Do you optimize your index at any point?
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21319344.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Snapinstaller vs Solr Restart

2009-01-06 Thread Feak, Todd
Kind of a side-note, but I think it may be worth your while.

If your queryResultCache hit rate is 65%, consider putting a reverse
proxy in front of Solr. It can give performance boosts over the query
cache in Solr, as it doesn't have to pay the cost of reformulating the
response. I've used Varnish with great results. Squid is another option.

-Todd Feak

-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com] 
Sent: Tuesday, January 06, 2009 1:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Snapinstaller vs Solr Restart


I use my warm-up queries to fill the field cache (or at least that's the
idea). My filterCache hit rate is ~99% and queryResultCache is ~65%.

I update my index several times a day with no 'optimize', and performance is
seamless. I also update my index once nightly with an 'optimize', and that's
where I see the performance drop.

I'll try turning autowarming on.

Could this have to do with file caching by the OS? 


Otis Gospodnetic wrote:
 
 Is autowarm count of 0 a good idea, though?
 If you don't want to autowarm any caches, doesn't that imply that you
have
 very low hit rate and therefore don't care to autowarm?  And if you
have a
 very low hit rate, then perhaps caches are not needed at all?
 
 
 How about this.  Do you optimize your index at any point?
 

-- 
View this message in context:
http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21319344.html
Sent from the Solr - User mailing list archive at Nabble.com.




Unable to choose request handler

2009-01-06 Thread Mark Ferguson
Hi,

In my solrconfig.xml file there are two request handlers configured: one
uses defType=dismax, and the other doesn't. However, it seems that when the
dismax request handler is set as my default, I have no way of using the
standard request handler. Here is the relevant part of my solrconfig.xml:

<requestHandler name="standard" class="solr.SearchHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>


When I run a query with the parameters qt=standard&debugQuery=true, I can
see that it is still using the DismaxQueryParser. There doesn't seem to be
any way to use the standard request handler.

On the other hand, when I set the standard request handler as my default,
the behaviour is equally strange. When I specify no qt parameter at all, it
uses the standard request handler as it should. However, when I enter either
qt=standard or qt=dismax, it uses the dismax request handler!

So it appears that the only way I can choose the request handler I want is
to make the standard request handler my default, then specify no qt
parameter if I want to use it. Has anyone else tried this?

Mark


Re: Unable to choose request handler

2009-01-06 Thread Mark Ferguson
It seems that the problem is related to the defType parameter. When I
specify defType=, it uses the correct request handler. It seems that it is
using the correct request handler, but it is defaulting to defType=dismax,
even though I have not specified that parameter in the standard request
handler configuration.

On Tue, Jan 6, 2009 at 2:57 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote:

 Hi,

 In my solrconfig.xml file there are two request handlers configured: one
 uses defType=dismax, and the other doesn't. However, it seems that when the
 dismax request handler is set as my default, I have no way of using the
 standard request handler . Here is the relevant part of my solrconfig.xml:

 <requestHandler name="standard" class="solr.SearchHandler">
   <!-- default values for query parameters -->
   <lst name="defaults">
     <str name="echoParams">explicit</str>
   </lst>
 </requestHandler>

 <requestHandler name="dismax" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
   </lst>
 </requestHandler>


 When I run a query with the parameters qt=standard&debugQuery=true, I can
 see that it is still using the DismaxQueryParser. There doesn't seem to be
 any way to use the standard request handler.

 On the other hand, when I set the standard request handler as my default,
 the behaviour is equally strange. When I specify no qt parameter at all, it
 uses the standard request handler as it should. However, when I enter either
 qt=standard or qt=dismax, it uses the dismax request handler!

 So it appears that the only way I can choose the request handler I want is
 to make the standard request handler my default, then specify no qt
 parameter if I want to use it. Has anyone else tried this?

 Mark



Re: Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia

I'm optimizing because I thought I should. I'll be updating my index
somewhere between every 15 minutes and every 2 hours. That means between 12
and 96 updates per day. That seems like a lot of index files (and it scared
me a little), so that's my second reason for wanting to optimize nightly.

I haven't benchmarked the performance hit for not optimizing. That'll be my
next step. If the hit isn't too bad, I'll look into optimizing less
frequently (weekly, ...).

Thanks Otis!


Otis Gospodnetic wrote:
 
 OK, so that question/answer seems to have hit the nail on the head.  :)
 
 When you optimize your index, all index files get rewritten.  This means
 that everything that the OS cached up to that point goes out the window
 and the OS has to slowly re-cache the hot parts of the index.  If you
 don't optimize, this won't happen.  Do you really need to optimize?  Or
 maybe a more direct question: why are you optimizing?
 
 
 Regarding autowarming, with such high fq hit rate, I'd make good use of fq
 autowarming.  The result cache rate is lower, but still decent.  I
 wouldn't turn off autowarming the way you have.
 
 

-- 
View this message in context: 
http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21320334.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Unable to choose request handler

2009-01-06 Thread Mark Ferguson
I apologize, entering the defType parameter explicitly has nothing to do
with it, this was a caching issue. I tested the different configurations
thoroughly, and this is what I've come up with:

  - When using 'dismax' request handler as default:
- Queries are always parsed using the dismax parser, whether I use
qt=standard, qt=dismax, or qt=. It _does_ use the correct request handler,
because the echo'd params are correct for that handler. However, it seems to
always be using defType=dismax. I can tell this because when I use the
parameter debugQuery=true, I can see that it is creating a
DisjunctionMaxQuery.
  - When using 'standard' request handler as default:
- The behaviour is as expected. When I enter no qt parameter or
qt=standard, it uses the standard request handler and doesn't use dismax for
the defType. When I use qt=dismax, it uses the dismax request handler and
dismax for the defType.

So the problem is when setting the default request handler to dismax, it
always uses defType=dismax (even though it uses the 'standard' request
handler). defType=dismax does not show up in the echo'd parameters, but I
can tell by using debugQuery=true (and the fact that I get no results when I
specify a field).

Can someone try reproducing this using the configuration I specified in my
first post? Sorry again for being confusing, I got sidetracked by the
caching issue.

Mark



On Tue, Jan 6, 2009 at 3:01 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote:

 It seems that the problem is related to the defType parameter. When I
 specify defType=, it uses the correct request handler. It seems that it is
 using the correct request handler, but it is defaulting to defType=dismax,
 even though I have not specified that parameter in the standard request
 handler configuration.


 On Tue, Jan 6, 2009 at 2:57 PM, Mark Ferguson 
 mark.a.fergu...@gmail.comwrote:

 Hi,

 In my solrconfig.xml file there are two request handlers configured: one
 uses defType=dismax, and the other doesn't. However, it seems that when the
 dismax request handler is set as my default, I have no way of using the
 standard request handler . Here is the relevant part of my solrconfig.xml:

 requestHandler name=standard class=solr.SearchHandler
 !-- default values for query parameters --
  lst name=defaults
str name=echoParamsexplicit/str
  /lst
   /requestHandler

   requestHandler name=dismax class=solr.SearchHandler default=true
 lst name=defaults
  str name=defTypedismax/str
  str name=echoParamsexplicit/str
 /lst
   /requestHandler


 When I run a query with the parameters qt=standarddebugQuery=true, I can
 see that it is still using the DismaxQueryParser. There doesn't seem to be
 any way to use the standard request handler.

 On the other hand, when I set the standard request handler as my default,
 the behaviour is equally strange. When I specify no qt parameter at all, it
 uses the standard request handler as it should. However, when I enter either
 qt=standard or qt=dismax, it uses the dismax request handler!

 So it appears that the only way I can choose the request handler I want is
 to make the standard request handler my default, then specify no qt
 parameter if I want to use it. Has anyone else tried this?

 Mark





Re: Snapinstaller vs Solr Restart

2009-01-06 Thread Otis Gospodnetic
Lower your mergeFactor and Lucene will merge segments (i.e., fewer index files)
and purge deletes more often for you, at the expense of somewhat slower indexing.
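
In solrconfig.xml that is the mergeFactor setting (a sketch; 10 is the usual
default, and lower values mean fewer segments at the cost of more merging):

  <mergeFactor>4</mergeFactor>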


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: wojtekpia wojte...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, January 6, 2009 5:18:26 PM
 Subject: Re: Snapinstaller vs Solr Restart
 
 
 I'm optimizing because I thought I should. I'll be updating my index
 somewhere between every 15 minutes, and every 2 hours. That means between 12
 and 96 updates per day. That seems like a lot of index files (and it scared
 me a little), so that's my second reason for wanting to optimize nightly.
 
 I haven't benchmarked the performance hit for not optimizing. That'll be my
 next step. If the hit isn't too bad, I'll look into optimizing less
 frequently (weekly, ...).
 
 Thanks Otis!
 
 
 Otis Gospodnetic wrote:
  
  OK, so that question/answer seems to have hit the nail on the head.  :)
  
  When you optimize your index, all index files get rewritten.  This means
  that everything that the OS cached up to that point goes out the window
  and the OS has to slowly re-cache the hot parts of the index.  If you
  don't optimize, this won't happen.  Do you really need to optimize?  Or
  maybe a more direct question: why are you optimizing?
  
  
  Regarding autowarming, with such high fq hit rate, I'd make good use of fq
  autowarming.  The result cache rate is lower, but still decent.  I
  wouldn't turn off autowarming the way you have.
  
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21320334.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Unable to choose request handler

2009-01-06 Thread Yonik Seeley
On Tue, Jan 6, 2009 at 5:01 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote:
 It seems that the problem is related to the defType parameter. When I
 specify defType=, it uses the correct request handler. It seems that it is
 using the correct request handler, but it is defaulting to defType=dismax,
 even though I have not specified that parameter in the standard request
 handler configuration.

defType only controls the default type of the main query (not the
whole handler).
Try defType=lucene
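
i.e., spell the parser out in the standard handler's defaults (a sketch):

  <requestHandler name="standard" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">lucene</str>
      <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>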

-Yonik





Re: Unable to choose request handler

2009-01-06 Thread Mark Ferguson
Thanks, this fixed the problem. Maybe this parameter could be added to the
standard request handler in the sample solrconfig.xml, as it is confusing
that it uses the default request handler's defType even when not using that
handler. I didn't completely understand your explanation, though. Thanks for
the fix.

Mark


On Tue, Jan 6, 2009 at 3:40 PM, Yonik Seeley ysee...@gmail.com wrote:

 On Tue, Jan 6, 2009 at 5:01 PM, Mark Ferguson mark.a.fergu...@gmail.com
 wrote:
  It seems that the problem is related to the defType parameter. When I
  specify defType=, it uses the correct request handler. It seems that it
 is
  using the correct request handler, but it is defaulting to
 defType=dismax,
  even though I have not specified that parameter in the standard request
  handler configuration.

 defType only controls the default type of the main query (not the
 whole handler).
 Try defType=lucene

 -Yonik




Re: Setting up DataImportHandler for Oracle datasource on JBoss

2009-01-06 Thread The Flight Captain

If add the document tag and an entity, I still get the same error when
starting up JBoss.

Here is my full data-config.xml

<dataconf>
  <dataSource type="JdbcDataSource"
      driver="oracle.jdbc.OracleDriver"
      url="jdbc:oracle:thin:@host:port:service"
      user="pctadm"
      password="pctadm"/>
  <document name="products">
    <entity name="product" query="select prd_id from pct_product">
      <field column="prd_id" name="id"/>
    </entity>
  </document>
</dataconf>

I also have this one field in my schema.xml, nested under <fields>:

   <field name="id" type="string" indexed="true" stored="true" required="true"/>

When I restart JBoss I get the same stack trace.

...
2009-01-07 08:41:40,428 ERROR [STDERR] 7/01/2009 08:41:40 org.apache.solr.handler.dataimport.DataImportHandler inform
SEVERE: Exception while loading DataImporter
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context Processing Document #
        at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176)
        at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:93)
        at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
        at org.apache.solr.core.SolrCore.init(SolrCore.java:480)
...
Caused by: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.DataConfig.getChildNodes(DataConfig.java:324)
        at org.apache.solr.handler.dataimport.DataConfig.readFromXml(DataConfig.java:236)
        at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:170)
        ... 140 more

 
Am I missing anything else?


Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 the document tag and the rest of the stuff is missing in your
 data-config file
 
 On Tue, Jan 6, 2009 at 12:50 PM, The Flight Captain
 jason_sheph...@flightcentre.com wrote:

 I am having trouble setting up an Oracle datasource. Can anyone help me
 connect to the datasource?

 My solrconfig.xml:

 ...
  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>
 ...

 My data-config.xml
 <dataconf>
   <dataSource type="JdbcDataSource"
       driver="oracle.jdbc.OracleDriver"
       url="jdbc:oracle:thin:@hostname:port:service"
       user="username"
       password="password"/>
   </dataSource>
 </dataconf>

 I have placed the oracle driver on the classpath of JBoss.

 I am getting the following errors in the server.log on startup:

 2009-01-06 17:03:12,756 ERROR [STDERR] 6/01/2009 17:03:12 org.apache.solr.handler.dataimport.DataImportHandler inform
 SEVERE: Exception while loading DataImporter
 org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context Processing Document #
        at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176)
        at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:93)
        at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
        at org.apache.solr.core.SolrCore.init(SolrCore.java:480)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
        at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3720)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java:4358)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:752)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:732)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:553)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.tomcat.util.modeler.BaseModelMBean.invoke(BaseModelMBean.java:297)
        at org.jboss.mx.server.RawDynamicInvoker.invoke(RawDynamicInvoker.java:164)
        at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
        at

Re: Error during indexing.

2009-01-06 Thread Erik Hatcher
What's the XML you're sending it?  It's got something invalid in it,  
obviously.


How are you indexing?  Via SolrJ?  Or some other POST way?
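
A common cause is control characters - like the NULL in your trace - leaking
into a field value; XML 1.0 simply cannot carry them. If that's the case, one
option is to strip them before building the document. A sketch in Java, not
specific to your setup:

    // Remove characters that are not legal in XML 1.0 documents;
    // "raw" stands for whatever string ends up inside a field element.
    public static String stripInvalidXmlChars(String raw) {
        return raw.replaceAll("[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD]", "");
    }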

Erik

On Jan 6, 2009, at 2:27 PM, Tushar_Gandhi wrote:



Hi,
   I am getting an error whenever I try to index photo objects specifically.
For other objects it is working.

The error is:
SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character (NULL, unicode 0) encountered: not valid in any content
 at [row,col {unknown-source}]: [1,3127]
        at com.ctc.wstx.sr.StreamScanner.constructNullCharException(StreamScanner.java:640)
        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:669)
        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:660)
        at com.ctc.wstx.sr.BasicStreamReader.readCDataPrimary(BasicStreamReader.java:4240)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromTreeCommentOrCData(BasicStreamReader.java:3280)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2824)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:321)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:754)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:684)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:876)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:595)
Can anyone help me out?

Thanks,
Tushar

--
View this message in context: 
http://www.nabble.com/Error-during-indexing.-tp21317294p21317294.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Solr FAQ entry about Dynamically calculated range facet topic

2009-01-06 Thread Yevgeniy Belman
So did anyone put together a FAQ on this subject? I am also interested in
seeing the different ways to get dynamic faceting to work.

In this post, Chris Hostetter dropped a piece of handler code. Is it still
the right path to take for those generated ranges:
$0..$20 (3)
$20..$75 (15)
$75..$123 (8)

Re: Dynamically calculated range facet
http://www.mail-archive.com/solr-user@lucene.apache.org/msg04727.html


Re: date range query performance

2009-01-06 Thread Jim Adams
Can someone explain what this means to me?

I'm having a similar performance issue - it's an index with only 1 million
records or so, but when trying to search on a date range it takes 30
seconds!  Yes, this date is one with hours, minutes, and seconds in it -- do I
need to create an additional field without the time component and reindex
all my documents so I can get decent search performance?  Or can I tell Solr
"please ignore the time" and have it do something in a reasonable timeframe? (GRIN)

Thanks.

On Fri, Oct 31, 2008 at 10:28 PM, Michael Lackhoff mich...@lackhoff.de wrote:

 On 01.11.2008 06:10 Erik Hatcher wrote:

  Yeah, this should work fine:
 
  <field name="timestamp" type="date" indexed="true" stored="true"
   default="NOW/DAY" multiValued="false"/>

 Wow, that was fast, thanks!

 -Michael



Re: how large can the index be?

2009-01-06 Thread Jim Adams
Why is NFS mounting such a bad idea? Some solutions for highly available disks
suggest that you DO mount the disks via NFS on the boxes that need the data.

On Mon, Dec 29, 2008 at 7:42 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 What you have below is not really what we call Distributed Search, but
 more of Query Load Balancing.  Yes, the diagram below will work IF a
 single Solr box (A or B) can really "handle" a full 50M doc index.  Of course,
 "handle" can be fuzzy.  That is, you could have a large index on a Solr box
 and it will handle it - nothing will crash, nothing will die - it's just
 that it may not be able to handle it well enough, that is, the queries
 may take longer than you'd like.

 NFS mounting an index directory is a separate story and very often a bad
 idea, again because of performance.


 Otis --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Antonio Eggberg antonio_eggb...@yahoo.se
  To: solr-user@lucene.apache.org
  Sent: Monday, December 29, 2008 4:19:23 PM
  Subject: Re: how large can the index be?
 
  Thank you very much for your answer.

  I was afraid of that; each document has about 20 fields. As you pointed out,
  it will slow down. Anyway, I am thinking: is it not possible to do the
  following?

  Load Balancer
   |
  Solr A, Solr B, ...
   |
   one index

  So I send 50% of queries to Solr A, 50% to Solr B, and so forth.. is this
  not good? Also, to add: the index will be like a mounted drive to the Solr
  boxes... On the above, do I really need to worry about Solr Master, Solr
  Slave? It probably solves my load problem, but I think query speed will be
  slow...

  Just curious: is anyone using distributed search in production?
 
  Cheers
 
 
 
  --- On Mon, 2008-12-29, Otis Gospodnetic wrote:

   From: Otis Gospodnetic
   Subject: Re: how large can the index be?
   To: solr-user@lucene.apache.org
   Date: Monday, 29 December 2008, 21:53
   Hi Antonio,
  
   Besides thinking in terms of documents, you also need to
   think in terms of index size on the file system vs. the
   amount of RAM your search application/server can use.  50M
   documents may be doable on a single server if those
   documents are not too large and you have sufficient RAM.  It
   gets even better if your index doesn't change very often
   and if you can get decent hit ratios on the various Solr
   caches.
  
   If you are indexing largish documents, or even something as
   small as an average web page, 50M docs may be too much on a
   commodity box (say dual core 8 GB RAM box)
  
   Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
   - Original Message 
From: Antonio Eggberg
To: solr-user@lucene.apache.org
Sent: Monday, December 29, 2008 3:41:48 PM
Subject: how large can the index be?
   
Hi,
   
    We are successfully running a Solr index of 3 million docs. I have just
    been informed that our index size will increase to 50 million. I have
    been going through the doc

    http://wiki.apache.org/solr/DistributedSearch

    It seems like we will lose out on the date facet and some other stuff
    that we use, which is important to us. So far we have been using 1 index
    and 1 machine.

    Can I still stick with my 1 index but have many query servers? We don't
    update our index very often; these are rather static data. Over the past
    year we have updated the index data a total of 3 times and about 300
    records :)

    Can someone provide some idea of how/what I should do to deal with the
    new datasets?
   
Thanks for your help.
   
   
   
 
 




Re: Partitioning the index

2009-01-06 Thread Jim Adams
Are there any particular suggestions on memory size for a machine?  I have a
box that has only 1 million records on it - yet I'm finding that date
searches are already unacceptably slow (30 seconds).  Other searches seem
okay though.

Thanks!

On Thu, Dec 18, 2008 at 2:02 PM, Yonik Seeley ysee...@gmail.com wrote:

 It's more related to how much memory you have on your boxes, how
 resource intensive your queries are, how many fields you are trying to
 facet on, what acceptable response times are, etc.

 Anyway... a single box is normally good for between 5M and 50M docs,
 but can fall out of that range (both up and down) depending on the
 specifics.

 -Yonik

 On Wed, Dec 17, 2008 at 9:34 PM, s d s.d.sau...@gmail.com wrote:
  Hi,Is there a recommended index size (on disk, number of documents) for
 when
  to start partitioning it to ensure good response time?
  Thanks,
  S
 



Re: Partitioning the index

2009-01-06 Thread Yonik Seeley
On Tue, Jan 6, 2009 at 10:06 PM, Jim Adams jasolru...@gmail.com wrote:
 Are there any particular suggestions on memory size for a machine?  I have a
 box that has only 1 million records on it - yet I'm finding that date
 searches are already unacceptable (30 seconds) slow.  Other searches seem
 okay though.

I assume this is a date range query (or date faceting)?
Range queries with many unique terms in the range are a known
limitation, and we should hopefully fix this in 1.4.
In the meantime, limiting the precision of dates could help a great deal.
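
For example (a sketch; the field name timestamp and the ranges are
illustrative), rounding both the indexed values and the query endpoints to a
day keeps the number of unique terms in the range small:

  timestamp:[NOW/DAY-1YEAR TO NOW/DAY+1DAY]

Indexing the field with default="NOW/DAY", as in the earlier reply, rounds
the stored values the same way.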

-Yonik


Re: DataImportHandler (reading XML w/ paging)

2009-01-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
Paging is possible w/ XPathEntityProcessor;
look at the $hasMore and $nextUrl special fields in the documentation.
If you can explain better, I may be able to give a better solution,
e.g.: where is the metadata coming from, and what is the datasource?
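
A rough sketch of the wiring (the URL and xpaths here are assumptions for
illustration; since your feed exposes pageNumber/totalPages rather than an
explicit "has more" flag, you may need a transformer to derive these two
special fields):

  <entity name="items" processor="XPathEntityProcessor"
          url="http://example.com/feed?page=1"
          forEach="/response/item">
    <field column="$hasMore" xpath="/response/paging/hasMore"/>
    <field column="$nextUrl" xpath="/response/paging/nextUrl"/>
  </entity>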

On Tue, Jan 6, 2009 at 11:17 PM, Jon Baer jonb...@gmail.com wrote:
 Hi,

 Anyone have a quick, clever way of dealing w/ paged XML for
 DataImportHandler?  I have metadata like this:

 <paging>
   <pageNumber>1</pageNumber>
   <totalPages>3</totalPages>
   <count>15</count>
 </paging>

 I unfortunately cannot get all the data in one shot, so I may need to
 make a number of requests based on the paging metadata, but I can't
 figure out whether this is dynamically possible w/ the current DIH setup.
 Any tips?

 Thanks.

 - Jon




-- 
--Noble Paul


Re: Setting up DataImportHandler for Oracle datasource on JBoss

2009-01-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
The root element is not <dataconf>; it should be <dataConfig>.
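
With that fixed, the file quoted below would read (a sketch; everything
else kept as you posted it):

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@host:port:service"
              user="pctadm"
              password="pctadm"/>
  <document name="products">
    <entity name="product" query="select prd_id from pct_product">
      <field column="prd_id" name="id"/>
    </entity>
  </document>
</dataConfig>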

On Wed, Jan 7, 2009 at 4:23 AM, The Flight Captain
jason_sheph...@flightcentre.com wrote:

 If I add the document tag and an entity, I still get the same error when
 starting up JBoss.

 Here is my full data-config.xml

  <dataconf>
    <dataSource type="JdbcDataSource"
                driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@host:port:service"
                user="pctadm"
                password="pctadm"/>
    <document name="products">
      <entity name="product" query="select prd_id from pct_product">
        <field column="prd_id" name="id"/>
      </entity>
    </document>
  </dataconf>

 I also have this one field in my schema.xml, nested under <fields>:

   <field name="id" type="string" indexed="true" stored="true"
          required="true" />

 When I restart JBoss I get the same stack trace.

 ...
 2009-01-07 08:41:40,428 ERROR [STDERR] 7/01/2009 08:41:40
 org.apache.solr.handler.dataimport.DataImportHandler inform
 SEVERE: Exception while loading DataImporter
 org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
 occurred while initializing context Processing Document #
at
 org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176)
at
 org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:93)
at
 org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
at
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
at org.apache.solr.core.SolrCore.init(SolrCore.java:480)
 ...
 Caused by: java.lang.NullPointerException
at
 org.apache.solr.handler.dataimport.DataConfig.getChildNodes(DataConfig.java:324)
at
 org.apache.solr.handler.dataimport.DataConfig.readFromXml(DataConfig.java:236)
at
 org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:170)
... 140 more
 

 Am I missing anything else?


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

  the document tag and the rest of the required elements are missing in
  your data-config file

 On Tue, Jan 6, 2009 at 12:50 PM, The Flight Captain
 jason_sheph...@flightcentre.com wrote:

 I am having trouble setting up an Oracle datasource. Can anyone help me
 connect to the datasource?

 My solrconfig.xml:

  ...
   <requestHandler name="/dataimport"
                   class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">data-config.xml</str>
     </lst>
   </requestHandler>
  ...

 My data-config.xml
  <dataconf>
    <dataSource type="JdbcDataSource"
                driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@hostname:port:service"
                user="username"
                password="password"/>
    </dataSource>
  </dataconf>

 I have placed the oracle driver on the classpath of JBoss.

 I am getting the following errors in the server.log on startup:

 2009-01-06 17:03:12,756 ERROR [STDERR] 6/01/2009 17:03:12
 org.apache.solr.handler.dataimport.DataImportHandler inform
 SEVERE: Exception while loading DataImporter
 org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
 occurred while initializing context Processing Document #
at
 org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176)
at
 org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:93)
at
 org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
at
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
at org.apache.solr.core.SolrCore.init(SolrCore.java:480)
at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
at
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3720)
at
 org.apache.catalina.core.StandardContext.start(StandardContext.java:4358)
at
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:752)
at
 org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:732)
at
 org.apache.catalina.core.StandardHost.addChild(StandardHost.java:553)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
 org.apache.tomcat.util.modeler.BaseModelMBean.invoke(BaseModelMBean.java:297)
at
 

Re: Partitioning the index

2009-01-06 Thread Jim Adams
It's a range query.  I don't have any faceted data.

Can I limit the precision of the existing field, or must I re-index?

Thanks.

On Tue, Jan 6, 2009 at 8:41 PM, Yonik Seeley ysee...@gmail.com wrote:

 On Tue, Jan 6, 2009 at 10:06 PM, Jim Adams jasolru...@gmail.com wrote:
  Are there any particular suggestions on memory size for a machine?  I
  have a box that has only 1 million records on it - yet I'm finding that
  date searches are already unacceptably slow (30 seconds).  Other searches
  seem okay though.

 I assume this is a date range query (or date faceting)?
 Range queries with many unique terms in the range are a known
 limitation, and we should hopefully fix this in 1.4.
 In the meantime, limiting the precision of dates could help a great deal.

 -Yonik



Re: DataImport

2009-01-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
Which approach worked? I suggested three:
Jetty automatically loads jars in WEB-INF/lib;
it is the responsibility of Solr to load jars from ${solr.home}/lib;
it is the responsibility of the JRE to load jars from JAVA_HOME/lib/ext.
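
For the ${solr.home}/lib option no Jetty configuration is needed at all;
with a typical Solr home layout (paths illustrative; the DB2 driver usually
ships as db2jcc.jar plus its license jar) the jars just sit next to conf/:

  solr/
    conf/
      solrconfig.xml
      schema.xml
    data/
    lib/
      db2jcc.jar
      db2jcc_license_cu.jar

Solr adds everything in that lib directory to its classloader on startup.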

On Tue, Jan 6, 2009 at 6:18 PM, Performance dcr...@crossview.com wrote:

 Paul,

 Thanks for the feedback, and it does work.  So if I understand this, the
 app server (Jetty) is not reading in the environment variables for the
 other libraries I need.  How do I add the JDBC jars to the path so that I
 don't need to copy the files into the directory?  Does Jetty have a config
 file I should look at?


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 The driver can be put directly into the WEB-INF/lib of the solr web
 app or it can be put into ${solr.home}/lib dir.

  Or, if something is really screwed up, you can try the old-fashioned way
  of putting your driver jar into JAVA_HOME/lib/ext.

 --Noble


 On Tue, Jan 6, 2009 at 7:05 AM, Performance dcr...@crossview.com wrote:

  I have been following this tutorial but I can't seem to get past an error
  related to not being able to load the DB2 driver.  The user has all the
  right config to load the JDBC driver and Squirrel works fine.  Do I need
  to update a path within Solr?


 muxa wrote:

  Looked through the tutorial on data import, section Full Import
  Example.
  1) Where is this dataimport.jar? There is no such file in the
  extracted example-solr-home.jar.
  2) "Use the solr folder inside example-data-config folder as your
  solr home." What does this mean? Anyway, there is no folder
  example-data-config.
   Regards, Mihails


 --
 View this message in context:
 http://www.nabble.com/DataImport-tp17730791p21301571.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 --Noble Paul



 --
 View this message in context: 
 http://www.nabble.com/DataImport-tp17730791p21309725.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: Partitioning the index

2009-01-06 Thread Shalin Shekhar Mangar
You'll need to re-index.

On Wed, Jan 7, 2009 at 9:49 AM, Jim Adams jasolru...@gmail.com wrote:

 It's a range query.  I don't have any faceted data.

 Can I limit the precision of the existing field, or must I re-index?

 Thanks.

 On Tue, Jan 6, 2009 at 8:41 PM, Yonik Seeley ysee...@gmail.com wrote:

  On Tue, Jan 6, 2009 at 10:06 PM, Jim Adams jasolru...@gmail.com wrote:
   Are there any particular suggestions on memory size for a machine?  I
   have a box that has only 1 million records on it - yet I'm finding that
   date searches are already unacceptably slow (30 seconds).  Other
   searches seem okay though.
  
  I assume this is a date range query (or date faceting)?
  Range queries with many unique terms in the range are a known
  limitation, and we should hopefully fix this in 1.4.
  In the meantime, limiting the precision of dates could help a great deal.
 
  -Yonik
 




-- 
Regards,
Shalin Shekhar Mangar.


Re: how large can the index be?

2009-01-06 Thread Shalin Shekhar Mangar
On Wed, Jan 7, 2009 at 8:27 AM, Jim Adams jasolru...@gmail.com wrote:

 Why is NFS mounting such a bad idea? Some solutions for highly available
 disks suggest that you DO mount the disks via NFS on the boxes that need
 the data.


Network requests for each read/write? You can do some benchmarks yourself
and if you find the performance acceptable, go ahead. You should consider a
master/slave replicated setup if you want high availability.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Error during indexing.

2009-01-06 Thread Shalin Shekhar Mangar
Photo objects? Is it binary data you are trying to send in an XML request?
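
If they do contain raw bytes, note that NUL and most other control
characters are simply illegal in XML 1.0, no matter how they are escaped.
A minimal sketch (assuming the offending value reaches your indexing code
as a Java String; the method name is mine) that strips the invalid
characters before the add document is built:

  // Keep only characters that are valid in XML 1.0 content:
  // 0x9, 0xA, 0xD, 0x20-0xD7FF, 0xE000-0xFFFD.
  public static String stripInvalidXmlChars(String in) {
      StringBuilder out = new StringBuilder(in.length());
      for (int i = 0; i < in.length(); i++) {
          char c = in.charAt(i);
          if (c == 0x9 || c == 0xA || c == 0xD
                  || (c >= 0x20 && c <= 0xD7FF)
                  || (c >= 0xE000 && c <= 0xFFFD)) {
              out.append(c);
          }
      }
      return out.toString();
  }

True binary payloads (the image bytes themselves) should not go into an
XML field at all.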

On Wed, Jan 7, 2009 at 12:57 AM, Tushar_Gandhi 
tushar_gan...@neovasolutions.com wrote:


 Hi,
   I am getting an error whenever I try to index photo objects
 specifically.
 For other objects it works.

  The error is:
 SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character
 (NULL, unicode 0) encountered: not valid in any content
 at [row,col {unknown-source}]: [1,3127]
 at

 com.ctc.wstx.sr.StreamScanner.constructNullCharException(StreamScanner.java:640)
 at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:669)
 at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:660)
 at

 com.ctc.wstx.sr.BasicStreamReader.readCDataPrimary(BasicStreamReader.java:4240)
 at

 com.ctc.wstx.sr.BasicStreamReader.nextFromTreeCommentOrCData(BasicStreamReader.java:3280)
 at
 com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2824)
 at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
 at

 org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:321)
 at

 org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195)
 at

 org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
 at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
 at

 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
 at

 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
 at

 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
 at

 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
 at

 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
 at

 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
 at

 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
 at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
 at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
 at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:754)
 at

 org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:684)
 at

 org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:876)
 at

 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
 at java.lang.Thread.run(Thread.java:595)
 Can anyone help me out?

 Thanks,
 Tushar

 --
 View this message in context:
 http://www.nabble.com/Error-during-indexing.-tp21317294p21317294.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: date range query performance

2009-01-06 Thread Shalin Shekhar Mangar
On Wed, Jan 7, 2009 at 7:47 AM, Jim Adams jasolru...@gmail.com wrote:

 Can someone explain what this means to me?

 I'm having a similar performance issue - it's an index with only 1 million
 records or so, but when trying to search on a date range it takes 30
 seconds!  Yes, this date is one with hours, minutes, and seconds in it.
 Do I need to create an additional field without the time component and
 reindex all my documents so I can get decent search performance?  Or can
 I tell Solr "please ignore the time" and do something in a reasonable
 timeframe (GRIN)?


Range queries are slow if you have a large number of unique terms. With
dates it is especially a problem because the more precise they are, the
more terms you've got in that field.

The easy solution is to round off your dates to the minimum precision
acceptable to your use case. You'll need to re-index.
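
For example (field name and dates hypothetical), with day-rounded values
indexed, a month-wide range only has to visit ~31 unique terms:

  creation_date:[2008-12-01T00:00:00Z TO 2009-01-01T00:00:00Z]

Solr's DateMath syntax also lets the query side round, e.g. the last
month up to the start of today:

  creation_date:[NOW/DAY-1MONTH TO NOW/DAY]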

-- 
Regards,
Shalin Shekhar Mangar.