Re: To make sure XML is UTF-8

2007-06-12 Thread Ajanta Phatak

Hi

Not sure if you've found a solution to your problem yet, but I dealt
with a similar issue, described below, and hopefully it'll help you
too. Of course, this assumes that your original data is in UTF-8 format.


The default charset encoding for MySQL is latin1, while our display
format was UTF-8, and that was the problem. These are the steps I
performed to get the search data into UTF-8 format:


Changed my.cnf as follows (though you can avoid this by executing
commands on every new connection if you don't want the whole db in
UTF-8 format; see the sketch just after the config):


Under: [mysqld] added:
# setting default charset to utf-8
collation_server=utf8_unicode_ci
character_set_server=utf8
default-character-set=utf8

Under: [client]
default-character-set=utf8
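
For reference, the per-connection alternative mentioned above is a
one-liner issued right after connecting (shown here in the mysql
command-line client; a JDBC program would execute the same statement on
each connection it opens):

mysql> SET NAMES utf8;  -- same effect as the [client] setting, per session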

After changing it, I restarted mysqld, re-created the db, re-inserted
all the data into the db using my data-insert code (a Java program),
and re-created the Solr index. The key is to change the settings for
both the [mysqld] and [client] sections in my.cnf: the mysqld setting
makes sure that MySQL doesn't convert the data to latin1 while storing
it, and the client setting ensures that the data is not converted while
being accessed, going into or coming out of the server.
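
To verify that nothing gets converted along the way, a quick sanity
check is to inspect the bytes MySQL actually stored (the table and
column names here are made up for illustration):

mysql> SHOW VARIABLES LIKE 'character_set%';
mysql> SELECT HEX(title) FROM docs WHERE id = 42;

If HEX() shows multi-byte sequences (e.g. C3A9 for é) rather than
single latin1 bytes (E9), the data went in and comes out as UTF-8.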


Ajanta.


Tiong Jeffrey wrote:

Ya, you are right! After I changed it to UTF-8 the error is still
there... I looked at the log, and this is what appears:

127.0.0.1 - - [10/06/2007:03:52:06 +] "POST /solr/update HTTP/1.1" 500 4022

I tried searching but couldn't understand what this error is. Does
anybody have any idea?

Thanks!!!

On 6/10/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: way during indexing is - "FATAL: Connection error (is Solr running at
: http://localhost/solr/update ?): java.io.IOException: Server returned
: HTTP Response code: 500 for URL: http://local/solr/update"
: 4. Although the error code doesn't say this is an XML utf-8 encoding
: error, I did a bit of research and looked at the XML file that I have;
: it doesn't fulfill the utf-8 encoding.

I *strongly* encourage you to look at the body of the response and/or
the error log of your Servlet container and find out *exactly* what the
cause of the error is ... you could spend a lot of time working on this
and discover it's not your real problem.



-Hoss
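
For what it's worth, a quick way to follow that suggestion is to replay
the POST with curl and keep the full response, not just the status line
(docs.xml stands in for your update file):

curl -i http://localhost:8983/solr/update \
     -H 'Content-Type: text/xml; charset=UTF-8' \
     --data-binary @docs.xml

# and to check whether the file really is well-formed UTF-8:
iconv -f UTF-8 -t UTF-8 docs.xml > /dev/null && echo "valid UTF-8"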





Re: system architecture question when using solr/lucene

2007-05-21 Thread Ajanta Phatak
Thanks to both of you, Otis and Chris, for your responses. We did
manage to run some benchmarks, but we think there are some surprising
results here. It seems that caching is not affecting performance that
much. Is that because of the small index size?


Do these numbers seem OK, or is there any room for improvement in any
way that you can think of?


Regards,
Ajanta.

Results from development servers - Solr HTTP Interface

Configurations

   * Index size is approx 500M (a little more)
   * Tomcat 6.0
   * Solr (nightly build dated 2007-04-19)
   * Nginx v0.5.20 is used as the load balancer (very lightweight in
     size, functionality, and cpu consumption) with round-robin
     distribution of requests.
   * Grinder v3.0-beta33 was used for testing. It allows one to write
     custom scripts (in Jython) and has a nice GUI for presenting
     results.
   * Server config: Intel® Xeon™ 3040 1.87GHz, 1066MHz FSB, 4GB RAM
     (300MB used at system boot), 8GB swap
   * The query list was custom-built from the web, with some queries
     having AND/OR between terms. The territory field was always US.

Benchmarks

Threads          Servers  Total/Unique queries  Caching          Performance (queries/sec)
25               2        2500/1950             D*               500
25               2        2500/2500             D                142
40               2        4000/4000             D                100
40               2        4000/3000             D                166
40               3        4000/4000             D                133
40 (backtoback)  3        4000/4000             D                333
40               3        4000/3300             D                142
10               3        2000/2000             D                434
40               3        4000/4000             Q.Caching: 1024  158
40 (backtoback)  3        4000/4000             Q.Caching: 1024  384


Without US territory

Threads  Servers  Total/Unique queries  Caching  Performance (queries/sec)
40       3        4000/4000             D        142
40       2        4000/4000             D        100


Moving territory:US from query to Filters

Threads  Servers  Total/Unique queries  Caching           Performance (queries/sec)
40       3        4000/4000             F.Caching: 16384  133
40       3        4000/3400             F.Caching: 16384  147

   * D means caching was disabled
   * *backtoback* means the same test was run again immediately (so the
     second run benefits from warm caches)
   * CPU usage while the server was processing queries was ~40-50%
   * Tomcat shows 3% memory usage.
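
For reference, a sketch of the two query shapes compared above
(standard q and fq parameters; host, port and the query term are
assumptions):

curl 'http://localhost:8983/solr/select?q=ipod+AND+territory:US'   # territory inside the query
curl 'http://localhost:8983/solr/select?q=ipod&fq=territory:US'    # territory as a cached filter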



Otis Gospodnetic wrote:

Hi Ajanta,

I think you answered your own questions. Either use filters or
partition the index. The advantage of partitioning is that you can
update each partition separately without affecting the filters, cache,
searcher, etc. for the other indices (i.e. no need to warm up with data
from the other indices). If you are indeed working with high QPS,
partitioning also lets you scale indices separately (are all
territories the same size document-wise? do they all get the same
QPS?). The disadvantage is that you can't easily run queries that don't
depend on a territory.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Lucene Consulting -- http://lucene-consulting.com/
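
To make the partitioning option concrete, a rough sketch (hostnames
purely illustrative) is one Solr instance per territory, each with its
own index, with the application routing by territory:

curl 'http://solr-us.example.com:8983/solr/select?q=ipod'
curl 'http://solr-eu.example.com:8983/solr/select?q=ipod'

Queries that span territories would then have to be fanned out and
merged by the application, which is the disadvantage Otis mentions.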


- Original Message 
From: Ajanta <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, May 15, 2007 11:35:13 AM
Subject: system architecture question when using solr/lucene



We are currently looking at large numbers of queries/sec and would like
to optimize that as much as possible. The special need is that we would
like to show specific results based on a specific field, the territory
field: depending on where in the world you're coming from, we'd like to
show you specific results. The index is very large (currently 2 million
rows) and could grow even larger (2-3 times) in the future. How do we
accomplish this given that we have some domain knowledge (the
territory) to use to our advantage? Is there a way we can hint
solr/lucene to use this information to provide better results? We could
use filters on territory, or we could use different indexes for
different territories (individually or in combination). Are there any
other ways to do this? How do we figure out the best approach in this
situation?


  


Re: Question about delete

2007-05-10 Thread Ajanta Phatak
I believe that in Lucene, at least, deleting documents only marks them
for deletion. The actual delete happens only when the IndexReader is
closed, and the disk space is only reclaimed when segments get merged
or the index is optimized. Not sure about Solr.
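
As a sketch (assuming Solr's standard XML update commands), an optimize
is what actually expunges deleted documents and shrinks the index
files:

curl http://localhost:8983/solr/update --data-binary '<optimize/>'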


Ajanta.

James liu wrote:


but index file size not changed and maxDoc not changed.



2007/5/10, Nick Jenkin <[EMAIL PROTECTED]>:


Hi James,
As I understand it, numDocs is the number of documents currently in
your index, and maxDoc is the most documents you have ever had in your
index.

By the looks of it you currently have no documents in your index, so
your delete query must have deleted everything. That would be why you
are getting no results.

-Nick

On 5/10/07, James liu <[EMAIL PROTECTED]> wrote:
> i use a command like this
>
> > curl http://localhost:8983/solr/update --data-binary '<delete><query>name:DDR</query></delete>'
> > curl http://localhost:8983/solr/update --data-binary '<commit/>'
> >
> >
> and i get
>
> > numDocs : 0
> > maxDoc : 1218819
> >
>
> when i search for something which existed before the delete, i find
> nothing.

>
> but index file size not changed and maxDoc not changed.
>
> why does it happen?
>
>
> --
> regards
> jl
>


--
- Nick