Re: Multiple Process of the SAME solr instance

2008-09-18 Thread mohitranka

Shalin, 
I understand that :-)

My problem is, if 1 solr instance processes (saves) 100 documents one-by-one, it
would not be very efficient. I want to create 10 clones
(processes/threads/cores) of the same solr instance, so that 10 documents get
processed (saved to solr) simultaneously.

Thanks and regards,
Mohit Ranka
  

Shalin Shekhar Mangar wrote:
 
 On Thu, Sep 18, 2008 at 11:03 AM, mohitranka [EMAIL PROTECTED] wrote:
 

 Otis, I understand that 1 solr instance can store n documents
 (one-by-one).
 My query was how to create m such instances/processes/threads so that m
 documents get stored at a time, instead of 1 at a time.

 All the instances should read at the same port.


 You can send a batch of m documents at a time in the same XML.
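 
 With SolrJ the same batching looks roughly like this (a sketch, assuming
 'server' is a CommonsHttpSolrServer pointed at your instance; the field
 names are placeholders):
 
     import java.util.ArrayList;
     import java.util.List;
     import org.apache.solr.common.SolrInputDocument;
 
     List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
     for (int i = 0; i < 100; i++) {
         SolrInputDocument doc = new SolrInputDocument();
         doc.addField("id", "doc-" + i);          // placeholder field names
         doc.addField("title", "Document " + i);
         docs.add(doc);
     }
     server.add(docs);    // one HTTP round trip for the whole batch
     server.commit();     // commit once at the end, not per document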
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/Multiple-Process-of-the-SAME-solr-instance-tp19533951p19546626.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr vs Autonomy

2008-09-18 Thread Otis Gospodnetic
Geoff,

Perhaps you can find out the list of features/functionalities that your project 
requires and we can give you a quick yes/no answer.
Or perhaps you can get those others to list those Autonomy features that they 
think they really need, and we can tell you how Solr compares.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Geoff Hopson [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 1:46:33 AM
 Subject: Solr vs Autonomy
 
 Hi,
 
 I'm under pressure to justify the use of Solr on my project, and
 others are suggesting that Autonomy be used instead. Apart from price,
 does anyone have a list of pros/cons around Autonomy compared to Solr?
 
 Thanks
 geoff



Re: Setting request method to post on SolrQuery causes ClassCastException

2008-09-18 Thread Otis Gospodnetic
A quick work-around is, I think, to tell Solr to use the non-binary response, 
e.g. wt=xml (I think that's the syntax).

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: syoung [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, September 17, 2008 7:27:30 PM
 Subject: Setting request method to post on SolrQuery causes ClassCastException
 
 
 Hi,
 
 I need to have queries over a certain length done as a post instead of a
 get.  However, when I set the method to post, I get a ClassCastException. 
 Here is the code:
 
 public QueryResponse query(SolrQuery solrQuery) {
     QueryResponse response = null;
     try {
         if (solrQuery.toString().length() > MAX_URL_LENGTH)
             response = server.query(solrQuery, SolrRequest.METHOD.POST);
         else
             response = server.query(solrQuery, SolrRequest.METHOD.GET);
     } catch (SolrServerException e) {
         throw new DataAccessResourceFailureException(e.getMessage(), e);
     }
     return response;
 }
 
 And the stack trace:
 
 java.lang.ClassCastException: java.lang.String
 org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
 org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
 org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:113)
 com.localmatters.guidespot.util.SolrTemplate.query(SolrTemplate.java:33)
 
 Thanks,
 
 Susan
 
 
 -- 
 View this message in context: 
 http://www.nabble.com/Setting-request-method-to-post-on-SolrQuery-causes-ClassCastException-tp19543232p19543232.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple Process of the SAME solr instance

2008-09-18 Thread Otis Gospodnetic
Mohit,

I think you are thinking too hard - trying to optimize something that doesn't 
sound like it needs optimizing at this point in your project.  I suggest you 
start with 1 Solr instance and then see if anything needs to be faster after 
you've pushed that to its limits.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: mohitranka [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 2:15:25 AM
 Subject: Re: Multiple Process of the SAME solr instance
 
 
 Shalin, 
 I understand that :-)
 
 My problem is, if 1 solr instance processes (saves) 100 documents one-by-one, it
 would not be very efficient. I want to create 10 clones
 (processes/threads/cores) of the same solr instance, so that 10 documents get
 processed (saved to solr) simultaneously.
 
 Thanks and regards,
 Mohit Ranka
   
 
 Shalin Shekhar Mangar wrote:
  
  On Thu, Sep 18, 2008 at 11:03 AM, mohitranka wrote:
  
 
  Otis, I understand that 1 solr instance can store n documents
  (one-by-one).
  My query was how to create m such instances/processes/threads so that m
  documents get stored at a time, instead of 1 at a time.
 
  All the instances should read at the same port.
 
 
  You can send a batch of m documents at a time in the same XML.
  
  -- 
  Regards,
  Shalin Shekhar Mangar.
  
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/Multiple-Process-of-the-SAME-solr-instance-tp19533951p19546626.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Field level security

2008-09-18 Thread Otis Gospodnetic
Hi,

I don't understand all the details, but I'll inline a few comments.

 

- Original Message 
 From: Geoff Hopson [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 1:44:33 AM
 Subject: Field level security
 
 Hi,
 
 First post/question, so please be gentle :-)
 
 I am trying to put together a security model around fields in my
 index. My requirement is that a user may not have permission to view
 certain fields in the index when he does a search. For example, he may
 have permission to see the name and address, but not the occupation.
 Whereas a different user with different permissions will be able to
 search all 3 fields.

What exactly is restricted?  Viewing of specific fields in results, or 
searching in specific fields?
If it's the former, you could tell Solr which fields to return using 
&fl=field1,field2... 
If it's the latter, you could always write a custom SearchComponent that takes 
your custom userType or allowedFields parameter and constructs a query 
based on that.
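
A minimal sketch of that SearchComponent idea, using the allowedFields
parameter mentioned above (the class name is hypothetical, and your
security layer would have to supply the parameter in some trusted way):

    import java.io.IOException;
    import org.apache.solr.common.params.CommonParams;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class FieldSecurityComponent extends SearchComponent {
      public void prepare(ResponseBuilder rb) throws IOException {
        // force the returned field list down to what this user may see
        String allowed = rb.req.getParams().get("allowedFields");
        if (allowed != null) {
          ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
          params.set(CommonParams.FL, allowed);
          rb.req.setParams(params);
        }
      }

      public void process(ResponseBuilder rb) throws IOException {
        // nothing to do at process time; prepare() already rewrote fl
      }

      public String getDescription() { return "restricts fl per user"; }
      public String getSource() { return null; }
      public String getSourceId() { return null; }
      public String getVersion() { return null; }
    }

It would then be registered in solrconfig.xml and added to the components
list of your request handler.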

 What is the best way to model this?
 
 My current stab at this has a document-level security level set (I
 have a field called security_default), and all fields have this
 default. If there are exceptions, I have a multiValued field called
 'security_exceptions' where I comma delimit the field name and
 different access permission for that field. Eg I might have
 'occupation=Restricted' in that field.
 
 This falls over when I copyField fields into a text field for easier 
 searching.

Searching across multiple fields is pretty easy, too.  I'd stick to that, as 
that also lets you assign different weight to different fields.
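
For example, the dismax handler can search several fields with per-field
boosts in a single request (the field names follow your example; the
weights are only illustrative):

    q=smith&qt=dismax&qf=name^2.0 address^1.0 occupation^0.5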

Otis

 Has anyone else attempted to do this and are willing to share their ideas?
 
 Thanks in advance,
 Geoff



Re: Special character matching 'x' ?

2008-09-18 Thread Norberto Meijome
On Thu, 18 Sep 2008 10:53:39 +0530
Sanjay Suri [EMAIL PROTECTED] wrote:

 One of my field values has the name Räikkönen, which contains special
 characters.
 
 Strangely, as I see it anyway, it matches on the search query 'x' ?
 
 Can someone explain or point me to the solution/documentation?

hi Sanjay,
Akshay should have given you an answer for this. In a more general way, if you
want to know WHY something is matching the way it is, run the query with
debugQuery=true . There are a few pages in the wiki which explain other
debugging techniques.
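
For example (host, port, field name and term are placeholders):

    http://localhost:8983/solr/select?q=name:Raikkonen&debugQuery=true

The parsedquery and explain sections of the debug output show which terms
the query was analyzed into and why each document matched.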

b
_
{Beto|Norberto|Numard} Meijome

Ask not what's inside your head, but what your head's inside of.
   J. J. Gibson

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Field level security

2008-09-18 Thread Geoff Hopson
Hi Otis,
Thanks for the response. I'll try and inline some clarity...

2008/9/18 Otis Gospodnetic [EMAIL PROTECTED]:

 I am trying to put together a security model around fields in my
 index. My requirement is that a user may not have permission to view
 certain fields in the index when he does a search. For example, he may
 have permission to see the name and address, but not the occupation.
 Whereas a different user with different permissions will be able to
 search all 3 fields.

 What exactly is restricted?  Viewing of specific fields in results, or 
 searching in specific fields?

I am restricting the results - the user can search everything, but I
was planning (as you mention) to apply a fieldList qualifier to the
query. In my head (ie not tried it yet) I was hoping I could write a
'SecurityRequestHandler' that would take an incoming security 'token'
and construct a &fl qualifier.

Some other thoughts in my head are around developing my own fieldType,
where I could tokenise the value against the field (e.g. store <field
name="occupation">candlestick maker=Restricted</field> or something
similar). Thoughts on that?


 If it's the former, you could tell Solr which fields to return using 
  &fl=field1,field2...
 If it's the latter, you could always write a custom SearchComponent that 
 takes your custom userType or allowedFields parameter and constructs a 
 query based on that.

 What is the best way to model this?

 My current stab at this has a document-level security level set (I
 have a field called security_default), and all fields have this
 default. If there are exceptions, I have a multiValued field called
  'security_exceptions' where I comma delimit the field name and
 different access permission for that field. Eg I might have
 'occupation=Restricted' in that field.

 This falls over when I copyField fields into a text field for easier 
 searching.

 Searching across multiple fields is pretty easy, too.  I'd stick to that, as 
 that also lets you assign different weight to different fields.


My requirement is to offer a google-type search, so the user can type
in john smith ford green and get results where ford may be a last
name or a car manufacturer, or green is the colour of the car, a
last name or part of a town name. If I tokenised the field values as
above and copyField-ed them into a single text box, would my tokeniser
pick those out?

Dunno - I guess I need to roll my sleeves up and do some coding, try
some of this out.

Thanks again for any insights

Geoff


Re: Solr vs Autonomy

2008-09-18 Thread Geoff Hopson
As per other thread

1) security down to field level

Otherwise I am mostly happy that Solr gives me everything that Autonomy does.

2008/9/18 Otis Gospodnetic [EMAIL PROTECTED]:
 Geoff,

 Perhaps you can find out the list of features/functionalities that your 
 project requires and we can give you a quick yes/no answer.
 Or perhaps you can get those others to list those Autonomy features that 
 they think they really need, and we can tell you how Solr compares.

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: Geoff Hopson [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 1:46:33 AM
 Subject: Solr vs Autonomy

 Hi,

 I'm under pressure to justify the use of Solr on my project, and
 others are suggesting that Autonomy be used instead. Apart from price,
 does anyone have a list of pros/cons around Autonomy compared to Solr?

 Thanks
 geoff





-- 
Light travels faster than sound. This is why some people appear bright
until you hear them speak………
Mario Kart Wii: 2320 6406 5974


AW: Date field mystery

2008-09-18 Thread Kolodziej Christian
Hi Chris,

it was a long night for our solr server today because we rebuilt the complete 
index using well-formed date strings. And the date field is stored now so that 
we can see if something went wrong :-)

But our problems are not solved completely. Now I can give you a very exact 
description of what the problem is now (and what was the reason that we used 
malformed date values).

Let's imagine we have 3 records with the following date values:
1. 2006-03-04T12:23:19Z
2. 2007-08-12T19:07:03Z
3. 2008-09-16T12:56:19Z

And now I will give you some queries and which results we get back:
- date:[2005-01-01T00:00:00Z TO NOW] or date:[2005-01-01T00:00:00Z TO 
2008-09-18T09:45:00Z]: 1 and 2 (incorrect)
- date:[2005-01-01T00:00:00Z TO 20080918T09:45:00Z]: 1, 2, 3 (correct)
- date:[2005-01-01T00:00:00Z TO 2007-12-31T23:59:59Z]: only 1 (incorrect)
- date:[2005-01-01T00:00:00Z TO 20071231T23:59:59Z]: 1 and 2 (correct)

So as you can see, using - in the second parameter of the range query for the 
date field causes an error and doesn't find the record that should be found, 
while using a malformed date value without - returns the correct records.

When using - for the second parameter, all records from the year contained 
in the parameter aren't found any more. This behavior is reproducible 
on different systems, either CentOS or Debian. It must be a problem of solr or 
the Lucene (query parser) itself.

Our next steps are to test our scenario using solr 1.3, and if the problem isn't 
fixed we will use timestamps instead of the date format. But maybe this is a 
general problem of solr and should be fixed, because in other cases and for 
other users it's not possible to make a workaround and they get wrong 
(incomplete) results for their query.

Best regards,
Christian


Re: cron job update index

2008-09-18 Thread sunnyfr

Ok, thanks, it's very clear.
But do you know why my cron job doesn't work:

# m h  dom mon dow   command
*/5 * * * * /usr/bin/wget
http://solr-test.books.com:8080/solr/books/dataimport?command=delta-import

When I go to check the date in conf/dataimport.properties, the date and hour
don't change. So yesterday:
#Wed Sep 17 18:07:14 CEST 2008
last_index_time=2008-09-17 17\:24\:07

and weirdly if I run wget
http://solr-test.books.com:8080/solr/books/dataimport?command=delta-import
[EMAIL PROTECTED]:/# wget
http://solr-test.adm.books.com:8180/solr/books/dataimport?command=delta-import
--09:26:24-- 
http://solr-test.adm.books.com:8180/solr/books/dataimport?command=delta-import
           => `dataimport?command=delta-import.2'
Resolving solr-test.adm.books.com... 10.97.1.151
Connecting to solr-test.adm.books.com|10.97.1.151|:8180... connected.
HTTP request sent, awaiting response... Read error (Connection reset by
peer) in headers.
Retrying.

--09:27:51-- 
http://solr-test.adm.books.com:8180/solr/video/dataimport?command=delta-import
  (try: 2) => `dataimport?command=delta-import.2'
Connecting to solr-test.adm.books.com|10.97.1.151|:8180... connected.
HTTP request sent, awaiting response... 200 OK
Length: 807 [text/xml]

100%[==]
807   --.--K/s 
09:27:51 (174.91 MB/s) - `dataimport?command=delta-import.2' saved [807/807]

And my dataimport.properties doesn't change either.

But if I go through my browser, it does change my dataimport.properties.








Shalin Shekhar Mangar wrote:
 
 On Wed, Sep 17, 2008 at 9:42 PM, sunnyfr wrote:
 

  Sorry, but a silly question about "Then the main query is
  executed for each primary key identified by the deltaQuery. This main query
  is used to create the documents and index them."

  I don't see in the code the link between the deltaQuery and the main
  query; how does it get back the ids which have been modified?

 
  After the delta query is executed and the changed IDs (PKs) are collected,
  it modifies the main query using the pk attribute with each value in the
  delta list and runs it to create the documents.
 
 For example:
  <entity pk="id" query="select * from books"
  deltaQuery="select id from books where modified > '${dataimport.last_index_time}'">
 
 Suppose changed pk given by deltaQuery is [1,5,7] then the following
 queries
 are executed:
 select * from books where id = '1';
 select * from books where id = '5';
 select * from books where id = '7';
 
 
 
  Oh, it just finished; it actually took a long time, so maybe my cron job
  should run less often?
  <str name="">
  Indexing completed. Added/Updated: 390796 documents. Deleted 0 documents.
  </str>
  <str name="Committed">2008-09-17 18:07:47</str>
  <str name="Time taken">0:43:40.465</str>
 
 
 Vary the cron job depending on how frequently and by how many documents
 the
 DB is updated. If an existing import is running, additional calls to start
 an import operation are ignored.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/cron-job-update-index-tp19520468p19547964.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Setting request method to post on SolrQuery causes ClassCastException

2008-09-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
I guess the POST is not sending the correct 'wt' parameter. Try
setting wt=javabin explicitly.

wt=xml alone may not work because the response parser is still the binary one.

check this http://wiki.apache.org/solr/Solrj#xmlparser
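
In SolrJ that looks roughly like this (a sketch; the URL is a placeholder,
and the constructor can throw MalformedURLException):

    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.impl.XMLResponseParser;

    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    server.setParser(new XMLResponseParser()); // sends wt=xml and parses XML
    server.query(solrQuery, SolrRequest.METHOD.POST);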





On Thu, Sep 18, 2008 at 11:49 AM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
 A quick work-around is, I think, to tell Solr to use the non-binary response, 
 e.g. wt=xml (I think that's the syntax).

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: syoung [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, September 17, 2008 7:27:30 PM
 Subject: Setting request method to post on SolrQuery causes 
 ClassCastException


 Hi,

 I need to have queries over a certain length done as a post instead of a
 get.  However, when I set the method to post, I get a ClassCastException.
 Here is the code:

  public QueryResponse query(SolrQuery solrQuery) {
      QueryResponse response = null;
      try {
          if (solrQuery.toString().length() > MAX_URL_LENGTH)
              response = server.query(solrQuery, SolrRequest.METHOD.POST);
          else
              response = server.query(solrQuery, SolrRequest.METHOD.GET);
      } catch (SolrServerException e) {
          throw new DataAccessResourceFailureException(e.getMessage(), e);
      }
      return response;
  }

 And the stack trace:

 java.lang.ClassCastException: java.lang.String
 org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
 org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
 org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:113)
 com.localmatters.guidespot.util.SolrTemplate.query(SolrTemplate.java:33)

 Thanks,

 Susan


 --
 View this message in context:
 http://www.nabble.com/Setting-request-method-to-post-on-SolrQuery-causes-ClassCastException-tp19543232p19543232.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Re: recip(myfield,m,a,b)

2008-09-18 Thread sunnyfr

I don't think it can works at the index time, because I when somebody look
for a book I want to boost the search in relation with the user language
...so I dont think it can works, except if I didn't get it.

Thanks for your answer,


hossman wrote:
 
 
 : Is there a way to convert to integer to check if a = b ... like
 : recip(myfield,m,language,lang)
 : But I would like to boost(scoring) field which have the same user
 language
 : and book language ...
 : 
 : But for that I need to know convert.int(language)
 
  There is an OrdFieldSource that can be used with single-valued string
  fields to get a numeric value for where they are in the order of all
  values for that field ... it is in fact what gets used by default when
  you include a string fieldname in the functionquery syntax.
 
 But off the top of my head i don't think there are any Functions provided 
 by default that let you compare two ValueSources and return one number if 
 they are equal and another number if they aren't.
 
 Frankly: the best way to approach a problem like this is to set a boolean 
 field at index time if the other two fields are the same.
 
 
 -Hoss
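
A minimal sketch of that index-time approach, assuming you index with
SolrJ and know both languages when you build the document (the variables
and field names are hypothetical):

    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", bookId);
    doc.addField("language", bookLang);
    // boolean computed once at index time instead of compared per query
    doc.addField("lang_match", userLang.equals(bookLang));
    server.add(doc);

A query can then boost on lang_match:true rather than comparing two fields
with a function.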
 
 
 

-- 
View this message in context: 
http://www.nabble.com/recip%28myfield%2Cm%2Ca%2Cb%29-tp19452492p19549282.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Special character matching 'x' ?

2008-09-18 Thread Sanjay Suri
Thanks Akshay and Norberto,
I am still trying to make it work. I know the solution is what you pointed
me to, but it is just taking me some time to make it work.

thanks,
-Sanjay

On Thu, Sep 18, 2008 at 12:34 PM, Norberto Meijome [EMAIL PROTECTED]wrote:

 On Thu, 18 Sep 2008 10:53:39 +0530
 Sanjay Suri [EMAIL PROTECTED] wrote:

   One of my field values has the name Räikkönen, which contains special
   characters.
 
  Strangely, as I see it anyway, it matches on the search query 'x' ?
 
  Can someone explain or point me to the solution/documentation?

 hi Sanjay,
 Akshay should have given you an answer for this. In a more general way, if
 you
 want to know WHY something is matching the way it is, run the query with
 debugQuery=true . There are a few pages in the wiki which explain other
 debugging techniques.

 b
 _
 {Beto|Norberto|Numard} Meijome

 Ask not what's inside your head, but what your head's inside of.
   J. J. Gibson

 I speak for myself, not my employer. Contents may be hot. Slippery when
 wet.
 Reading disclaimers makes you go blind. Writing them is worse. You have
 been
 Warned.




-- 
Sanjay Suri

Videocrux Inc.
http://videocrux.com
+91 99102 66626


Unable to filter fq param on a dynamic field

2008-09-18 Thread Barry Harding


Hi,

I have a fairly simple solr setup with several predefined fields that are 
indexed and stored. Depending on the type of product I also add various 
dynamic fields of type string to a record, and I should mention that I am 
using the solr.DisMaxRequestHandler request handler called /IvolutionSearch 
in my example requests.



My Schema is as follows:

<field name="CampaignCode" type="string" indexed="true" stored="true" required="true" />

<field name="CategoryName" type="string" indexed="true" stored="true" required="true" />

<field name="CategoryPath" type="string" indexed="true" stored="true" required="true" />

<field name="CountryCode" type="string" indexed="true" stored="true" required="true" />

<field name="Id" type="string" indexed="true" stored="true" required="true" />

<field name="ManufacturerName" type="string" indexed="true" stored="true" required="true" />

<field name="MPN" type="textTight" indexed="true" stored="true" required="true" />

<field name="ProductName" type="text" indexed="true" stored="true" required="true" />

<field name="Overview" type="text" indexed="true" stored="true" required="false" />

<field name="Price" type="float" indexed="true" stored="true" required="true" />

<field name="ProductCode" type="textTight" indexed="true" stored="true" required="true" />

<field name="ReviewRating" type="integer" indexed="true" stored="true" required="false" />

<field name="StockCode" type="string" indexed="true" stored="true" required="false" />

<field name="TaxCode" type="string" indexed="false" stored="true" required="true" />

<field name="ThumbnailURI" type="string" indexed="false" stored="true" required="false" />

<field name="WebClassification" type="textTight" indexed="true" stored="true" required="true" />

<dynamicField name="*-facet" type="string" indexed="true" stored="true" multiValued="false" />



Now I can query for any of the fixed field types, such as ProductName or 
ReviewRating, and get the results I expect, but when I try to run a filter 
query on the dynamic fields in the result, I always end up with no results 
being returned.



So if I run the following query against my copy of solr 1.3 I get the results I 
am expecting



http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100



<result name="response" numFound="1216" start="0">
<doc>
  <str name="CampaignCode">$A</str>
  <str name="CategoryName">Mono Laser Printers</str>
  <str name="CategoryPath">Printers|Mono Laser Printers</str>
  <str name="Connectivity-Technology-facet">Wired</str>
  <str name="CountryCode">UK</str>
  <str name="Id">UK$AQ969719</str>
  <str name="MPN">3500V_DN</str>
  <str name="Manufacturer-facet">Xerox</str>
  <str name="ManufacturerName">Xerox</str>
  <str name="Output-Type-facet">Monochrome</str>
  <str name="Overview">The Xerox Phaser 3500 series printer provides an 
affordable solution to meet the increasing volume a</str>
  <float name="Price">464.10</float>
  <str name="ProductCode">Q969719</str>
  <str name="ProductName">XEROX 3500DN MONO LASER</str>
  <int name="ReviewRating" />
  <str name="StockCode">E000</str>
  <str name="TaxCode">2</str>
  <str name="Technology-facet">Laser</str>
  <str name="ThumbnailURI">26099.jpg</str>
  <str name="Type-facet">Workgroup printer</str>
  <str name="WebClassification">MLASERPRN</str>
  <date name="timestamp">2008-09-17T17:10:44.37Z</date>
</doc>
<doc>
  <str name="CampaignCode">$B</str>
  <str name="CategoryName">Mono Laser Printers</str>
  <str name="CategoryPath">Printers|Mono Laser Printers</str>
  <str name="Connectivity-Technology-facet">Wired</str>

and so on for the 100 results

Now if I try to filter those results to just those that contain 
Output-Type-facet equaling Monochrome,

using:
 
http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100&fq=Output-Type-facet:Monochrome
or
http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100&fq=Output-Type-facet:"Monochrome"
(and similar quoting/escaping variations)



I just get zero results back, even though I know that field contains that 
value. Please, before I pull my hair out, tell me what mistake I have made. 
Why can I query using a static field and not a dynamic field?

Any help, even if it's to say I have been stupid or to tell me to reread a 
section of the manual/Wiki because I did not get the point, is much appreciated.



Thanks

Barry H  mailto:[EMAIL PROTECTED]










Misco is a division of Systemax Europe Ltd.  Registered in Scotland Number 
114143.  Registered Office: Caledonian Exchange, 19a Canning Street, Edinburgh 
EH3 8EG.  Telephone +44 (0)1933 686000.

Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
Hi Yonik,

One approach I have been working on that I will integrate into SOLR is
the ability to use serialized objects for the analyzers so that the
schema can be defined on the client side if need be.  The analyzer
classes will be dynamically loaded.  Or there is no need for a schema
and plain Java objects can be defined and used.

I'd like to see the synonyms serialized as well.  When I mentioned the
serialization it is in regards to setting the configuration over the
Hadoop RMI LUCENE-1336 protocol.  Instead of creating methods for each
new call one wants, the easiest approach in distributed computing is
to have a dynamic class loaded that operates directly on SolrCore and
so can do whatever is necessary to get the work completed.  Creating
new methods in distributed computing is always a bad idea IMO.

In realtime indexing one will not be able to simply reindex all the
time, and so either a dynamic schema, or no schema at all, is best.
Otherwise the documents would need to have a schemaVersion field; this
gets messy, I looked at this.

Jason

On Wed, Sep 17, 2008 at 5:10 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Wed, Sep 17, 2008 at 4:50 PM, Henrib [EMAIL PROTECTED] wrote:
 Yonik Seeley wrote:

 ...multi-core allows you to instantiate a completely
 new core and swap it for the old one, but it's a bit of a heavyweight
 approach
 ...a schema object would not be mutable, but
 that one could easily swap in a new schema object for an index at any
 time...


 Not sure I understand what we gain; if you change the schema, you'll most
 likely have to reindex as well.

 That's management at a higher level in a way.
 There are enough ways that one could change the schema in a compatible
 way (say like just adding query-time synonyms, etc) that it does seem
 like we should permit it.

 Or are you saying we should have a shortcut for the
 whole operation of
 creating a new core, reindex content, replacing an existing core ?

 Eventually, it seems like we should be able to handle re-indexing when
 necessary.
 And we should consider the ability to change some config without
 necessarily reloading *everything*.

 -Yonik



Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
This should be done.  Great idea.

On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:
 My vote is for dynamically scanning a directory of configuration files. When
 a new one appears, or an existing file is touched, load it. When a
 configuration disappears, unload it.  This model works very well for servlet
 containers.

 Lance

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
 Sent: Wednesday, September 17, 2008 11:21 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Some new SOLR features

 On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:
 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.

 Exactly.  Actually, multi-core allows you to instantiate a completely new
 core and swap it for the old one, but it's a bit of a heavyweight approach.

 The key is finding the right granularity of change.
 My current thought is that a schema object would not be mutable, but that
 one could easily swap in a new schema object for an index at any time.  That
 would allow a single request to see a stable view of the schema, while
 preventing having to make every aspect of the schema thread-safe.
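
 In code the swap idea is roughly this (an illustrative sketch;
 SchemaHolder is not an actual Solr class, only IndexSchema is):

     // each request reads one immutable schema snapshot; an update is a
     // single atomic reference swap, so no per-field locking is needed
     public class SchemaHolder {
       private volatile IndexSchema schema;          // immutable once published
       public IndexSchema get() { return schema; }   // stable view per request
       public void swap(IndexSchema newSchema) { this.schema = newSchema; }
     }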

 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.

 That's the plan... completely separate the serialized and in memory
 representations.

 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.

 Nothing will stop one from using java serialization for config persistence,
 however I am a fan of human readable for config files...
 so much easier to debug and support.  Right now, people can cut-n-paste
 relevant parts of their config in email for support, or to a wiki to explain
 things, etc.

 Of course, if you are talking about being able to have custom filters or
 analyzers (new classes that don't even exist on the server yet), then it
 does start to get interesting.  This intersects with deployment in
 general... and I'm not sure what the right answer is.
 What if Lucene or Solr needs an upgrade?  It would be nice if that could
 also automatically be handled in a large cluster... what are the options
 for handling that?  Is there a role here for OSGi to play?
  It sounds like at least some of that is outside of the Solr domain.

 An alternative to serializing everything would be to ship a new schema along
 with a new jar file containing the custom components.

 -Yonik




Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
 That would allow a single request to see a stable view of the
 schema, while preventing having to make every aspect of the schema
 thread-safe.

Yes that is the best approach.

 Nothing will stop one from using java serialization for config
 persistence,

Persistence should not be serialized.  Serialization is for transport
over the wire for automated upgrades of the configuration.  This could
be done in XML as well, but it would be good to support both models.

 Is there a role here for OSGi to play?

Yes.  Eclipse successfully uses OSGI, and for grid computing in Java,
and to take advantage of what Java can do with dynamic classloading,
OSGI is the way to go.  Every search project I have worked on needs
this stuff to be way easier than it is now.  The current distributed
computing model in SOLR may work, but it will not work reliably and
will break a lot.  When it does break there is no way to know what
happened.  This will create excessive downtime for users.  I have had
excessive downtime in production even in the current simple
master-slave architecture because there is no failover.  Failover in
the current system should be in there because it's too easy to
implement with the rsync based batch replication.

On Wed, Sep 17, 2008 at 2:21 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:
 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.

 Exactly.  Actually, multi-core allows you to instantiate a completely
 new core and swap it for the old one, but it's a bit of a heavyweight
 approach.

 The key is finding the right granularity of change.
 My current thought is that a schema object would not be mutable, but
 that one could easily swap in a new schema object for an index at any
 time.  That would allow a single request to see a stable view of the
 schema, while preventing having to make every aspect of the schema
 thread-safe.

 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.

 That's the plan... completely separate the serialized and in memory
 representations.

 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.

 Nothing will stop one from using java serialization for config
 persistence, however I am a fan of human readable for config files...
 so much easier to debug and support.  Right now, people can
 cut-n-paste relevant parts of their config in email for support, or to
 a wiki to explain things, etc.

 Of course, if you are talking about being able to have custom filters
 or analyzers (new classes that don't even exist on the server yet),
 then it does start to get interesting.  This intersects with
 deployment in general... and I'm not sure what the right answer is.
 What if Lucene or Solr needs an upgrade?  It would be nice if that
 could also automatically be handled in a large cluster... what are
 the options for handling that?  Is there a role here for OSGi to play?
  It sounds like at least some of that is outside of the Solr domain.

 An alternative to serializing everything would be to ship a new schema
 along with a new jar file containing the custom components.

 -Yonik



Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
Servlets is one thing.  For SOLR the situation is different.  There
are always small changes people want to make, a new stop word, a small
tweak to an analyzer.  Rebooting the server for these should not be
necessary.  Ideally this is handled via a centralized console and
deployed over the network (using RMI or XML) so that files do not need
to be deployed.

On Thu, Sep 18, 2008 at 7:41 AM, Mark Miller [EMAIL PROTECTED] wrote:
 Isn't this done in servlet containers for debugging-type work? Maybe an
 option, but I disagree that this should drive anything in solr. It should
 really be turned off in production in servlet containers imo as well.

 This can really be such a pain in the ass on a live site... someone touches
 web.xml and the app server reboots... *shudder*. Seen it, don't dig it.

 Jason Rutherglen wrote:

 This should be done.  Great idea.

 On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:


 My vote is for dynamically scanning a directory of configuration files.
 When
 a new one appears, or an existing file is touched, load it. When a
 configuration disappears, unload it.  This model works very well for
 servlet
 containers.

 Lance

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
 Seeley
 Sent: Wednesday, September 17, 2008 11:21 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Some new SOLR features

 On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:


 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.


 Exactly.  Actually, multi-core allows you to instantiate a completely new
 core and swap it for the old one, but it's a bit of a heavyweight
 approach.

 The key is finding the right granularity of change.
 My current thought is that a schema object would not be mutable, but that
 one could easily swap in a new schema object for an index at any time.
  That
 would allow a single request to see a stable view of the schema, while
 preventing having to make every aspect of the schema thread-safe.



 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.


 That's the plan... completely separate the serialized and in memory
 representations.



 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.


 Nothing will stop one from using java serialization for config
 persistence,
 however I am a fan of human readable for config files...
 so much easier to debug and support.  Right now, people can cut-n-paste
 relevant parts of their config in email for support, or to a wiki to
 explain
 things, etc.

 Of course, if you are talking about being able to have custom filters or
 analyzers (new classes that don't even exist on the server yet), then it
 does start to get interesting.  This intersects with deployment in
 general... and I'm not sure what the right answer is.
 What if Lucene or Solr needs an upgrade?  It would be nice if that could
 also automatically be handled in a large cluster... what are the
 options
 for handling that?  Is there a role here for OSGi to play?
  It sounds like at least some of that is outside of the Solr domain.

 An alternative to serializing everything would be to ship a new schema
 along
 with a new jar file containing the custom components.

 -Yonik







Re: Some new SOLR features

2008-09-18 Thread Mark Miller
Dynamic changes are not what I'm against... I'm against dynamic changes 
that are triggered by the app noticing that the config has changed.


Jason Rutherglen wrote:

Servlets is one thing.  For SOLR the situation is different.  There
are always small changes people want to make, a new stop word, a small
tweak to an analyzer.  Rebooting the server for these should not be
necessary.  Ideally this is handled via a centralized console and
deployed over the network (using RMI or XML) so that files do not need
to be deployed.

On Thu, Sep 18, 2008 at 7:41 AM, Mark Miller [EMAIL PROTECTED] wrote:
  

Isn't this done in servlet containers for debugging-type work? Maybe an
option, but I disagree that this should drive anything in solr. It should
really be turned off in production in servlet containers imo as well.

This can really be such a pain in the ass on a live site... someone touches
web.xml and the app server reboots... *shudder*. Seen it, don't dig it.

Jason Rutherglen wrote:


This should be done.  Great idea.

On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED] wrote:

  

My vote is for dynamically scanning a directory of configuration files.
When
a new one appears, or an existing file is touched, load it. When a
configuration disappears, unload it.  This model works very well for
servlet
containers.

Lance

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Wednesday, September 17, 2008 11:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Some new SOLR features

On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
[EMAIL PROTECTED] wrote:



If the configuration code is going to be rewritten then I would like
to see the ability to dynamically update the configuration and schema
without needing to reboot the server.

  

Exactly.  Actually, multi-core allows you to instantiate a completely new
core and swap it for the old one, but it's a bit of a heavyweight
approach.

The key is finding the right granularity of change.
My current thought is that a schema object would not be mutable, but that
one could easily swap in a new schema object for an index at any time.
 That
would allow a single request to see a stable view of the schema, while
preventing having to make every aspect of the schema thread-safe.




Also I would like the
configuration classes to just contain data and not have so many
methods that operate on the filesystem.

  

That's the plan... completely separate the serialized and in memory
representations.




This way the configuration
object can be serialized, and loaded by the server dynamically.  It
would be great for the schema to work the same way.

  

Nothing will stop one from using java serialization for config
persistence,
however I am a fan of human readable for config files...
so much easier to debug and support.  Right now, people can cut-n-paste
relevant parts of their config in email for support, or to a wiki to
explain
things, etc.

Of course, if you are talking about being able to have custom filters or
analyzers (new classes that don't even exist on the server yet), then it
does start to get interesting.  This intersects with deployment in
general... and I'm not sure what the right answer is.
What if Lucene or Solr needs an upgrade?  It would be nice if that could
also automatically be handled in a large cluster... what are the
options
for handling that?  Is there a role here for OSGi to play?
 It sounds like at least some of that is outside of the Solr domain.

An alternative to serializing everything would be to ship a new schema
along
with a new jar file containing the custom components.

-Yonik









delta-import looks stuck ???? how can I check if it's done or not ?

2008-09-18 Thread sunnyfr


This XML file does not appear to have any style information
associated with it. The document tree is shown below.

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</lst>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Time Elapsed">4:26:16.934</str>
<str name="Total Requests made to DataSource">3451431</str>
<str name="Total Rows Fetched">9165885</str>
<str name="Total Documents Processed">493061</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 10:01:01</str>
<str name="Identifying Delta">2008-09-18 10:01:01</str>
<str name="Deltas Obtained">2008-09-18 10:01:43</str>
<str name="Building documents">2008-09-18 10:01:43</str>
<str name="Total Changed Documents">1587889</str>
</lst>
<str name="WARNING">
This response format is experimental.  It is likely to change in the future.
</str>
</response>
-- 
View this message in context: 
http://www.nabble.com/delt-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19551728.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Some new SOLR features

2008-09-18 Thread Jason Rutherglen
Yes, so it's probably best to make the changes through a remote
interface so that the app will be able to make the appropriate
internal changes.  File-based system changes are less than ideal,
agreed, however I suppose with an open source project such as SOLR the
kitchen-sink effect happens and it will find its way in there
anyways.  The hard part is organizing the project such that it does
not get too bloated with everyone's features and allows features to be
pluggable outside of the core releases.  There are many things that
may be best as contrib modules that could be OSGI-based add-ons
rather than placed into the standard releases (of which I don't have
any off hand).  The standard for contribs for SOLR can be OSGI.  This
will greatly assist in SOLR becoming grid-computing friendly.  Ideally
SOLR 2.0 would be cleaner, standardized, and most of the features
pluggable.  This would allow for consistent release cycles and make grid
computing simpler to implement.  SOLR seems like it could be going in
the direction of bloat, which could increasingly confuse new users.
Instead, people could either implement their own modules and upload them
to the contrib section, or keep their own proprietary ones.

I am curious: what is the recommended place to put the query
expansion code (such as adding boosting, adding phrase queries and
such)?  Is it now best to use a SearchComponent?  Is it possible in
the future to make SearchComponents OSGI-enabled?

On Thu, Sep 18, 2008 at 7:56 AM, Mark Miller [EMAIL PROTECTED] wrote:
 Dynamic changes are not what I'm against...I'm against dynamic changes that
 are triggered by the app noticing that the config have changed.

 Jason Rutherglen wrote:

 Servlets is one thing.  For SOLR the situation is different.  There
 are always small changes people want to make, a new stop word, a small
 tweak to an analyzer.  Rebooting the server for these should not be
 necessary.  Ideally this is handled via a centralized console and
 deployed over the network (using RMI or XML) so that files do not need
 to be deployed.

 On Thu, Sep 18, 2008 at 7:41 AM, Mark Miller [EMAIL PROTECTED]
 wrote:


  Isn't this done in servlet containers for debugging-type work? Maybe an
  option, but I disagree that this should drive anything in solr. It should
  really be turned off in production in servlet containers imo as well.

  This can really be such a pain in the ass on a live site... someone
  touches web.xml and the app server reboots... *shudder*. Seen it, don't
  dig it.

 Jason Rutherglen wrote:


 This should be done.  Great idea.

 On Wed, Sep 17, 2008 at 3:41 PM, Lance Norskog [EMAIL PROTECTED]
 wrote:



 My vote is for dynamically scanning a directory of configuration files.
 When
 a new one appears, or an existing file is touched, load it. When a
 configuration disappears, unload it.  This model works very well for
 servlet
 containers.

 Lance

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
 Seeley
 Sent: Wednesday, September 17, 2008 11:21 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Some new SOLR features

 On Wed, Sep 17, 2008 at 1:27 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:



 If the configuration code is going to be rewritten then I would like
 to see the ability to dynamically update the configuration and schema
 without needing to reboot the server.



 Exactly.  Actually, multi-core allows you to instantiate a completely
 new
 core and swap it for the old one, but it's a bit of a heavyweight
 approach.

 The key is finding the right granularity of change.
 My current thought is that a schema object would not be mutable, but
 that
 one could easily swap in a new schema object for an index at any time.
  That
 would allow a single request to see a stable view of the schema, while
 preventing having to make every aspect of the schema thread-safe.




 Also I would like the
 configuration classes to just contain data and not have so many
 methods that operate on the filesystem.



 That's the plan... completely separate the serialized and in memory
 representations.




 This way the configuration
 object can be serialized, and loaded by the server dynamically.  It
 would be great for the schema to work the same way.



 Nothing will stop one from using java serialization for config
 persistence,
 however I am a fan of human readable for config files...
 so much easier to debug and support.  Right now, people can cut-n-paste
 relevant parts of their config in email for support, or to a wiki to
 explain
 things, etc.

 Of course, if you are talking about being able to have custom filters
 or
 analyzers (new classes that don't even exist on the server yet), then
 it
 does start to get interesting.  This intersects with deployment in
 general... and I'm not sure what the right answer is.
 What if Lucene or Solr needs an upgrade?  It would be nice if that
 could
  also automatically be handled in a large cluster... what are the
 

Re: problem index accented character with release version of solr 1.3

2008-09-18 Thread Sean Timm
From the XML 1.0 spec: "Legal characters are tab, carriage return, 
line feed, and the legal graphic characters of Unicode and ISO/IEC 
10646."  So, \005 is not a legal XML character.  It appears the old StAX 
implementation was more lenient than it should have been, and Woodstox is 
doing the correct thing.
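
If you need a stopgap on your side while you track down how the characters
get in, here is a minimal sketch that drops everything XML 1.0 forbids
below 0x20 (it ignores the rarer excluded ranges above that):

    // keeps tab, CR and LF; drops other control characters such as \005
    public static String stripIllegalXmlChars(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\t' || c == '\n' || c == '\r' || c >= 0x20) {
                out.append(c);
            }
        }
        return out.toString();
    }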


-Sean

Ryan McKinley wrote:
My guess is it has to do with switching the StAX implementation to the 
Geronimo API and the Woodstox implementation.


https://issues.apache.org/jira/browse/SOLR-770

I'm not sure what the solution is though...


On Sep 17, 2008, at 10:02 PM, Joshua Reedy wrote:


I have been using a stable dev version of 1.3 for a few months.
Today, I began testing the final release version, and I encountered a
strange problem.
The only thing that has changed in my setup is the solr code (I didn't
make any config change or change the schema).

a document has a text field with a value that contains:
Andr\005é 3000

Indexing the document by itself or as part of a batch, produces the
following error:
Sep 17, 2008 5:00:27 PM org.apache.solr.common.SolrException log
SEVERE: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal
character ((CTRL-CHAR, code 5))
at [row,col {unknown-source}]: [5,205]
   at 
com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
   at 
com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668) 

   at 
com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126) 

   at 
com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701) 

   at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649) 

   at 
com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
   at 
org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:327) 

   at 
org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:195) 

   at 
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123) 

   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) 


   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
   at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) 

   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) 

   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) 

   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) 

   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) 

   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) 

   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) 

   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) 

   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) 

   at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) 

   at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) 

   at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) 

   at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)

   at java.lang.Thread.run(Thread.java:595)

The latest version of Solr doesn't seem to like control characters
(\005, in this case), but previous versions handled them (or at least
ignored them).

These characters shouldn't be in my documents, so there's a bug on my
end to track down.  However, I'm wondering if this was an expected
change or an unintended consequence of recent work . . .




--
- 


Be who you are and say what you feel,
because those who mind don't matter and
those who matter don't mind.
-- Dr. Seuss


Re: delta-import looks stuck ???? how can I check if it's done or not ?

2008-09-18 Thread sunnyfr

It was too long, so I finally restarted tomcat .. then 5mn later my cron job
started:
but it looks like nothing is happening via the cron job:

This is my OUTPUT file : tot.txt

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int></lst><lst name="initArgs"><lst name="defaults"><str
name="config">data-config.xml</str></lst></lst><str
name="command">delta-import,</str><str name="status">idle</str><str
name="importResponse"/><lst name="statusMessages"/><str name="WARNING">This
response format is experimental.  It is likely to change in the
future.</str>
</response>


This is my CRON JOB WGET

*/5 * * * * /usr/bin/wget -q --output-document=/home/tot.txt
http://solr-test.adm.books.com:8180/solr/books/dataimport?command=delta-import,
echo $(date) >> /home/tot.txt




sunnyfr wrote:
 
 This XML file does not appear to have any style information
 associated with it. The document tree is shown below.

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">0</int>
 </lst>
 <lst name="initArgs">
 <lst name="defaults">
 <str name="config">data-config.xml</str>
 </lst>
 </lst>
 <str name="status">idle</str>
 <str name="importResponse"/>
 <lst name="statusMessages">
 <str name="Time Elapsed">4:26:16.934</str>
 <str name="Total Requests made to DataSource">3451431</str>
 <str name="Total Rows Fetched">9165885</str>
 <str name="Total Documents Processed">493061</str>
 <str name="Total Documents Skipped">0</str>
 <str name="Delta Dump started">2008-09-18 10:01:01</str>
 <str name="Identifying Delta">2008-09-18 10:01:01</str>
 <str name="Deltas Obtained">2008-09-18 10:01:43</str>
 <str name="Building documents">2008-09-18 10:01:43</str>
 <str name="Total Changed Documents">1587889</str>
 </lst>
 <str name="WARNING">
 This response format is experimental.  It is likely to change in the
 future.
 </str>
 </response>
 

-- 
View this message in context: 
http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19554129.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr vs Autonomy

2008-09-18 Thread Walter Underwood
It depends entirely on the needs of the project. For some things,
Solr is superior to Autonomy, for other things, not.

I used to work at Autonomy (and Verity and Inktomi and Infoseek),
and I chose Solr for Netflix. It is working great for us.

wunder
==
Walter Underwood
Former Ultraseek Architect
Current Netflix Search Lead

On 9/17/08 10:46 PM, Geoff Hopson [EMAIL PROTECTED] wrote:

 Hi,
 
 I'm under pressure to justify the use of Solr on my project, and
 others are suggesting that Autonomy be used instead. Apart from price,
 does anyone have a list of pros/cons around Autonomy compared to Solr?
 
 Thanks
 geoff



Re: Solr vs Autonomy

2008-09-18 Thread Geoff Hopson
My project is looking to index 10s of millions of documents, providing
search across a live-live environment (hence index
distribution/replication is important). Most searches have to be done
(ie to end user) in 5 seconds or less. The index has about 30 fields,
and I reckon that the security access I alluded to can be solved with
field-specific queries (as opposed to a single copyFielded text
field).

The searches are very simple, but need to be quick. The confidence
in the information is important, and so scoring is valuable. Faceted
searches have a place too.

Autonomy seems to have a solid security/access control model but
offers nothing above and beyond Solr, unless I am missing something.

Dunno if that helps?

Geoff

2008/9/18 Walter Underwood [EMAIL PROTECTED]:
 It depends entirely on the needs of the project. For some things,
 Solr is superior to Autonomy, for other things, not.

 I used to work at Autonomy (and Verity and Inktomi and Infoseek),
 and I chose Solr for Netflix. It is working great for us.

 wunder
 ==
 Walter Underwood
 Former Ultraseek Architect
 Current Netflix Search Lead

 On 9/17/08 10:46 PM, Geoff Hopson [EMAIL PROTECTED] wrote:

 Hi,

 I'm under pressure to justify the use of Solr on my project, and
 others are suggesting that Autonomy be used instead. Apart from price,
 does anyone have a list of pros/cons around Autonomy compared to Solr?

 Thanks
 geoff





-- 
Light travels faster than sound. This is why some people appear bright
until you hear them speak………
Mario Kart Wii: 2320 6406 5974


Re: No server response code on insert: how do I avoid this at high speed?

2008-09-18 Thread Paleo Tek

Otis Gospodnetic wrote:

Perhaps the container logs explain what happened?
How about just throttling to the point where the failure rate is 0%?  
Too slow?
  


Otis's questions regarding dropped inserts sent me back to the drawing 
board.  The system had been tuned to a slower database to optimize speed 
and accept a few drops.  When I migrated to a faster DB I didn't 
retune.  Here are results of testing indexing performance for Tomcat and 
Jetty.   The DB speedup apparently moved the bottleneck from getting 
records from the database (around 400 rps) to cramming records into the 
servlet container.



System:   16 processor, 2.5 ghz, 64G memory
Index:  33 Gig, freshly optimized, avg recordsize 1.4k
Insert load: 250,000 records

I calculate records/sec by dividing the number of successful inserts by 
the time.  The adjusted time is the estimated time it would take to 
insert the full 250,000 records with no errors, which is raw time plus 
the additional time required to insert those dropped records, ie, raw 
time * (1 + error-rate * 0.01).  Judging from processor/memory/io 
utilization, it appears the write speed of a single java thread is 
dominating the solr indexing speed.  Which makes sense.



Takehome lessons:

  The speed limit is about 450 records per second in our environment.

  Three or four threads posting inserts max out speed.  More threads 
don't help.


  Jetty is significantly faster than Tomcat at sane thread counts in 
our environment


I hope this is useful. 


  -Jim

PS:  If you have formatting issues with this table, try viewing with a 
fixed width font
 

 
                       Tomcat                                          Jetty
___________________________________________________________________________________________________________

# threads  Raw time  # Drops  % Error  Records/sec  Adj. time   Raw time  # Drops  % Error  Records/sec  Adj. time
   16         533     17131     6.85      436.9      569.51        594     24222     9.69      380.1      651.55
   15         520     16878     6.75      448.31     555.1         518     28581    11.43      427.45     577.22
   14         547     16378     6.55      427.1      582.83        496     30047    12.02      443.45     555.61
   13         540     16638     6.65      432.15     575.91        495     27076    10.83      450.35     548.61
   12         545     15920     6.36      429.5      579.66        494     28785    11.51      447.8      550.88
   11         523     16192     6.47      447.05     556.84        484     26495    10.6       461.79     535.29
   10         540     15643     6.26      433.99     573.8         497     27190    10.88      448.31     551.05
    9         553     15543     6.21      423.97     587.34        494     25862    10.34      453.72     545.1
    8         541     14095     5.64      436.05     571.51        501     23482     9.39      452.13     548.06
    7         549     10735     4.29      435.82     572.55        499     24657     9.86      451.59     548.22
    6         566      9468     3.79      424.97     587.45        502     23074     9.23      452.04     548.33
    5         588      7754     3.10      411.98     606.23        527     20779     8.31      434.95     570.8
    4         577      4201     1.68      425.99     586.69        513     16608     6.64      454.96     547.08
    3         613         0     0         407.83     613           537      9503     3.8       447.85     557.41
    2         801         0     0         312.11     801           633         0     0         394.94     633
    1        1365         0     0         183.15    1365          1122         0     0         222.82    1122
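
For readers who want to reproduce this kind of test, a minimal sketch of a
fixed-pool loader with SolrJ 1.3 follows (the URL, field names and document
payload are placeholders, not Jim's actual loader; a real one would batch
adds and count the drops):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelPoster {
    public static void main(String[] args) throws Exception {
        // CommonsHttpSolrServer is thread-safe, so one instance is shared
        final SolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // three or four posting threads maxed out throughput in the table above
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 250000; i++) {
            final int id = i;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", Integer.toString(id));
                        doc.addField("text", "record body " + id); // placeholder payload
                        server.add(doc);
                    } catch (Exception e) {
                        // a real loader would count and retry these as drops
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}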





Re: delta-import looks stuck ???? how can I check if it's done or not ?

2008-09-18 Thread Shalin Shekhar Mangar
Hit /dataimport again from a browser and refresh periodically to see the
progress (number of documents indexed).
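
The same check can be scripted so that cron only fires a new delta-import
once the previous one has finished. A minimal sketch (host and core taken
from this thread; the grep test assumes the status XML shown earlier):

#!/bin/sh
# start a new delta-import only when the handler reports idle
URL="http://solr-test.adm.books.com:8180/solr/books/dataimport"
if wget -q -O - "$URL" | grep -q '>idle<'; then
  wget -q -O /dev/null "$URL?command=delta-import"
fi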

On Thu, Sep 18, 2008 at 7:55 PM, sunnyfr [EMAIL PROTECTED] wrote:


 It was too long so I finally restart tomcat .. then 5mn later my cron job
 started :
 but it looks like nothing happening by cron job :

 This is my OUTPUT file : tot.txt

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int
 name="QTime">0</int></lst><lst name="initArgs"><lst name="defaults"><str
 name="config">data-config.xml</str></lst></lst><str
 name="command">delta-import,</str><str name="status">idle</str><str
 name="importResponse"/><lst name="statusMessages"/><str name="WARNING">This
 response format is experimental.  It is likely to change in the
 future.</str>
 </response>


 This is my CRON JOB WGET

 */5 * * * * /usr/bin/wget -q --output-document=/home/tot.txt

 http://solr-test.adm.books.com:8180/solr/books/dataimport?command=delta-import
 ,
 echo $(date) >> /home/tot.txt




 sunnyfr wrote:
 
  This XML file does not appear to have any style information
  associated with it. The document tree is shown below.
 
  <response>
  <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
  <lst name="defaults">
  <str name="config">data-config.xml</str>
  </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
  <str name="Time Elapsed">4:26:16.934</str>
  <str name="Total Requests made to DataSource">3451431</str>
  <str name="Total Rows Fetched">9165885</str>
  <str name="Total Documents Processed">493061</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Delta Dump started">2008-09-18 10:01:01</str>
  <str name="Identifying Delta">2008-09-18 10:01:01</str>
  <str name="Deltas Obtained">2008-09-18 10:01:43</str>
  <str name="Building documents">2008-09-18 10:01:43</str>
  <str name="Total Changed Documents">1587889</str>
  </lst>
  <str name="WARNING">
  This response format is experimental.  It is likely to change in the
  future.
  </str>
 

 --
 View this message in context:
 http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19554129.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: delta-import looks stuck ???? how can I check if it's done or not ?

2008-09-18 Thread sunnyfr

That is exactly what I've done, but it can't work like that ... 

- what would that mean ... that the cron job can't hit it properly? 

- I've browsed to /dataimport but it was like nothing was running, so I
finally went back to /dataimport?command=delta-import and then to
/dataimport and refreshed it often ... indeed it works this way but it
wouldn't suit me ... and it takes ages ... now I'm at: 

<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
<str name="Time Elapsed">0:18:44.54</str>
<str name="Total Requests made to DataSource">1855793</str>
<str name="Total Rows Fetched">5588946</str>
<str name="Total Documents Processed">265113</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 16:29:38</str>
<str name="Identifying Delta">2008-09-18 16:29:38</str>
<str name="Deltas Obtained">2008-09-18 16:30:26</str>
<str name="Building documents">2008-09-18 16:30:26</str>
<str name="Total Changed Documents">1603970</str>
</lst>



Shalin Shekhar Mangar wrote:
 
 Hit /dataimport again from a browser and refresh periodically to see the
 progress (number of documents indexed).
 
 On Thu, Sep 18, 2008 at 7:55 PM, sunnyfr [EMAIL PROTECTED] wrote:
 

 It was too long so I finally restart tomcat .. then 5mn later my cron job
 started :
 but it looks like nothing happening by cron job :

 This is my OUTPUT file : tot.txt

 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int
 name="QTime">0</int></lst><lst name="initArgs"><lst name="defaults"><str
 name="config">data-config.xml</str></lst></lst><str
 name="command">delta-import,</str><str name="status">idle</str><str
 name="importResponse"/><lst name="statusMessages"/><str
 name="WARNING">This
 response format is experimental.  It is likely to change in the
 future.</str>
 </response>


 This is my CRON JOB WGET

 */5 * * * * /usr/bin/wget -q --output-document=/home/tot.txt

 http://solr-test.adm.books.com:8180/solr/books/dataimport?command=delta-import
 ,
 echo $(date) >> /home/tot.txt




 sunnyfr wrote:
 
  This XML file does not appear to have any style information
  associated with it. The document tree is shown below.
 
  <response>
  <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
  <lst name="defaults">
  <str name="config">data-config.xml</str>
  </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
  <str name="Time Elapsed">4:26:16.934</str>
  <str name="Total Requests made to DataSource">3451431</str>
  <str name="Total Rows Fetched">9165885</str>
  <str name="Total Documents Processed">493061</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Delta Dump started">2008-09-18 10:01:01</str>
  <str name="Identifying Delta">2008-09-18 10:01:01</str>
  <str name="Deltas Obtained">2008-09-18 10:01:43</str>
  <str name="Building documents">2008-09-18 10:01:43</str>
  <str name="Total Changed Documents">1587889</str>
  </lst>
  <str name="WARNING">
  This response format is experimental.  It is likely to change in the
  future.
  </str>
 

 --
 View this message in context:
 http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19554129.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19554770.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr vs Autonomy

2008-09-18 Thread Ryan McKinley


On Sep 18, 2008, at 3:23 AM, Geoff Hopson wrote:


As per other thread

1) security down to field level



how complex of a security model do you need?

Is each user's field visibility totally distinct?  Are there a few  
basic groups?


If you are willing to write (or hire someone to write) a custom  
SearchComponent, you can remove fields from a response for a given  
user.
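
As a rough illustration of that idea (not a tested implementation; the
request parameter and the permission rule are placeholders), a component
could rewrite the fl parameter in prepare() before the query runs:

import java.io.IOException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class FieldAclComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // hypothetical request parameter identifying the caller
        String user = rb.req.getParams().get("user");
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        if ("hr".equals(user)) {            // placeholder permission rule
            params.set(CommonParams.FL, "name,address,occupation");
        } else {
            params.set(CommonParams.FL, "name,address");
        }
        rb.req.setParams(params);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // nothing to do here; prepare() already constrained the field list
    }

    @Override
    public String getDescription() { return "per-user field filtering (sketch)"; }
    @Override
    public String getSource() { return "$URL$"; }
    @Override
    public String getSourceId() { return "$Id$"; }
    @Override
    public String getVersion() { return "1.0"; }
}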


ryan



Re: delta-import looks stuck ???? how can I check if it's done or not ?

2008-09-18 Thread Shalin Shekhar Mangar
Well it shows the number of documents that have changed, you can't expect
1603970 documents to be indexed instantly.

On Thu, Sep 18, 2008 at 8:24 PM, sunnyfr [EMAIL PROTECTED] wrote:


 That is exactly what I've done, but it can't work like that ...

 - what would that mean ... that the cron job can't hit it properly?

 - I've browsed to /dataimport but it was like nothing was running, so I
 finally went back to /dataimport?command=delta-import and then to
 /dataimport and refreshed it often ... indeed it works this way but it
 wouldn't suit me ... and it takes ages ... now I'm at:

 <str name="status">busy</str>
 <str name="importResponse">A command is still running...</str>
 <lst name="statusMessages">
 <str name="Time Elapsed">0:18:44.54</str>
 <str name="Total Requests made to DataSource">1855793</str>
 <str name="Total Rows Fetched">5588946</str>
 <str name="Total Documents Processed">265113</str>
 <str name="Total Documents Skipped">0</str>
 <str name="Delta Dump started">2008-09-18 16:29:38</str>
 <str name="Identifying Delta">2008-09-18 16:29:38</str>
 <str name="Deltas Obtained">2008-09-18 16:30:26</str>
 <str name="Building documents">2008-09-18 16:30:26</str>
 <str name="Total Changed Documents">1603970</str>
 </lst>



-- 
Regards,
Shalin Shekhar Mangar.


Re: delta-import looks stuck ???? how can I check if it's done or not ?

2008-09-18 Thread sunnyfr

I agree with that, but last time, 4 hours later, the number wasn't different:
and if I check now, nothing has changed. Does it have to go across all the data
like a full import? I thought it would bring back just the ids which need to be
modified ...?

<lst name="statusMessages">
<str name="Time Elapsed">0:39:36.943</str>
<str name="Total Requests made to DataSource">3447914</str>
<str name="Total Rows Fetched">9054602</str>
<str name="Total Documents Processed">492558</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 16:29:38</str>
<str name="Identifying Delta">2008-09-18 16:29:38</str>
<str name="Deltas Obtained">2008-09-18 16:30:26</str>
<str name="Building documents">2008-09-18 16:30:26</str>
<str name="Total Changed Documents">1603970</str>
</lst>

look, this was from this morning; I just stopped it because it was taking too
long ... It doesn't look logical:

<lst name="statusMessages">
<str name="Time Elapsed">6:9:0.256</str>
<str name="Total Requests made to DataSource">3451431</str>
<str name="Total Rows Fetched">9165885</str>
<str name="Total Documents Processed">493061</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 10:01:01</str>
<str name="Identifying Delta">2008-09-18 10:01:01</str>
<str name="Deltas Obtained">2008-09-18 10:01:43</str>
<str name="Building documents">2008-09-18 10:01:43</str>
<str name="Total Changed Documents">1587889</str>
</lst>

And do you think my cron job can't work ?


Shalin Shekhar Mangar wrote:
 
 Well it shows the number of documents that have changed, you can't expect
 1603970 documents to be indexed instantly.
 
 On Thu, Sep 18, 2008 at 8:24 PM, sunnyfr [EMAIL PROTECTED] wrote:
 

 It is exactly what I've done  but it can't works like that ...

 - what would that mean ... cron job can't hit it properly ?

 - I've browse to /dataimport but it was like nothing was running so I
 finally went back to /dataimport?command=delta-import and then to
 /dataimport and I refresh it often ...indeed it works this way but it's
 not
 would suit me ... and it take ages ... now I'm :

 <str name="status">busy</str>
 <str name="importResponse">A command is still running...</str>
 <lst name="statusMessages">
 <str name="Time Elapsed">0:18:44.54</str>
 <str name="Total Requests made to DataSource">1855793</str>
 <str name="Total Rows Fetched">5588946</str>
 <str name="Total Documents Processed">265113</str>
 <str name="Total Documents Skipped">0</str>
 <str name="Delta Dump started">2008-09-18 16:29:38</str>
 <str name="Identifying Delta">2008-09-18 16:29:38</str>
 <str name="Deltas Obtained">2008-09-18 16:30:26</str>
 <str name="Building documents">2008-09-18 16:30:26</str>
 <str name="Total Changed Documents">1603970</str>
 </lst>

 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19555223.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: delta-import looks stuck ???? how can I check if it's done or not ?

2008-09-18 Thread Shalin Shekhar Mangar
On Thu, Sep 18, 2008 at 8:45 PM, sunnyfr [EMAIL PROTECTED] wrote:


 I agree with that, but last time, 4 hours later, the number wasn't
 different:


Do you mean that the number doesn't change at all on refreshing the page?
Can you check the solr log file for exceptions?

I suspect that you may be running out of memory.



 and if I check now, nothing has changed. Does it have to go across all the
 data
 like a full import? I thought it would bring back just the ids which need
 to be modified ...?


No, it will bring back only those ids if the deltaQuery is correct. Are you
sure you modified only a small number of rows in the DB?
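
For reference, a deltaQuery is normally keyed off a last-modified column and
the ${dataimporter.last_index_time} variable. A minimal data-config.xml
sketch (the table and column names here are made up for illustration):

<entity name="books" pk="id"
        query="SELECT id, title FROM books"
        deltaQuery="SELECT id FROM books
                    WHERE last_modified > '${dataimporter.last_index_time}'">

If the deltaQuery is broader than that (for example, one that touches every
row), a delta-import will look just like a full import.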



 <lst name="statusMessages">
 <str name="Time Elapsed">0:39:36.943</str>
 <str name="Total Requests made to DataSource">3447914</str>
 <str name="Total Rows Fetched">9054602</str>
 <str name="Total Documents Processed">492558</str>
 <str name="Total Documents Skipped">0</str>
 <str name="Delta Dump started">2008-09-18 16:29:38</str>
 <str name="Identifying Delta">2008-09-18 16:29:38</str>
 <str name="Deltas Obtained">2008-09-18 16:30:26</str>
 <str name="Building documents">2008-09-18 16:30:26</str>
 <str name="Total Changed Documents">1603970</str>
 </lst>

 look, this was from this morning; I just stopped it because it was taking too
 long ... It doesn't look logical:

 <lst name="statusMessages">
 <str name="Time Elapsed">6:9:0.256</str>
 <str name="Total Requests made to DataSource">3451431</str>
 <str name="Total Rows Fetched">9165885</str>
 <str name="Total Documents Processed">493061</str>
 <str name="Total Documents Skipped">0</str>
 <str name="Delta Dump started">2008-09-18 10:01:01</str>
 <str name="Identifying Delta">2008-09-18 10:01:01</str>
 <str name="Deltas Obtained">2008-09-18 10:01:43</str>
 <str name="Building documents">2008-09-18 10:01:43</str>
 <str name="Total Changed Documents">1587889</str>
 </lst>

 And do you think my cron job can't work ?


Your wget command looks fine to me.




 --
 View this message in context:
 http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19555223.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr vs Autonomy

2008-09-18 Thread Walter Underwood
I would do the field visibility one layer up from the search engine.
That layer already knows about the user and can request the appropriate
fields. Or request them all (better HTTP caching) and only show the
appropriate ones.

As I understand your application, putting access control in Solr
doesn't make search faster or more accurate. Add a filter query
to requests to restrict to the allowed documents, and you are good.
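
For example, with a hypothetical acl_group field holding the groups allowed
to see each document:

http://localhost:8983/solr/select?q=ford+green&fq=acl_group:sales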

I wouldn't worry too much about putting all the text in one field
for speed. I tried that and it does help, but it means that you
must rebuild the index when you need to change the mapping. I'm
keeping things in separate fields and searching them all at
query time (with boosts).

wunder

On 9/18/08 8:04 AM, Ryan McKinley [EMAIL PROTECTED] wrote:

 
 On Sep 18, 2008, at 3:23 AM, Geoff Hopson wrote:
 
 As per other thread
 
 1) security down to field level
 
 
 how complex of a security model do you need?
 
 Is each users field visibility totally distinct?  are there a few
 basic groups?
 
 If you are willing to write (or hire someone to write) a custom
 SearchComponent, you can remove fields from a response for a given
 users.
 
 ryan




Re: delta-import looks stuck ???? how can I check if it's done or not ?

2008-09-18 Thread sunnyfr

this is my log file : 

[EMAIL PROTECTED]:/home# tail -f /var/log/tomcat5.5/catalina.$(date
+%Y-%m-%d).log
Sep 18, 2008 5:25:02 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity books with URL:
jdbc:mysql://master-spare.vip.books.com/books
Sep 18, 2008 5:25:02 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 50
Sep 18, 2008 5:25:53 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed ModifiedRowKey for Entity: books rows obtained : 1608415
Sep 18, 2008 5:25:53 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Running DeletedRowKey() for Entity: books
Sep 18, 2008 5:25:53 PM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed DeletedRowKey for Entity: books rows obtained : 0

I just refreshed /dataimport in the browser; it looks like another import has
been started, so maybe the cron started it:

<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
<str name="Time Elapsed">0:0:44.993</str>
<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">1503980</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Delta Dump started">2008-09-18 17:30:01</str>
<str name="Identifying Delta">2008-09-18 17:30:01</str>
</lst>



Shalin Shekhar Mangar wrote:
 
 On Thu, Sep 18, 2008 at 8:45 PM, sunnyfr [EMAIL PROTECTED] wrote:
 

 I agree about that but the last time 4hours later the number wasn't
 different
 :
 
 
 Do you mean that the number doesn't change at all on refreshing the page?
 Can you check the solr log file for exceptions?
 
 I suspect that you may be running out of memory.
 
 

 and if I check now, nothing changed : does it have to go across all the
 data
 like full import, I thought it would bring back just ids which need to be
 modify ...?

 
 No, it will bring back only those id if the deltaQuery is correct. Are you
 sure you modified less number of rows in the DB?
 
 

 <lst name="statusMessages">
 <str name="Time Elapsed">0:39:36.943</str>
 <str name="Total Requests made to DataSource">3447914</str>
 <str name="Total Rows Fetched">9054602</str>
 <str name="Total Documents Processed">492558</str>
 <str name="Total Documents Skipped">0</str>
 <str name="Delta Dump started">2008-09-18 16:29:38</str>
 <str name="Identifying Delta">2008-09-18 16:29:38</str>
 <str name="Deltas Obtained">2008-09-18 16:30:26</str>
 <str name="Building documents">2008-09-18 16:30:26</str>
 <str name="Total Changed Documents">1603970</str>
 </lst>

 look, this was from this morning; I just stopped it because it was taking too
 long ... It doesn't look logical:

 <lst name="statusMessages">
 <str name="Time Elapsed">6:9:0.256</str>
 <str name="Total Requests made to DataSource">3451431</str>
 <str name="Total Rows Fetched">9165885</str>
 <str name="Total Documents Processed">493061</str>
 <str name="Total Documents Skipped">0</str>
 <str name="Delta Dump started">2008-09-18 10:01:01</str>
 <str name="Identifying Delta">2008-09-18 10:01:01</str>
 <str name="Deltas Obtained">2008-09-18 10:01:43</str>
 <str name="Building documents">2008-09-18 10:01:43</str>
 <str name="Total Changed Documents">1587889</str>
 </lst>

 And do you think my cron job can't work ?


 Your wget command looks fine to me.
 
 


 --
 View this message in context:
 http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p19555223.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/delta-import-looks-stuck--how-can-I-check-if-it%27s-done-or-not---tp19551728p1948.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Hardware config for SOLR

2008-09-18 Thread Matthew Runo
I can't speak to a lot of this - but regarding the servers I'd go with  
the more powerful ones, if only for the amount of RAM. Your index will  
likely be larger than 1 gig, and with only two gigs you'll have a lot of  
your index not cached in RAM, which will slow down your QPS.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Sep 17, 2008, at 3:32 PM, Andrey Shulinskiy wrote:


Hello,



We're planning to use SOLR for our project, got some questions.



So I asked some Qs yesterday, got no answers whatsoever. Wondering if
they didn't make sense, or if the e-mail was too long... :-)

Anyway, I'll try to ask them again and hope for some answers this  
time.


It's a very new experience for us so any help is really appreciated.



First, some numbers we're expecting.

- The average size of a doc: ~100K

- The number of indexes: 1

- The query response time we're looking for:  200 - 300ms

- The number of stored docs:

1st year: 500K - 1M

2nd year: 2-3M

- The estimated number of concurrent users per second

1st year: 15 - 25

2nd year: 40 - 60

- The estimated number of queries

1st year: 15 - 25

2nd year: 40 - 60



Now the questions



1)  Should we do sharding or not?

If we start without sharding, how hard will it be to enable it?

Is it just some config changes + the index rebuild or is it more?

My personal opinion is to go without sharding at first and enable it
later if do get a lot of documents.



2)  How should we organize our clusters to ensure redundancy?

Should we have 2 or more identical Masters (means that all the
updates/optimisations/etc. are done for every one of them)?

An alternative, afaik, is to reconfigure one slave to become the new
Master, how hard is that?



3) Basically, we can get servers of two kinds:



* Single Processor, Dual Core Opteron 2214HE

* 2 GB DDR2 SDRAM

* 1 x 250 GB (7200 RPM) SATA Drive(s)



* Dual Processor, Quad Core 5335

* 16 GB Memory (Fully Buffered)

* 2 x 73 GB (10k RPM) 2.5 SAS Drive(s), RAID 1



The second - more powerful - one is more expensive, of course.



How can we take advantage of the multiprocessor/multicore servers?

Is there some special setup required to make, say, 2 instances of SOLR
run on the same server using different processors/cores?



4)  Does it make much difference to get a more powerful Master?

Or, on the contrary, as slaves will be queried more often, they should
be the better ones? Maybe just the HDDs for the slaves should be as  
fast

as possible?



5) How many slaves does it make sense to have per one Master?

What's (roughly) the performance gain from 1 to 2, 2 - 3, etc?

When does it stop making sense to add more slaves?

As far as I understand, it depends mainly on the size of the index.
However, I'd guess the time required to do a push for too many slaves
can be a problem too, correct?



Thanks,

Andrey.







RE: Solr vs Autonomy

2008-09-18 Thread Kashyap, Raghu
Hi Geoff,

I cannot vouch for Autonomy; however, earlier this year we did evaluate
Endeca & Solr and we went with Solr. Some of the reasons were:

1. Freedom of open source with Solr
2. Very good & active Solr open source community
3. Features pretty much overlap with both Solr & Endeca
4. Endeca however provides a very rich Business Tool that some people
might like
5. Our developers are comfortable working with open source 
6. Not good support from Endeca on Internationalization

Hope this helps in some ways

-Raghu

-Original Message-
From: Geoff Hopson [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 18, 2008 12:47 AM
To: solr-user@lucene.apache.org
Subject: Solr vs Autonomy

Hi,

I'm under pressure to justify the use of Solr on my project, and
others are suggesting that Autonomy be used instead. Apart from price,
does anyone have a list of pros/cons around Autonomy compared to Solr?

Thanks
geoff


Re: Unable to filter fq param on a dynamic field

2008-09-18 Thread Otis Gospodnetic
Barry, does this return the correct hits:

http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=Output-Type-facet:Monochrome

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Barry Harding [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 7:21:49 AM
 Subject: Unable to filter fq param on a dynamic field
 
 
 
 Hi,
 
 I have a fairly simple solr setup with several predefined fields that are 
 indexed and stored and also depending on the type of product I also add 
 various 
 dynamic fields of type string to a record, and I should mention that I am 
 using 
 the
 solr.DisMaxRequestHandler request handler called /IvolutionSearch in my 
 example requests.
 
 
 
 My Schema is as follows:
 
 
 required=true /
 
 
 required=true /
 
 
 required=true /
 
 
 required=true /
 
 
 
 
 required=true /
 
 
 /
 
 
 required=true /
 
 
 /
 
 
 
 
 required=true /
 
 
 required=false /
 
 
 required=false /
 
 
 required=true /
 
 
 required=false /
 
 
 required=true /
 
 
 
 
 multiValued=false /
 
 
 
 Now I can query for any of the fixed field types Such as ProductName or 
 ReviewRating and get the results I expect but when I try to run a filter 
 query 
 on the dynamic fields in the result, I always end up with no results being 
 returned.
 
 
 
 So if I run the following query against my copy of solr 1.3 I get the results 
 I 
 am expecting
 
 
 
 http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100
 
 
 
 <doc>
   <str name="CampaignCode">$A</str>
   <str name="CategoryName">Mono Laser Printers</str>
   <str name="CategoryPath">Printers|Mono Laser Printers</str>
   <str name="Connectivity-Technology-facet">Wired</str>
   <str name="CountryCode">UK</str>
   <str name="Id">UK$AQ969719</str>
   <str name="MPN">3500V_DN</str>
   <str name="Manufacturer-facet">Xerox</str>
   <str name="ManufacturerName">Xerox</str>
   <str name="Output-Type-facet">Monochrome</str>
   <str name="Overview">The Xerox Phaser 3500 series printer provides an 
 affordable solution to meet the increasing volume a</str>
   <float name="Price">464.10</float>
   <str name="ProductCode">Q969719</str>
   <str name="ProductName">XEROX 3500DN MONO LASER</str>
   <int name="ReviewRating" />
   <str name="StockCode">E000</str>
   <str name="TaxCode">2</str>
   <str name="Technology-facet">Laser</str>
   <str name="ThumbnailURI">26099.jpg</str>
   <str name="Type-facet">Workgroup printer</str>
   <str name="WebClassification">MLASERPRN</str>
   <date name="timestamp">2008-09-17T17:10:44.37Z</date>
 </doc>
 <doc>
   <str name="CampaignCode">$B</str>
   <str name="CategoryName">Mono Laser Printers</str>
   <str name="CategoryPath">Printers|Mono Laser Printers</str>
   <str name="Connectivity-Technology-facet">Wired</str>

 and so on for the 100 results
 
 Now if I try to filter those results to just those that contain 
 Output-Type-facet equaling Monochrome,
 
 using:
 http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100&fq=Output-Type-facet:Monochrome
 or
 http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100&fq=Output-Type-facet:Monochrome;
 or
 http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100&fq=Output-Type-facet:Monochrome;
 or
 http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser&rows=100&fq=Output-Type-facet:Monochrome;
 
 I just get zero results back even though I know that field contains that 
 value. Please, before I pull my hair out, tell me what mistake I have made: 
 why can I query using a static field and not a dynamic field?
 
 Any help, even if it's to say I have been stupid or to tell me to reread a 
 section of the manual/Wiki because I did not get the point, is much 
 appreciated.
 
 
 
 Thanks
 
 Barry H
 
 
 
 
 
 
 
 
 
 
 Misco is a division of Systemax Europe Ltd.  Registered in Scotland Number 
 114143.  Registered Office: Caledonian Exchange, 19a Canning Street, 
 Edinburgh 
 EH3 8EG.  Telephone +44 (0)1933 686000.



Re: AW: Date field mystery

2008-09-18 Thread Otis Gospodnetic
Hi Christian,

While I can't tell you whether the problem with - will be solved when you try 
it on 1.3, I can tell you that you should probably trim your dates so they are 
not as fine as you currently have them, unless you need such precision.  We 
need to add this to the FAQ. :)
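
For example, assuming day precision is enough, the range boundaries can be
rounded with Solr's date math instead of carrying full timestamps:

  date:[2005-01-01T00:00:00Z TO NOW/DAY+1DAY]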

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Kolodziej Christian [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 3:56:55 AM
 Subject: AW: Date field mystery
 
 Hi Chris,
 
 it was a long night for our solr server today because we rebuilt the complete 
 index using well-formed date strings. And the date field is stored now so that 
 we can see if something went wrong :-)
 
 But our problems aren't solved completely. Now I can give you a very exact 
 description of what the problem is now (and what was the reason that we used 
 malformed date values).
 
 Let's imagine we have 3 records with the following date values:
 1. 2006-03-04T12:23:19Z
 2. 2007-08-12T19:07:03Z
 3. 2008-09-16T12:56:19Z
 
 And now I will give you some queries and which results we get back:
 - date:[2005-01-01T00:00:00Z TO NOW] or date:[2005-01-01T00:00:00Z TO 
 2008-09-18T09:45:00Z]: 1 and 2 (incorrect)
 - date:[2005-01-01T00:00:00Z TO 20080918T09:45:00Z]: 1, 2, 3 (correct)
 - date:[2005-01-01T00:00:00Z TO 2007-12-31T23:59:59Z]: only 1 (incorrect)
 - date:[2005-01-01T00:00:00Z TO 20071231T23:59:59Z]: 1 and 2 (correct)
 
 So as you can see, using "-" in the second parameter of the range query for the 
 date field causes an error and doesn't find the record that should be found, 
 while using a malformed date value without "-" returns the correct records.
 
 When using "-" for the second parameter, all records that are from the year 
 contained in the parameter aren't found any more. This behavior is reproducible 
 on different systems, either CentOS or Debian. It must be a problem of Solr or 
 the Lucene query parser itself.
 
 Our next steps are to test our scenario using Solr 1.3, and if the problem isn't 
 fixed we will use timestamps instead of the date format. But maybe this is a 
 general problem of Solr and should be fixed, because in other cases and for 
 other users it's not possible to make a workaround and they get wrong 
 (incomplete) results for their query.
 
 Best regards,
 Christian



Re: Field level security

2008-09-18 Thread Otis Gospodnetic
Hi,

If all you have to do is hide certain fields from search results for some 
users, then your application -- the application that sends search requests to 
Solr -- can just use different fl=XXX parameters based on the user's 
permission.  I think that's all you need and the custom fieldType should not 
be needed.
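
For example (hypothetical field names), a restricted user's request would
carry fl=name,address while a privileged user's would carry
fl=name,address,occupation; same query, different fields returned.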

As for entering just the keywords and searching several fields automatically - 
this is what DisMax handler is good at, so give that a try.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Geoff Hopson [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 3:21:01 AM
 Subject: Re: Field level security
 
 Hi Otis,
 Thanks for the response. I'll try and inline some clarity...
 
 2008/9/18 Otis Gospodnetic :
 
  I am trying to put together a security model around fields in my
  index. My requirement is that a user may not have permission to view
  certain fields in the index when he does a search. For example, he may
  have permission to see the name and address, but not the occupation.
  Whereas a different user with different permissions will be able to
  search all 3 fields.
 
  What exactly is restricted?  Viewing of specific fields in results, or 
 searching in specific fields?
 
 I am restricting the results - the user can search everything, but I
 was planning (as you mention) to apply a fieldList qualifier to the
 query. In my head (ie not tried it yet) I was hoping I could write a
 'SecurityRequestHandler' that would take an incoming security 'token'
 and construct an &fl qualifier.
 
 Some other thoughts in my head are around developing my own fieldType,
 where I could tokenise the value against the field (e.g. store 
 name=occupationcandlestick maker=Restricted or something
 similar. Thoughts on that?
 
 
  If it's the former, you could tell Solr which fields to return using 
 &fl=field1,field2...
  If it's the latter, you could always write a custom SearchComponent that 
  takes 
 your custom userType or allowedFields parameter and constructs a query 
 based 
 on that.
 
  What is the best way to model this?
 
  My current stab at this has a document-level security level set (I
  have a field called security_default), and all fields have this
  default. If there are exceptions, I have a multiValued field called
  'security_exceptions' where I comma delimit the fild name and
  different access permission for that field. Eg I might have
  'occupation=Restricted' in that field.
 
  This falls over when I copyField fields into a text field for easier 
 searching.
 
  Searching across multiple fields is pretty easy, too.  I'd stick to that, 
  as 
 that also lets you assign different weight to different fields.
 
 
 My requirement is to offer a google-type search, so the user can type
 in john smith ford green and get results where ford may be a last
 name or a car manufacturer, or green is the colour of the car, a
 last name or part of a town name. If I tokenised the field values as
 above and copyField-ed them into a single text box, would my tokeniser
 pick those out?
 
 Dunno - I guess I need to roll my sleeves up and do some coding, try
 some of this out.
 
 Thanks again for any insights
 
 Geoff



Re: Solr vs Autonomy

2008-09-18 Thread Otis Gospodnetic
Geoff,

In short: all items that you listed are not a problem for Solr.  Indices can be 
sharded, distributed search is possible, custom ranking is possible, 30 fields 
is possible, etc. etc.


Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Geoff Hopson [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 10:43:58 AM
 Subject: Re: Solr vs Autonomy
 
 My project is looking to index 10s of millions of documents, providing
 search across a live-live environment (hence index
 distribution/replication is important). Most searches have to be done
 (ie to end user) in 5 seconds or less. The index has about 30 fields,
 and I reckon that the security access I alluded to can be solved with
 field-specific queries (as opposed to a single copyFielded text
 field).
 
 The searches are very simple, but need to be quick. The confidence
 in the information is important, and so scoring is value. Faceted
 searches have a place too.
 
 Autonomy seems to have a solid security/access control model but
 offers nothing above and beyond Solr, unless I am missing something.
 
 Dunno if that helps?
 
 Geoff
 
 2008/9/18 Walter Underwood :
  It depends entirely on the needs of the project. For some things,
  Solr is superior to Autonomy, for other things, not.
 
  I used to work at Autonomy (and Verity and Inktomi and Infoseek),
  and I chose Solr for Netflix. It is working great for us.
 
  wunder
  ==
  Walter Underwood
  Former Ultraseek Architect
  Current Netflix Search Lead
 
  On 9/17/08 10:46 PM, Geoff Hopson wrote:
 
  Hi,
 
  I'm under pressure to justify the use of Solr on my project, and
  others are suggesting that Autonomy be used instead. Apart from price,
  does anyone have a list of pros/cons around Autonomy compared to Solr?
 
  Thanks
  geoff
 
 
 
 
 
 -- 
 Light travels faster than sound. This is why some people appear bright
 until you hear them speak………
 Mario Kart Wii: 2320 6406 5974



snapshot.yyyymmdd ... can't found them?

2008-09-18 Thread sunnyfr

Hi, 
sorry, I think I've started rsyncd properly:

[EMAIL PROTECTED]:/# ./data/solr/books/bin/rsyncd-enable

[EMAIL PROTECTED]:/# ./data/books/video/bin/rsyncd-start 

but then I can't find the snapshot.current files ?? 
How can I check that I did it properly ? 

my rsyncd.log :
2008/09/18 18:06:04 enabled by root
2008/09/18 18:06:04 command: /data/solr/books/bin/rsyncd-enable
2008/09/18 18:06:04 rsyncd already currently enabled
2008/09/18 18:06:04 exited (elapsed time: 0 sec)
2008/09/18 18:06:46 enabled by root
2008/09/18 18:06:46 command: ./data/solr/books/bin/rsyncd-enable
2008/09/18 18:06:46 rsyncd already currently enabled
2008/09/18 18:06:46 exited (elapsed time: 0 sec)
2008/09/18 18:07:17 started by root
2008/09/18 18:07:17 command: ./data/solr/books/bin/rsyncd-start
2008/09/18 18:07:17 [28782] connect from localhost (127.0.0.1)
2008/09/18 18:07:17 [28782] module-list request from localhost (127.0.0.1)
2008/09/18 18:07:17 rsyncd already running at port 18180

Thanks a lot,


-- 
View this message in context: 
http://www.nabble.com/snapshot.mmdd-...-can%27t-found-them--tp19556507p19556507.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Setting request method to post on SolrQuery causes ClassCastException

2008-09-18 Thread syoung

I tried setting the 'wt' parameter to both 'xml' and 'javabin'.  Neither
worked.  However, setting the parser on the server to XMLResponseParser did
fix the problem.  Thanks for the help.

Susan
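
Roughly, the working setup looks like this (the URL is a placeholder):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

CommonsHttpSolrServer server =
    new CommonsHttpSolrServer("http://localhost:8983/solr");
server.setParser(new XMLResponseParser()); // parse XML instead of the default binary format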



Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 I guess the post is not sending the correct 'wt' parameter. Try
 setting wt=javabin explicitly.
 
 wt=xml may not work because the parser still is binary.
 
 check this http://wiki.apache.org/solr/Solrj#xmlparser
 
 
 
 
 
 On Thu, Sep 18, 2008 at 11:49 AM, Otis Gospodnetic
 [EMAIL PROTECTED] wrote:
 A quick work-around is, I think, to tell Solr to use the non-binary
 response, e.g. wt=xml (I think that's the syntax).

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: syoung [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, September 17, 2008 7:27:30 PM
 Subject: Setting request method to post on SolrQuery causes
 ClassCastException


 Hi,

 I need to have queries over a certain length done as a post instead of a
 get.  However, when I set the method to post, I get a
 ClassCastException.
 Here is the code:

 public QueryResponse query(SolrQuery solrQuery) {
 QueryResponse response = null;
 try {
 if (solrQuery.toString().length()  MAX_URL_LENGTH)
 response = server.query(solrQuery, SolrRequest.METHOD.POST);
 else
 response = server.query(solrQuery, SolrRequest.METHOD.GET);
 } catch (SolrServerException e) {
 throw new DataAccessResourceFailureException(e.getMessage(), e);
 }
 return response;
 }

 And the stack trace:

 java.lang.ClassCastException: java.lang.String
 org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
 org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
 org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:113)
 com.localmatters.guidespot.util.SolrTemplate.query(SolrTemplate.java:33)

 Thanks,

 Susan


 --
 View this message in context:
 http://www.nabble.com/Setting-request-method-to-post-on-SolrQuery-causes-ClassCastException-tp19543232p19543232.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 --Noble Paul
 
 

-- 
View this message in context: 
http://www.nabble.com/Setting-request-method-to-post-on-SolrQuery-causes-ClassCastException-tp19543232p19557138.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Unable to filter fq param on a dynamic field

2008-09-18 Thread Barry Harding
Hi Otis,

No, that does not seem to bring back the correct results either; in fact it's 
still zero results.

It's also not bringing back results if I use the standard handler:
 http://127.0.0.1:8080/apache-solr-1.3.0/select?q=Output-Type-facet:Monochrome

but the field is visible in the documents returned if I search for the 
following:

http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser

so i know that the field is in the results generated (shown below)

<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">666</int>
  <lst name="params">
    <str name="q">laser</str>
  </lst>
</lst>
<result name="response" numFound="8056" start="0">
<doc>
  <str name="CampaignCode">$A</str>
  <str name="CategoryName">Mono Laser Printers</str>
  <str name="CategoryPath">Printers|Mono Laser Printers</str>
  <str name="Connectivity-Technology-facet">Wired</str>
  <str name="CountryCode">UK</str>
  <str name="Id">UK$AQ63360</str>
  <str name="MPN">Q7697A#ABU</str>
  <str name="Manufacturer-facet">Hewlett Packard</str>
  <str name="ManufacturerName">HP</str>
  <str name="Output-Type-facet">Monochrome</str>
  <str name="Overview">The LaserJet 9000 series printer is HP's fastest, most 
versatile LaserJet designed for today's distr</str>
  <float name="Price">1388.99</float>
  <str name="ProductCode">Q63360</str>
  <str name="ProductName">HP LASERJET 9040 MONO LASER</str>
  <int name="ReviewRating" />
  <str name="StockCode">E000</str>
  <str name="TaxCode">2</str>
  <str name="Technology-facet">Laser</str>
  <str name="ThumbnailURI">98404.jpg</str>
  <str name="Type-facet">Workgroup printer</str>
  <str name="WebClassification">MLASERPRN</str>
  <date name="timestamp">2008-09-18T16:44:01.029Z</date>
</doc>
<doc>
  <str name="CampaignCode">$B</str>
  <str name="CategoryName">Mono Laser Printers</str>
  <str name="CategoryPath">Printers|Mono Laser Printers</str>
  <str name="Connectivity-Technology-facet">Wired</str>




Misco is a division of Systemax Europe Ltd.  Registered in Scotland Number 
114143.  Registered Office: Caledonian Exchange, 19a Canning Street, Edinburgh 
EH3 8EG.  Telephone +44 (0)1933 686000.

Re: Dismax + Dynamic fields

2008-09-18 Thread Jon Drukman

Daniel Papasian wrote:

Norberto Meijome wrote:

Thanks Yonik. ok, that matches what I've seen - if i know the actual
name of the field I'm after, I can use it in a query it, but i can't
use the dynamic_field_name_* (with wildcard) in the config.

Is adding support for this something that is desirable / needed
(doable??) , and is it being worked on ?


You can use a wildcard with copyFrom to copy the dynamic fields that
match the pattern to another field that you can then query on. It seems
like that would cover your needs, no?


this is biting me right now and I don't understand how to specify the 
copyFrom to do what I want.


I have a dynamic field declaration like:

<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

in the documents that I'm adding I am specifying location_t and group_t, 
for example, although I may decide to add more later - obviously that 
seems like the ideal use case for the dynamicField.  However I cannot 
search these fields unless I specify them explicitly 
(q=location_t:something) and it doesn't work with dismax.


I want all fields searchable, otherwise why would I bother with 
indexed="true" in the dynamicField?


how do I use copyFrom to search location_t, group_t and any other _t I 
might decide to add later?


-jsd-
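
Note: the schema directive is copyField (not copyFrom). A minimal sketch of
the approach, with an arbitrary catch-all field name:

<!-- schema.xml: catch-all field that every *_t field is copied into -->
<field name="text_all" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="*_t" dest="text_all"/>

<!-- solrconfig.xml: point the dismax handler's qf at the catch-all -->
<str name="qf">text_all</str>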




Re: Unable to filter fq param on a dynamic field

2008-09-18 Thread Otis Gospodnetic
Barry,

You are seeing the value of the field as it was saved (as the original), but 
perhaps something is funky with how it was analyzed/tokenized at search time 
and how it is being analyzed now at query time.  Double-check your 
fieldType/analysis settings for this field and make sure you are using the 
same/compatible analyzers at both index and query time.
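
For a facet-style field like this one, an untokenized type is usually the
most predictable choice. A sketch (the pattern is assumed from the field
names in this thread):

<dynamicField name="*-facet" type="string" indexed="true" stored="true"/>

With a string type, the filter fq=Output-Type-facet:Monochrome has to match
the stored value exactly, case included.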


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Barry Harding [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 12:53:08 PM
 Subject: RE: Unable to filter fq param on a dynamic field
 
 Hi Otis,
 
 no that does not seem to bring back the correct results either in fact its 
 still 
 zero results.
 
 Its also not bringing back results if I use the standard handler
 http://127.0.0.1:8080/apache-solr-1.3.0/select?q=Output-Type-facet:Monochrome
 
 but the field is visible in the documents returned if I search for the 
 following:
 
 http://127.0.0.1:8080/apache-solr-1.3.0/IvolutionSearch?q=laser
 
 so i know that the field is in the results generated (shown below)
 
 
  <response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">666</int>
    <lst name="params">
      <str name="q">laser</str>
    </lst>
  </lst>
  <result name="response" numFound="8056" start="0">
  <doc>
    <str name="CampaignCode">$A</str>
    <str name="CategoryName">Mono Laser Printers</str>
    <str name="CategoryPath">Printers|Mono Laser Printers</str>
    <str name="Connectivity-Technology-facet">Wired</str>
    <str name="CountryCode">UK</str>
    <str name="Id">UK$AQ63360</str>
    <str name="MPN">Q7697A#ABU</str>
    <str name="Manufacturer-facet">Hewlett Packard</str>
    <str name="ManufacturerName">HP</str>
    <str name="Output-Type-facet">Monochrome</str>
    <str name="Overview">The LaserJet 9000 series printer is HP's fastest, most 
    versatile LaserJet designed for today's distr</str>
    <float name="Price">1388.99</float>
    <str name="ProductCode">Q63360</str>
    <str name="ProductName">HP LASERJET 9040 MONO LASER</str>
    <int name="ReviewRating" />
    <str name="StockCode">E000</str>
    <str name="TaxCode">2</str>
    <str name="Technology-facet">Laser</str>
    <str name="ThumbnailURI">98404.jpg</str>
    <str name="Type-facet">Workgroup printer</str>
    <str name="WebClassification">MLASERPRN</str>
    <date name="timestamp">2008-09-18T16:44:01.029Z</date>
  </doc>
  <doc>
    <str name="CampaignCode">$B</str>
    <str name="CategoryName">Mono Laser Printers</str>
    <str name="CategoryPath">Printers|Mono Laser Printers</str>
    <str name="Connectivity-Technology-facet">Wired</str>
 
 
 
 
 Misco is a division of Systemax Europe Ltd.  Registered in Scotland Number 
 114143.  Registered Office: Caledonian Exchange, 19a Canning Street, 
 Edinburgh 
 EH3 8EG.  Telephone +44 (0)1933 686000.



RE: Searching for future or null dates

2008-09-18 Thread Chris Maxwell

Here is what I was able to get working with your help.

(productId:(102685804)) AND liveDate:[* TO NOW] AND ((endDate:[NOW TO *]) OR
((*:* -endDate:[* TO *])))

the *:* is what I was missing.

Thanks for your help.



hossman wrote:
 
 
 : If the query starts with a negative clause Lucene returns nothing.
 
 that's not true.  If a Query in lucene is a BooleanQuery that only 
 contains negative clauses, then Lucene returns nothing (because nothing is 
 positively selected) ... but it if there is a mix of negative lcauses and 
 positive clauses doesn't matter what order the clauses are in.
 
 in *solr* there is code that attempts to detect a query containing purely 
 negative clauses and it adds a MatchAllDocs query in that case -- but it 
 only works at the top level of a query.  nested queries like this...
 
 +fieldA:foo +(-fieldB:bar -fieldC:baz)
 
 ...won't work as you expect, because that nested query is only negative 
 clauses.  you can add your own MatchAllDocs query explicitly using the *:* 
 syntax
 
 +fieldA:foo +(*:* -fieldB:bar -fieldC:baz)
 
 :  endDate[NOW TO *] OR -endDate:[* TO *]
 
 side note: you really, REALLY don't want to mix the +/- syntax with 
 AND/OR/NOT .. it almost never works out the way you expect...
 
 : can I search for a date, which is either in the future OR missing 
 : completely (meaning open ended)
 : 
 : I've tried -endDate:[* TO *] OR endDate[NOW TO *] but that doesn't
 work.
 
 unless you've set the default op to AND this should work...
 
   fq = endDate:[NOW TO *] (*:* -endDate:[* TO *])
 
 
 -Hoss
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Searching-for-future-or-%22null%22-dates-tp19502167p19563117.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filtering results

2008-09-18 Thread ristretto . rb
Otis,

Would it be reasonable to run a query like this

http://localhost:8280/solr/select/?q=terms_x&version=2.2&start=0&rows=0&indent=on

10 times, one for each result from an initial category query on a
different index.
So, it's still 1+10, but I'm  not returning values.
This would give me the number of pages that would match, and I can
display that number.
Not ideal, but better then nothing, and hopefully not a problem with scaling.

cheers
gene



On Wed, Sep 17, 2008 at 1:21 PM, Gene Campbell [EMAIL PROTECTED] wrote:
 OK thanks Otis.  Any gut feeling on the best approach to get this
 collapsed data?  I hate to ask you to do my homework, but I'm coming
 to the
 end of my Solr/Lucene knowledge.  I don't code java too well - used
 to, but switched to Python a while back.

 gene




 On Wed, Sep 17, 2008 at 12:47 PM, Otis Gospodnetic
 [EMAIL PROTECTED] wrote:
 Gene,

 The latest patch from Bojan for SOLR-236 works with whatever revision of 
 Solr he used when he made the patch.

 I didn't follow this thread to know your original requirements, but running 
 1+10 queries doesn't sound good to me from scalability/performance point of 
 view.

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: ristretto.rb [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Tuesday, September 16, 2008 6:45:02 PM
 Subject: Re: Filtering results

 thanks.  very interesting.  The plot thickens.  And, yes, I think
 field collapsing is exactly what I'm after.

I am now considering trying this patch.  I have a Solr 1.2 instance
on Jetty.  It looks like I need to install the patch.
 Does anyone use that patch?  Recommend it?  The wiki page
 (http://wiki.apache.org/solr/FieldCollapsing) says
 This patch is not complete, but it will be useful to keep this page
 updated while the interface evolves.  And the page
 was last updated over a year ago, so I'm not sure if that is a good.
 I'm trying to read through all the comments now.

I'm also considering creating a second index of just the
 categories which contains all the content from the main index
 collapsed
 down in to the corresponding categories - basically a complete
 collapsed index.
 Initial searches will be done against this collapsed category index,
 and then the first 10 results
 will be used to do 10 field queries against the main index to get the
 top records to return with each Category.

 Haven't decided which path to take yet.

 cheers
 gene


 On Wed, Sep 17, 2008 at 9:42 AM, Chris Hostetter
 wrote:
 
  : 1.  Identify all records that would match search terms.  (Suppose I
  : search for 'dog', and get 450,000 matches)
  : 2.  Of those records, find the distinct list of groups over all the
  : matches.  (Suppose there are 300.)
  : 3.  Now get the top ranked record from each group, as if you search
  : just for docs in the group.
 
  this sounds similar to Field Collapsing although i don't really
  understand it or your specific use case enough to be certain that it's the
  same thing.  You may find the patch, and/or the discussions about the
  patch useful starting points...
 
  https://issues.apache.org/jira/browse/SOLR-236
  http://wiki.apache.org/solr/FieldCollapsing
 
 
  -Hoss
 
 





RE: Hardware config for SOLR

2008-09-18 Thread Andrey Shulinskiy
Matthew,

Thanks, a very good point.

Andrey.

 -Original Message-
 From: Matthew Runo [mailto:[EMAIL PROTECTED]
 Sent: Thursday, September 18, 2008 11:38 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Hardware config for SOLR
 
 I can't speak to a lot of this - but regarding the servers I'd go with
 the more powerful ones, if only for the amount of ram. Your index will
 likely be larger than 1 gig, and with only two you'll have a lot of
 your index not stored in ram, which will slow down your QPS.
 
 Thanks for your time!
 
 Matthew Runo
 Software Engineer, Zappos.com
 [EMAIL PROTECTED] - 702-943-7833
 
 On Sep 17, 2008, at 3:32 PM, Andrey Shulinskiy wrote:
 
  Hello,
 
 
 
  We're planning to use SOLR for our project, got some questions.
 
 
 
  So I asked some Qs yesterday, got no answers whatsoever. Wondering
if
  they didn't make sense, or if the e-mail was too long... :-)
 
  Anyway, I'll try to ask them again and hope for some answers this
  time.
 
  It's a very new experience for us so any help is really appreciated.
 
 
 
  First, some numbers we're expecting.
 
  - The average size of a doc: ~100K
 
  - The number of indexes: 1
 
  - The query response time we're looking for:  200 - 300ms
 
  - The number of stored docs:
 
  1st year: 500K - 1M
 
  2nd year: 2-3M
 
  - The estimated number of concurrent users per second
 
  1st year: 15 - 25
 
  2nd year: 40 - 60
 
  - The estimated number of queries
 
  1st year: 15 - 25
 
  2nd year: 40 - 60
 
 
 
  Now the questions
 
 
 
  1)  Should we do sharding or not?
 
  If we start without sharding, how hard will it be to enable it?
 
  Is it just some config changes + the index rebuild or is it more?
 
  My personal opinion is to go without sharding at first and enable it
  later if do get a lot of documents.
 
 
 
  2)  How should we organize our clusters to ensure redundancy?
 
  Should we have 2 or more identical Masters (means that all the
  updates/optimisations/etc. are done for every one of them)?
 
  An alternative, afaik, is to reconfigure one slave to become the new
  Master, how hard is that?
 
 
 
  3) Basically, we can get servers of two kinds:
 
 
 
  * Single Processor, Dual Core Opteron 2214HE
 
  * 2 GB DDR2 SDRAM
 
  * 1 x 250 GB (7200 RPM) SATA Drive(s)
 
 
 
  * Dual Processor, Quad Core 5335
 
  * 16 GB Memory (Fully Buffered)
 
  * 2 x 73 GB (10k RPM) 2.5 SAS Drive(s), RAID 1
 
 
 
  The second - more powerful - one is more expensive, of course.
 
 
 
  How can we take advantage of the multiprocessor/multicore servers?
 
  Is there some special setup required to make, say, 2 instances of SOLR
  run on the same server using different processors/cores?
 
 
 
  4)  Does it make much difference to get a more powerful Master?
 
  Or, on the contrary, as slaves will be queried more often, they should
  be the better ones? Maybe just the HDDs for the slaves should be as fast
  as possible?
 
 
 
  5) How many slaves does it make sense to have per one Master?
 
  What's (roughly) the performance gain from 1 to 2, 2 - 3, etc?
 
  When does it stop making sense to add more slaves?
 
  As far as I understand, it depends mainly on the size of the index.
  However, I'd guess the time required to do a push for too many slaves
  can be a problem too, correct?
 
 
 
  Thanks,
 
  Andrey.
 
 
 
 



firstSearcher and newSearcher events

2008-09-18 Thread oleg_gnatovskiy

Hello. I am using the spellcheck component
(https://issues.apache.org/jira/browse/SOLR-572). Since the spell checker
index is kept in RAM, it gets erased every time the Solr server gets
restarted. I was thinking of using either the firstSearcher or the
newSearcher to reload the index every time Solr starts. The events are
defined like so:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">external</str>
      <str name="spellcheck.build">true</str>
      <str name="q">piza</str>
    </lst>
  </arr>
</listener>

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">fast_warm</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
    <lst>
      <str name="q">static firstSearcher warming query from solrconfig.xml</str>
    </lst>
    <lst>
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">external</str>
      <str name="spellcheck.build">true</str>
      <str name="q">piza</str>
    </lst>
  </arr>
</listener>

However, the index does not load. When I check the logs, I notice the
following: when the event runs, the log looks like this:

INFO: [] webapp=null path=null
params={spellcheck=true&q=piza&spellcheck.dictionary=external&spellcheck.build=true}
hits=0 status=0 QTime=1

a regular request looks like this:

INFO: [] webapp=/solr path=/select/
params={spellcheck=true&q=piza&spellcheck.dictionary=external&spellcheck.build=true}
hits=0 status=0 QTime=19459

I am guessing that the reason it doesn't work with the autowarm is that the
webapp is null. Does anyone have any ideas what I can do to load that index
in advance?
-- 
View this message in context: 
http://www.nabble.com/firstSearcher-and-newSearcher-events-tp19564163p19564163.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: firstSearcher and newSearcher events

2008-09-18 Thread Shalin Shekhar Mangar
On Fri, Sep 19, 2008 at 5:55 AM, oleg_gnatovskiy 
[EMAIL PROTECTED] wrote:


 Hello. I am using the spellcheck component
 (https://issues.apache.org/jira/browse/SOLR-572). Since the spell checker
 index is kept in RAM, it gets erased every time the Solr server gets
 restarted. I was thinking of using either the firstSearcher or the
 newSearcher to reload the index every time Solr starts.


This capability is already in SpellCheckComponent:

http://wiki.apache.org/solr/SpellCheckComponent#onCommit
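
A rough sketch of the relevant solrconfig.xml piece, assuming a
spellchecker named "external" as in the listener config above (option
names are from that wiki page; verify them against your Solr version):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">external</str>
    <!-- field to build the dictionary from; "spell" is a placeholder -->
    <str name="field">spell</str>
    <!-- rebuild the spellcheck index automatically on every commit -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>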



 --
 View this message in context:
 http://www.nabble.com/firstSearcher-and-newSearcher-events-tp19564163p19564163.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Filtering results

2008-09-18 Thread Otis Gospodnetic
Gene,
I haven't looked at Field Collapsing for a while, but if you have a single
index and collapse hits on your category field, then won't the first 10 hits be
exactly the items you are looking for: the top item for each category, times 10
categories, from a single query?
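
If the patch applies cleanly for you, the request would look something
like the following (parameter name as documented on the FieldCollapsing
wiki page; double-check it against the patch revision you apply):

http://localhost:8983/solr/select?q=dog&collapse.field=category&rows=10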

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: ristretto.rb [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 7:35:43 PM
 Subject: Re: Filtering results
 
 Otis,
 
 Would it be reasonable to run a query like this

 http://localhost:8280/solr/select/?q=terms_x&version=2.2&start=0&rows=0&indent=on
 
 10 times, one for each result from an initial category query on a
 different index.
 So it's still 1+10 queries, but I'm not returning stored values.
 This would give me the number of pages that would match, and I can
 display that number.
 Not ideal, but better than nothing, and hopefully not a problem with scaling.
 
 cheers
 gene
 
 
 
 On Wed, Sep 17, 2008 at 1:21 PM, Gene Campbell wrote:
  OK thanks Otis.  Any gut feeling on the best approach to get this
  collapsed data?  I hate to ask you to do my homework, but I'm coming
  to the
  end of my Solr/Lucene knowledge.  I don't code java too well - used
  to, but switched to Python a while back.
 
  gene
 
 
 
 
  On Wed, Sep 17, 2008 at 12:47 PM, Otis Gospodnetic
  wrote:
  Gene,
 
  The latest patch from Bojan for SOLR-236 works with whatever revision of 
  Solr 
 he used when he made the patch.
 
   I didn't follow this thread to know your original requirements, but running
  1+10 queries doesn't sound good to me from a scalability/performance point of view.
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: ristretto.rb 
  To: solr-user@lucene.apache.org
  Sent: Tuesday, September 16, 2008 6:45:02 PM
  Subject: Re: Filtering results
 
  thanks.  very interesting.  The plot thickens.  And, yes, I think
  field collapsing is exactly what I'm after.
 
  I am now considering trying this patch.  I have a solr 1.2 instance
  on Jetty.  It looks like I need to install the patch.
  Does anyone use that patch?  Recommend it?  The wiki page
  (http://wiki.apache.org/solr/FieldCollapsing) says
  This patch is not complete, but it will be useful to keep this page
  updated while the interface evolves.  And the page
  was last updated over a year ago, so I'm not sure if that is a good sign.
  I'm trying to read through all the comments now.
 
  I'm also considering creating a second index of just the
  categories which contains all the content from the main index
  collapsed
  down in to the corresponding categories - basically a complete
  collapsed index.
  Initial searches will be done against this collapsed category index,
  and then the first 10 results
  will be used to do 10 field queries against the main index to get the
  top records to return with each Category.
 
  Haven't decided which path to take yet.
 
  cheers
  gene
 
 
  On Wed, Sep 17, 2008 at 9:42 AM, Chris Hostetter
  wrote:
  
   : 1.  Identify all records that would match search terms.  (Suppose I
   : search for 'dog', and get 450,000 matches)
   : 2.  Of those records, find the distinct list of groups over all the
   : matches.  (Suppose there are 300.)
   : 3.  Now get the top ranked record from each group, as if you search
   : just for docs in the group.
  
   this sounds similar to Field Collapsing although i don't really
   understand it or your specific use case enough to be certain that it's 
   the
   same thing.  You may find the patch, and/or the discussions about the
   patch useful starting points...
  
   https://issues.apache.org/jira/browse/SOLR-236
   http://wiki.apache.org/solr/FieldCollapsing
  
  
   -Hoss
  
  
 
 
 



error when post xml data to solr

2008-09-18 Thread 李学健
Hi all,

When I post an XML file to Solr, the following errors happen:
==
com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
at [row,col {unknown-source}]: [1,0]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
at
com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
at
org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:148)
at
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

==

I am confused: the wstx jar is used for web services, so why does Solr use it?

Can anyone help me? Thanks.
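
(For what it's worth: com.ctc.wstx is Woodstox, the StAX parser that
Solr's XmlUpdateRequestHandler uses to read update messages; it is not
specific to web services. "Unexpected EOF in prolog" at row 1, column 0
usually means the request body arrived empty, e.g. a POST with no body
or the wrong content type. For comparison, a minimal well-formed update
message looks like this, with field names assuming the example schema:)

<add>
  <doc>
    <field name="id">SOLR1000</field>
    <field name="name">Solr, the Enterprise Search Server</field>
  </doc>
</add>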


Re: Filtering results

2008-09-18 Thread ristretto . rb
Thanks Otis for the reply!  Always appreciated!

That is indeed what we are looking at implementing.  But I'm running
out of time to prototype or experiment for this release.
I'm going to run the two-index approach for now, unless I find something
saying it's really easy and sensible to run one index and collapse
on a field.

thanks
gene
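
In case it helps anyone following along, here is a minimal SolrJ sketch
of the rows=0 counting approach described above. The server URL, query
term, and category field name are placeholders, not anything from this
setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CategoryCounts {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at the main (uncollapsed) index.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // One count-only query per category found by the initial search.
        String[] categories = {"books", "electronics"}; // hypothetical values
        for (String category : categories) {
            SolrQuery q = new SolrQuery("dog");
            q.addFilterQuery("category:" + category); // assumes a 'category' field
            q.setRows(0); // rows=0: fetch only the hit count, no documents
            long numFound = server.query(q).getResults().getNumFound();
            System.out.println(category + ": " + numFound + " matches");
        }
    }
}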


On Fri, Sep 19, 2008 at 3:24 PM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
 Gene,
 I haven't looked at Field Collapsing for a while, but if you have a single
 index and collapse hits on your category field, then won't the first 10 hits be
 exactly the items you are looking for: the top item for each category, times 10
 categories, from a single query?

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: ristretto.rb [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, September 18, 2008 7:35:43 PM
 Subject: Re: Filtering results

 Otis,

 Would it be reasonable to run a query like this

 http://localhost:8280/solr/select/?q=terms_x&version=2.2&start=0&rows=0&indent=on

 10 times, one for each result from an initial category query on a
 different index.
 So it's still 1+10 queries, but I'm not returning stored values.
 This would give me the number of pages that would match, and I can
 display that number.
 Not ideal, but better than nothing, and hopefully not a problem with scaling.

 cheers
 gene



 On Wed, Sep 17, 2008 at 1:21 PM, Gene Campbell wrote:
  OK thanks Otis.  Any gut feeling on the best approach to get this
  collapsed data?  I hate to ask you to do my homework, but I'm coming
  to the
  end of my Solr/Lucene knowledge.  I don't code java too well - used
  to, but switched to Python a while back.
 
  gene
 
 
 
 
  On Wed, Sep 17, 2008 at 12:47 PM, Otis Gospodnetic
  wrote:
  Gene,
 
  The latest patch from Bojan for SOLR-236 works with whatever revision of 
  Solr
 he used when he made the patch.
 
   I didn't follow this thread to know your original requirements, but running
  1+10 queries doesn't sound good to me from a scalability/performance point of view.
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: ristretto.rb
  To: solr-user@lucene.apache.org
  Sent: Tuesday, September 16, 2008 6:45:02 PM
  Subject: Re: Filtering results
 
  thanks.  very interesting.  The plot thickens.  And, yes, I think
  field collapsing is exactly what I'm after.
 
  I am now considering trying this patch.  I have a solr 1.2 instance
  on Jetty.  It looks like I need to install the patch.
  Does anyone use that patch?  Recommend it?  The wiki page
  (http://wiki.apache.org/solr/FieldCollapsing) says
  This patch is not complete, but it will be useful to keep this page
  updated while the interface evolves.  And the page
  was last updated over a year ago, so I'm not sure if that is a good sign.
  I'm trying to read through all the comments now.
 
  I'm also considering creating a second index of just the
  categories which contains all the content from the main index
  collapsed
  down in to the corresponding categories - basically a complete
  collapsed index.
  Initial searches will be done against this collapsed category index,
  and then the first 10 results
  will be used to do 10 field queries against the main index to get the
  top records to return with each Category.
 
  Haven't decided which path to take yet.
 
  cheers
  gene
 
 
  On Wed, Sep 17, 2008 at 9:42 AM, Chris Hostetter
  wrote:
  
   : 1.  Identify all records that would match search terms.  (Suppose I
   : search for 'dog', and get 450,000 matches)
   : 2.  Of those records, find the distinct list of groups over all the
   : matches.  (Suppose there are 300.)
   : 3.  Now get the top ranked record from each group, as if you search
   : just for docs in the group.
  
   this sounds similar to Field Collapsing although i don't really
   understand it or your specific use case enough to be certain that it's 
   the
   same thing.  You may find the patch, and/or the discussions about the
   patch useful starting points...
  
   https://issues.apache.org/jira/browse/SOLR-236
   http://wiki.apache.org/solr/FieldCollapsing
  
  
   -Hoss
  
  
 
 
 




Can I add custom fields to the input XML file?

2008-09-18 Thread convoyer

Hi guys.
Is the XML format for inputting data a standard one, or can I change it?
That is, instead of:
<add><doc>
  <field name="id">3007WFP</field>
  <field name="name">Dell Widescreen UltraSharp 3007WFP</field>
  <field name="manu">Dell, Inc.</field>
</doc></add>

can I enter something like:

<custList><clients>
  <field name="id">100100</field>
  <field name="property">BPO</field>
  <field name="emp_count">1500</field>
</clients>
<clients>
  <field name="id">100200</field>
  <field name="property">ITES</field>
  <field name="emp_count">2500</field>
</clients>
</custList>

Thanks
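
For comparison, here is how the same records would look in the standard
envelope that Solr's XmlUpdateRequestHandler accepts; the <add>/<doc>/<field>
element names are fixed, but the field names themselves are whatever your
schema.xml defines:

<add>
  <doc>
    <field name="id">100100</field>
    <field name="property">BPO</field>
    <field name="emp_count">1500</field>
  </doc>
  <doc>
    <field name="id">100200</field>
    <field name="property">ITES</field>
    <field name="emp_count">2500</field>
  </doc>
</add>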




-- 
View this message in context: 
http://www.nabble.com/Can-I-add-custom-fields-to-the-input-XML-file--tp19566431p19566431.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Setting request method to post on SolrQuery causes ClassCastException

2008-09-18 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is surprising that this happens.

The javabin format offers significant performance improvements over the XML one.

probably you can also try this:

<requestHandler name="/search" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="wt">javabin</str>
  </lst>
</requestHandler>
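
For completeness, the client-side workaround mentioned below (switching
the SolrJ response parser to XML) looks roughly like this; the URL is a
placeholder:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class XmlParserSetup {
    public static CommonsHttpSolrServer newServer() throws Exception {
        // Placeholder URL; point this at your own Solr instance.
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Parse responses as XML instead of the default binary (javabin)
        // format, avoiding the NamedListCodec ClassCastException below.
        server.setParser(new XMLResponseParser());
        return server;
    }
}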

On Thu, Sep 18, 2008 at 10:17 PM, syoung [EMAIL PROTECTED] wrote:

 I tried setting the 'wt' parameter to both 'xml' and 'javabin'.  Neither
 worked.  However, setting the parser on the server to XMLResponseParser did
 fix the problem.  Thanks for the help.

 Susan



 Noble Paul നോബിള്‍ नोब्ळ् wrote:

  I guess the post is not sending the correct 'wt' parameter. Try
  setting wt=javabin explicitly.

  wt=xml may not work because the parser is still binary.

 check this http://wiki.apache.org/solr/Solrj#xmlparser





 On Thu, Sep 18, 2008 at 11:49 AM, Otis Gospodnetic
 [EMAIL PROTECTED] wrote:
 A quick work-around is, I think, to tell Solr to use the non-binary
 response, e.g. wt=xml (I think that's the syntax).

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: syoung [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, September 17, 2008 7:27:30 PM
 Subject: Setting request method to post on SolrQuery causes
 ClassCastException


 Hi,

 I need to have queries over a certain length done as a post instead of a
 get.  However, when I set the method to post, I get a
 ClassCastException.
 Here is the code:

 public QueryResponse query(SolrQuery solrQuery) {
     QueryResponse response = null;
     try {
         if (solrQuery.toString().length() > MAX_URL_LENGTH)
             response = server.query(solrQuery, SolrRequest.METHOD.POST);
         else
             response = server.query(solrQuery, SolrRequest.METHOD.GET);
     } catch (SolrServerException e) {
         throw new DataAccessResourceFailureException(e.getMessage(), e);
     }
     return response;
 }

 And the stack trace:

 java.lang.ClassCastException: java.lang.String
 org.apache.solr.common.util.NamedListCodec.unmarshal(NamedListCodec.java:89)
 org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:385)
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
 org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:113)
 com.localmatters.guidespot.util.SolrTemplate.query(SolrTemplate.java:33)

 Thanks,

 Susan


 --
 View this message in context:
 http://www.nabble.com/Setting-request-method-to-post-on-SolrQuery-causes-ClassCastException-tp19543232p19543232.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 --Noble Paul



 --
 View this message in context: 
 http://www.nabble.com/Setting-request-method-to-post-on-SolrQuery-causes-ClassCastException-tp19543232p19557138.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul