Re: Autocomplete(terms) middle of words

2011-05-02 Thread ramires
I've already tried nutch trunk 4.0. I have a problem with spaces. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-terms-middle-of-words-tp2878694p2888940.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fq parameter with partial value

2011-05-02 Thread elisabeth benoit
I'm a bit confused here.

What is the difference between CATEGORY and CATEGORY_TOKENIZED if I just do
a copyField from one field to the other? And how can I search only for
Restaurant (fq=CATEGORY_TOKENIZED:Restaurant)? Shouldn't I have something
like
<field name="CATEGORY_TOKENIZED">Hotel</field>, if I want this to work? And
from what I understand, this means I should do more than just copy
<field name="CATEGORY">Restaurant Hotel</field>
to CATEGORY_TOKENIZED.

Thanks,
Elisabeth


2011/4/28 Erick Erickson erickerick...@gmail.com

 See below:


 On Thu, Apr 28, 2011 at 9:03 AM, elisabeth benoit
 elisaelisael...@gmail.com wrote:
  yes, the multivalued field is not broken up into tokens.
 
  so, if I understand well what you mean, I could have
 
  a field CATEGORY with multiValued="true"
  a field CATEGORY_TOKENIZED with multiValued="true"
 
  and then some POI
 
  <field name="NAME">POI_Name</field>
  ...
  <field name="CATEGORY">Restaurant Hotel</field>
  <field name="CATEGORY_TOKENIZED">Restaurant</field>
  <field name="CATEGORY_TOKENIZED">Hotel</field>

 [EOE] If the above is the document you're sending, then no. The
 document would be indexed with
 <field name="CATEGORY">Restaurant Hotel</field>
 <field name="CATEGORY_TOKENIZED">Restaurant Hotel</field>


 Or even just:
 <field name="CATEGORY">Restaurant Hotel</field>

 and set up a copyField to copy the value from CATEGORY to
 CATEGORY_TOKENIZED.

 The multiValued part comes from "a single POI might have different
 categories", so your document could have multiple entries, which would
 look like:
 <field name="CATEGORY">Restaurant Hotel</field>
 <field name="CATEGORY">Health Spa</field>
 <field name="CATEGORY">Dance Hall</field>

 and your document would be counted for each of those entries, while searches
 against CATEGORY_TOKENIZED would match things like "dance spa" etc.

 But do notice that if you did NOT want searching for restaurant hall
 (no quotes) to match, then you could do proximity searches for less than
 your increment gap, e.g. (this time with the quotes) "restaurant hall"~50,
 which would then NOT match if your increment gap were 100.

 Best
 Erick


 
  do faceting on CATEGORY and fq on CATEGORY_TOKENIZED.
 
  But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED?
 
  Best regards
  Elisabeth
 
 
  2011/4/28 Erick Erickson erickerick...@gmail.com
 
  So, I assume your CATEGORY field is multiValued but each value is not
  broken up into tokens, right? If that's the case, would it work to have
 a
  second field CATEGORY_TOKENIZED and run your fq against that
  field instead?
 
  You could have this be a multiValued field with an increment gap if you
  wanted
  to prevent matches across separate entries and have your fq do a
 proximity
  search where the proximity was less than the increment gap
 
  Best
  Erick
 
  On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit
  elisaelisael...@gmail.com wrote:
   Hi Stefan,
  
   Thanks for answering.
  
   In more details, my problem is the following. I'm working on searching
   points of interest (POIs), which can be hotels, restaurants, plumbers,
   psychologists, etc.
  
   Those POIs can be identified among other things  by categories or by
  brand.
   And a single POIs might have different categories (no maximum number).
  User
   might enter a query like
  
  
   McDonald’s Paris
  
  
   or
  
  
   Restaurant Paris
  
  
   or
  
  
   many other possible queries
  
  
   First I want to do a facet search on brand and categories, to find out
  which
   case is the current case.
  
  
    http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY
   
    and get an answer like
   
    <lst name="facet_fields">
    <lst name="CATEGORY">
    <int name="Restaurant">598</int>
    <int name="Restaurant Hotel">451</int>
  
  
  
    Then I want to send a request with fq=CATEGORY:Restaurant and still get
    answers with CATEGORY = "Restaurant Hotel".
  
  
  
    One solution would be to modify the data to add a new document every time
    we have a new category, so a POI with three different categories would be
    indexed three times, each time with a different category.
  
  
   But I was wondering if there was another way around.
  
  
  
   Thanks again,
  
   Elisabeth
  
  
   2011/4/28 Stefan Matheis matheis.ste...@googlemail.com
  
   Hi Elisabeth,
  
    that's not what FilterQueries are made for :) What about using that
    criteria in the query instead?
   Perhaps you want to describe your UseCase and we'll see if there's
   another way to solve it?
  
   Regards
   Stefan
  
   On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit
   elisaelisael...@gmail.com wrote:
Hello,
   
I would like to know if there is a way to use the fq parameter with
 a
partial value.
   
For instance, if I have a request with fq=NAME:Joe, and I would
 like
  to
retrieve all answers where NAME contains Joe, including those with
  NAME =
Joe Smith.
   
Thanks,
Elisabeth
   
  
  
 
 

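For reference, a minimal schema sketch of the two-field arrangement discussed above (the type and analyzer choices here are assumptions, not from the thread):

```xml
<!-- schema.xml sketch: untokenized field for faceting, tokenized copy for fq -->
<fieldType name="string" class="solr.StrField"/>
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="CATEGORY" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="CATEGORY_TOKENIZED" type="text_ws" indexed="true" stored="false" multiValued="true"/>

<!-- copyField receives the raw input value, so each CATEGORY entry is re-analyzed here -->
<copyField source="CATEGORY" dest="CATEGORY_TOKENIZED"/>
```

Faceting on CATEGORY then returns whole values such as "Restaurant Hotel", while fq=CATEGORY_TOKENIZED:restaurant matches individual words; the positionIncrementGap of 100 is what makes the "restaurant hall"~50 proximity trick described above possible.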


Re: Problems of deleting documents from Solr

2011-05-02 Thread Ahmet Arslan
Hi Jeff,

You can  delete either by unique id or by a query.
It seems that you want to delete all documents having a category of monitor.

<delete><query>cat:monitor</query></delete>

http://wiki.apache.org/solr/UpdateXmlMessages#A.22delete.22_by_ID_and_by_Query


- Original Message -
From: Jeff Zhang zjf...@gmail.com
To: solr-user@lucene.apache.org
Cc: 
Sent: Monday, May 2, 2011 5:04 AM
Subject: Problems of deleting documents from Solr

Hi all,

I want to update some documents, so first I delete these documents by invoking
the command: java -Ddata=args -Dcommit=yes -jar post.jar
"<delete><cat>monitor</cat></delete>"
The result is that I can not search the deleted documents, but I can still
see the terms of these documents in
http://localhost:8983/solr/admin/schema.jsp
Even when I restart Solr, it's still there. I notice that numDocs: 0
while maxDoc: 1.

Why's that ? How can I delete the documents correctly ?

-- 
Best Regards

Jeff Zhang
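As a side note, the delete message Ahmet shows can be built with any XML tool; a minimal stdlib sketch (the field name `cat` comes from the example above):

```python
import xml.etree.ElementTree as ET

# Build the delete-by-query message: <delete><query>cat:monitor</query></delete>
delete = ET.Element("delete")
query = ET.SubElement(delete, "query")
query.text = "cat:monitor"

message = ET.tostring(delete, encoding="unicode")
print(message)  # <delete><query>cat:monitor</query></delete>
```

On Jeff's follow-up question: a delete only marks documents as deleted, so numDocs drops to 0 immediately while maxDoc stays at 1, and the old terms remain visible in schema.jsp until the deleted documents are merged away (an optimize will expunge them).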



Re: solr sorting problem

2011-05-02 Thread Pratik
Hi, 
Thanks for your reply. 

I'm using commit=true while indexing, and it does index the records and
show the number of records indexed. 
The problem is that search yields 0 records (numFound=0). 

e.g. 
<response><lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int><lst name="params"><str name="indent">on</str><str
name="q">appl</str></lst></lst><result name="response" numFound="0"
start="0"/></response> 

There are some entries for spell checking in my schema too. 
e.g. 
<field name="f_spell_en" type="textSpell_en"/>
<copyField source="foodDescUS" dest="f_spell_en"/> 

The search URL is something like: 
http://localhost:8983/solr/select/?q=apple&indent=on 
http://localhost:8983/solr/select/?q=apple&version=2.2&start=0&rows=10&indent=on

Cache could not be a problem as it did not fetch any records from the very
beginning. 

So, basically it does not fetch any documents/records whereas it does index
them.

Thanks
Pratik 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-sorting-problem-tp486144p2889075.html
Sent from the Solr - User mailing list archive at Nabble.com.


OutOfMemoryError with DIH loading 140.000 documents

2011-05-02 Thread Zoltán Altfatter
Hi,

I receive an OutOfMemoryError with Solr 3.1 when loading around 140,000
documents with DIH.
I have set -Xmx to 1536M. Weird that I cannot give it more heap memory:
with -Xmx2G the process doesn't start.

Do you know why I can't set -Xmx to 2G for Solr 3.1?

With Solr 4.0 trunk I don't receive the OutOfMemoryError, although there I
can set -Xmx to 2G.

Thank you.

Cheers,
Zoltan


Bulk update via filter query

2011-05-02 Thread Rih
Is there an efficient way to update multiple documents with common values
(e.g. color = white)? An example would be to mark all white-colored items as
sold-out.

- Rih


Re: Bulk update via filter query

2011-05-02 Thread Ahmet Arslan


Is there an efficient way to update multiple documents with common values
(e.g. color = white)? An example would be to mark all white-colored items as
sold-out.

http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html 
can be an option.
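A sketch of what the ExternalFileField setup could look like (field and file names here are illustrative): the per-document value lives in a flat file in the index directory, so flipping all white items to sold-out means rewriting that file and committing, not re-indexing the documents.

```xml
<!-- schema.xml: keyField must be the uniqueKey field; valType names a float type -->
<fieldType name="soldOutFile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"/>
<field name="sold_out" type="soldOutFile"/>

<!-- data goes in the index data dir as external_sold_out, one "id=value"
     line per document, e.g.
     white-sku-1=1
     white-sku-2=1 -->
```

Note the limitation: external file values are usable in function queries and sorting, not as regular searchable/stored fields.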



Re: solr sorting problem

2011-05-02 Thread Pratik
Hi,
Were you able to sort the results using alphaOnlySort ?
If yes what changes were made to the schema and data-config  ? 
Thanks 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-sorting-problem-tp486144p2889473.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fq parameter with partial value

2011-05-02 Thread Erick Erickson
See more below :).

On Mon, May 2, 2011 at 2:43 AM, elisabeth benoit
elisaelisael...@gmail.com wrote:
 I'm a bit confused here.

 What is the difference between CATEGORY and CATEGORY_TOKENIZED if I just do
 a copyField from one field to the other?

[EOE] Copyfield is done with the original data, not the processed
data. So it's as though
you added both fields in the input document.

And how can I search only for
 Restaurant (fq=CATEGORY_TOKENIZED:Restaurant)? Shouldn't I have something
 like
 <field name="CATEGORY_TOKENIZED">Hotel</field>, if I want this to work?
[EOE] that's what copyfield does for you.

 And
 from what I understand, this means I should do more than just copy
 <field name="CATEGORY">Restaurant Hotel</field>
 to CATEGORY_TOKENIZED.

[EOE] Don't understand your question.

Here's what I'd suggest. Just try it. Then use the admin page to look at
your fields to see what the indexed values are. Also, try using
the admin page to run some test queries with debugging on, I always get
more out of a few experiments than I do out of documentation...

Best
Erick

 Thanks,
 Elisabeth



Re: OutOfMemoryError with DIH loading 140.000 documents

2011-05-02 Thread Erick Erickson
What do you have your commit parameters set to in
solrconfig.xml? I suspect you can make this all work by
reducing the RAM threshold in the config file.

Best
Erick

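The RAM threshold Erick refers to is presumably ramBufferSizeMB in solrconfig.xml; a hedged sketch (the value is just an example to try):

```xml
<!-- solrconfig.xml (Solr 3.1): flush the indexing RAM buffer to disk sooner -->
<indexDefaults>
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>
```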



Re: solr sorting problem

2011-05-02 Thread Erick Erickson
Your query is going against the default field (defined in schema.xml). Have
you tried a fielded search?

And it would be best to start a new thread for new questions; see:
http://people.apache.org/~hossman/#threadhijack

Best
Erick




updates not reflected in solr admin

2011-05-02 Thread Mike Sokolov
This is in 1.4 - we push updates via SolrJ; our application sees the 
updates, but when we use the solr admin screens to run test queries, or 
use Luke to view the schema and field values, it sees the database in 
its state prior to the commit.  I think eventually this seems to 
propagate, but I'm not clear how often since we generally restart the 
(tomcat) server in order to get the new commit to be visible.


I saw a comment recently (from Lance) that there is (annoying) HTTP 
caching enabled by default in solrconfig.xml.  Does this sound like 
something that would be caused by that cache?  If so, I'd probably want 
to disable it.   Does that affect performance of queries run via SolrJ?  
Also: why isn't that cache flushed by a commit?  Seems weird...


--
Michael Sokolov
Engineering Director
www.ifactory.com



Negative boost

2011-05-02 Thread Brian Lamb
Hi all,

I understand that the only way to simulate a negative boost is to positively
boost the inverse. I have looked at
http://wiki.apache.org/solr/SolrRelevancyFAQ but I think I am missing
something on the formatting of my query. I am using:

http://localhost:8983/solr/search?q=dog&bq=(*:* -species:Sheltie)^1

In this case, I am trying to search for records about dog but to put
records containing Sheltie closer to the bottom as I am not really
interested in that. However, the following queries:

http://localhost:8983/solr/search?q=dog
http://localhost:8983/solr/search?q=dog&bq=(*:* -species:Sheltie)^1

Return the exact same set of results with a record about a Sheltie as the
top result each time. What am I doing incorrectly?

Thanks,

Brian Lamb


Re: Highlighting words with non-ascii chars

2011-05-02 Thread Peter Wolanin
Does your servlet container have the URI encoding set correctly, e.g.
URIEncoding=UTF-8 for tomcat6?

http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config

Older versions of Jetty use ISO-8859-1 as the default URI encoding,
but jetty 6 should use UTF-8 as default:

http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings

-Peter

On Sat, Apr 30, 2011 at 6:31 AM, Pavel Kukačka pavel.kuka...@seznam.cz wrote:
 Hello,

         I've hit a (probably trivial) roadblock I don't know how to overcome 
 with Solr 3.1:
 I have a document with common fields (title, keywords, content) and I'm
 trying to use highlighting.
         With queries using ASCII characters there is no problem; it works 
 smoothly. However, when I search using a Czech word including non-ASCII
 chars (like "slovíčko", for example:
 http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
 the document is found, but the response doesn't contain the highlighted
 snippet in the highlighting node; there is only an empty node, like this:

 ...
 <lst name="highlighting">
  <lst name="2009"/>
 </lst>

 When searching for the other keyword (
 http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
 the resulting response is fine, like this:

 <lst name="highlighting">
  <lst name="2009">
   <arr name="user_keywords">
    <str>slov&#237;&#269;ko <em id="highlighting">slovo</em></str>
   </arr>
  </lst>
 </lst>

 Did anyone come across this problem?
 Cheers,
 Pavel






-- 
Peter M. Wolanin, Ph.D.      : Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com : 978-296-5247

Get a free, hosted Drupal 7 site: http://www.drupalgardens.com
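To illustrate Peter's point, here is what the term from Pavel's URL looks like when percent-encoded as UTF-8, versus what happens when the container decodes the same bytes with the wrong charset (stdlib sketch, just for illustration):

```python
from urllib.parse import quote, unquote

# UTF-8 percent-encoding of the Czech term from the thread
term = "slovíčko"
encoded = quote(term)
print(encoded)  # slov%C3%AD%C4%8Dko -- matches the URL in Pavel's mail

# A container decoding the same bytes as ISO-8859-1 produces mojibake,
# so the analyzed query no longer matches the indexed term:
wrong = unquote(encoded, encoding="iso-8859-1")
print(wrong != term)  # True
```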


Re: updates not reflected in solr admin

2011-05-02 Thread Ahmet Arslan


This is in 1.4 - we push updates via SolrJ; our application sees the updates, 
but when we use the solr admin screens to run test queries, or use Luke to view 
the schema and field values, it sees the database in its state prior to the 
commit.  I think eventually this seems to propagate, but I'm not clear how 
often since we generally restart the (tomcat) server in order to get the new 
commit to be visible.


You need to issue a commit from HTTP interface to see the changes made by 
embedded solr server. 
solr/update?commit=true



Re: Negative boost

2011-05-02 Thread Ahmet Arslan




bq parameter is specific to dismax query parser. 
If you want to benefit from bq, you need to use defType=dismax as well as 
other dismax's parameters
http://wiki.apache.org/solr/DisMaxQParserPlugin
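To make Ahmet's point concrete, a sketch of what Brian's request could look like with the dismax parser (the qf value is an assumption; substitute the fields actually searched):

```python
from urllib.parse import urlencode

# dismax query: main terms in q, the negative-boost workaround in bq
params = {
    "q": "dog",
    "defType": "dismax",
    "qf": "text",                      # assumption: 'text' is the searched field
    "bq": "(*:* -species:Sheltie)^1",  # boost everything that is NOT a Sheltie
}
url = "http://localhost:8983/solr/search?" + urlencode(params)
print(url)
```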



Re: updates not reflected in solr admin

2011-05-02 Thread Mike Sokolov
Thanks - we are issuing a commit via SolrJ; I think that's the same 
thing, right?  Or are you saying really we need to do a separate commit 
(via HTTP) to update the admin console's view?


-Mike



Re: updates not reflected in solr admin

2011-05-02 Thread Ahmet Arslan




Thanks - we are issuing a commit via SolrJ; I think that's the same 
thing, right?  Or are you saying really we need to do a separate commit 
(via HTTP) to update the admin console's view?

Yes separate commit is needed.


RE: querying in Java

2011-05-02 Thread Saler, Jeff
This worked.  Thank you.  

What if I want to query for two or more fields' values?  For example:

Field:  color   dayOfWeek
Value:  blue    Tuesday

I have tried a query string of blueTuesday, with no success.


-Original Message-
From: Anuj Kumar [mailto:anujs...@gmail.com] 
Sent: Friday, April 29, 2011 2:10 PM
To: solr-user@lucene.apache.org
Subject: Re: querying in Java

Hi Jeff,

In that case, you can create a new index field (set indexed to true and
stored to false) and copy all your fields to it using copyField.
Also make this new field as your default search field.

This will handle your case.

Regards,
Anuj

On Fri, Apr 29, 2011 at 11:36 PM, Saler, Jeff jsa...@ball.com wrote:

 Thanks for the reply.  What I want is for the query to search all
fields
 for the specified value.

 -Original Message-
 From: Anuj Kumar [mailto:anujs...@gmail.com]
 Sent: Friday, April 29, 2011 1:51 PM
 To: solr-user@lucene.apache.org
 Subject: Re: querying in Java

 Hi Jeff,

 In that case, it will query w.r.t default field. What is your default
 search
 field in the schema?

 Regards,
 Anuj

 On Fri, Apr 29, 2011 at 11:10 PM, Saler, Jeff jsa...@ball.com wrote:

  Is there any way to query for data that is in any field, i.e. not
 using
  a specific field name?
 
 
 
  For example, when I use the following statements:
 
 
 
  SolrQuery query = new SolrQuery();
 
  query.setQuery("ANALYST:\"John Schummers\"");
 
  QueryResponse rsp = server.query(query);
 
 
 
 
 
  I get the documents I'm looking for.
 
 
 
  But I would like to get the same set of documents without using the
  specific ANALYST field name.
 
  I have tried using just Schummers as the query, but no documents
are
  returned.
 
  The ANALYST field is an indexed field.
 
 
 
 
 
 
  This message and any enclosures are intended only for the addressee.
   Please
  notify the sender by email if you are not the intended recipient.
If
 you
  are
  not the intended recipient, you may not use, copy, disclose, or
 distribute
  this
  message or its contents or enclosures to any other person and any
such
  actions
  may be unlawful.  Ball reserves the right to monitor and review all
  messages
  and enclosures sent to or from this email address.







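Anuj's catch-all suggestion from earlier in the thread, sketched in schema.xml terms (the field names are illustrative):

```xml
<!-- schema.xml sketch: copy every field into one searchable catch-all field -->
<field name="all_text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="all_text"/>
<defaultSearchField>all_text</defaultSearchField>
```

With this in place, q=Schummers (no field prefix) searches the values copied from every field.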


Question concerning the updating of my solr index

2011-05-02 Thread Greg Georges
Hello all,

I have integrated Solr into my project with success. I use a dataimporthandler 
to first import the data mapping the fields to my schema.xml. I use Solrj to 
query the data and also use faceting. Works great.

The question I have now is a general one on updating the index and how it 
works. Right now, I have a thread which runs a couple of times a day to update 
the index. My index is composed of about 2 documents, and when this thread 
is run it takes the data of the 2 documents in the db, creates a 
SolrInputDocument for each, and then uses this code to update the index.

SolrServer server = new 
CommonsHttpSolrServer("http://localhost:8080/apache-solr-1.4.1/");
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();

for (Iterator iterator = documents.iterator(); iterator.hasNext();) {
Document document = (Document) iterator.next();
SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
   docs.add(solrDoc);
}

UpdateRequest req = new UpdateRequest();
req.setAction(UpdateRequest.ACTION.COMMIT, false, false);
req.add(docs);
UpdateResponse rsp = req.process(server);

server.optimize();

This process takes 19 seconds, which is 10 seconds faster than my older 
solution using Compass (another open-source search project we used). Is this 
the best way to update the index? If I understand correctly, an update is 
actually a delete in the index followed by an add. During the 19 seconds, will 
my index be locked only on the document being updated, or could the whole index 
be locked? I am not in production yet with this solution, so I want to make 
sure my update process makes sense. Thanks

Greg


Re: Question concerning the updating of my solr index

2011-05-02 Thread Otis Gospodnetic
Greg,

You could use StreamingUpdateSolrServer instead of that UpdateRequest class - 
http://search-lucene.com/?q=StreamingUpdateSolrServer+fc_project=Solr
Your index won't be locked, in the sense that you can have multiple apps or 
threads adding docs to the same index simultaneously, and searches can be 
executed against the index concurrently.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/





Re: OutOfMemoryError with DIH loading 140.000 documents

2011-05-02 Thread Otis Gospodnetic
Zoltan - Solr is not preventing you from giving your JVM 2GB heap, something 
else is.  If you paste the error we may be able to help.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/






Re: querying in Java

2011-05-02 Thread Otis Gospodnetic
Jeff,

If I understand what you need, then:

yourFieldNameHere:(blue OR Tuesday)

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Saler, Jeff jsa...@ball.com
 To: solr-user@lucene.apache.org
 Sent: Mon, May 2, 2011 1:24:52 PM
 Subject: RE: querying in Java
 
 This worked.  Thank you.  
 
 What if I want to query for two or  more field's values.  For example:
 Field  color  dayOf  Week
 
 Value   blue   Tuesday
 
 I have tried a query string  of blueTuesday, with no success.
 
 
 
 -Original  Message-
 From: Anuj Kumar [mailto:anujs...@gmail.com] 
 Sent: Friday, April  29, 2011 2:10 PM
 To: solr-user@lucene.apache.org
 Subject:  Re: querying in Java
 
 Hi Jeff,
 
 In that case, you can create a new  index field (set indexed to true and
 stored to false) and copy all your  fields to it using copyField.
 Also make this new field as your default search  field.
 
 This will handle your case.
 
 Regards,
 Anuj
 
 On Fri,  Apr 29, 2011 at 11:36 PM, Saler, Jeff jsa...@ball.com wrote:
 
  Thanks  for the reply.  What I want is for the query to search  all
 fields
  for the specified value.
 
  -Original  Message-
  From: Anuj Kumar [mailto:anujs...@gmail.com]
  Sent: Friday,  April 29, 2011 1:51 PM
  To: solr-user@lucene.apache.org
   Subject: Re: querying in Java
 
  Hi Jeff,
 
  In that  case, it will query w.r.t default field. What is your default
   search
  field in the schema?
 
  Regards,
   Anuj
 
  On Fri, Apr 29, 2011 at 11:10 PM, Saler, Jeff jsa...@ball.com wrote:
 
Is there any way to query for data that is in any field, i.e. not
   using
   a specific field name?
  
  
   
   For example, when I use the following statements:
   
  
  
    SolrQuery query = new SolrQuery();
    query.setQuery("ANALYST:\"John Schummers\"");
    QueryResponse rsp = server.query(query);
  
   
  
  
  
   I get the documents I'm  looking for.
  
  
  
   But I would  like to get the same set of documents without using the
   specific  ANALYST field name.
  
   I have tried using just  Schummers as the query, but no documents
 are
   returned.
   
   The ANALYST field is an indexed field.
  
   
  
  
  
  
   This  message and any enclosures are intended only for the addressee.
 Please
   notify the sender by email if you are not the  intended recipient.
 If
  you
   are
   not the  intended recipient, you may not use, copy, disclose, or
   distribute
   this
   message or its contents or enclosures  to any other person and any
 such
   actions
   may be  unlawful.  Ball reserves the right to monitor and review all
messages
   and enclosures sent to or from this email  address.
 
 
 


Re: querying in Java

2011-05-02 Thread Anuj Kumar
Hi Jeff,

Either you can use a filter query or specify it explicitly,
like- Field:Value OR color:blue OR dayOfWeek:Tuesday
or use AND in between. It depends on what you want.
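A minimal plain-Java sketch of building such a fielded query string (the field names `color` and `dayOfWeek` are only illustrative; adjust them to your own schema):

```java
public class FieldedQuery {
    // Build a field:value clause; values containing whitespace are quoted
    // so they are treated as a phrase rather than separate default-field terms.
    static String clause(String field, String value) {
        return field + ":" + (value.contains(" ") ? "\"" + value + "\"" : value);
    }

    public static void main(String[] args) {
        String q = clause("color", "blue") + " OR " + clause("dayOfWeek", "Tuesday");
        System.out.println(q); // color:blue OR dayOfWeek:Tuesday
    }
}
```

The resulting string is what you would pass to SolrQuery.setQuery(...), swapping OR for AND as needed.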

Also, if you don't want to specify AND/OR and decide on a global
declaration, then set it as the default operator in your schema.xml. For
example-

 <solrQueryParser defaultOperator="OR"/>

Hope it helps.

Regards,
Anuj

On Mon, May 2, 2011 at 10:54 PM, Saler, Jeff jsa...@ball.com wrote:

 This worked.  Thank you.

 What if I want to query for two or more fields' values?  For example:

 Field:  color   dayOfWeek
 Value:  blue    Tuesday

 I have tried a query string of blueTuesday, with no success.



 -Original Message-
 From: Anuj Kumar [mailto:anujs...@gmail.com]
 Sent: Friday, April 29, 2011 2:10 PM
 To: solr-user@lucene.apache.org
 Subject: Re: querying in Java

 Hi Jeff,

 In that case, you can create a new index field (set indexed to true and
 stored to false) and copy all your fields to it using copyField.
 Also make this new field as your default search field.

 This will handle your case.

 Regards,
 Anuj

 On Fri, Apr 29, 2011 at 11:36 PM, Saler, Jeff jsa...@ball.com wrote:

  Thanks for the reply.  What I want is for the query to search all
 fields
  for the specified value.
 
  -Original Message-
  From: Anuj Kumar [mailto:anujs...@gmail.com]
  Sent: Friday, April 29, 2011 1:51 PM
  To: solr-user@lucene.apache.org
  Subject: Re: querying in Java
 
  Hi Jeff,
 
  In that case, it will query w.r.t default field. What is your default
  search
  field in the schema?
 
  Regards,
  Anuj
 
  On Fri, Apr 29, 2011 at 11:10 PM, Saler, Jeff jsa...@ball.com wrote:
 
   Is there any way to query for data that is in any field, i.e. not
  using
   a specific field name?
  
  
  
   For example, when I use the following statements:
  
  
  
   SolrQuery query = new SolrQuery();
   query.setQuery("ANALYST:\"John Schummers\"");
   QueryResponse rsp = server.query(query);
  
  
  
  
  
   I get the documents I'm looking for.
  
  
  
   But I would like to get the same set of documents without using the
   specific ANALYST field name.
  
   I have tried using just Schummers as the query, but no documents
 are
   returned.
  
   The ANALYST field is an indexed field.
  
  
  
  
  
  



Re: updates not reflected in solr admin

2011-05-02 Thread Mike Sokolov

Ah - I didn't expect that.  Thank you!

On 05/02/2011 12:07 PM, Ahmet Arslan wrote:




Thanks - we are issuing a commit via SolrJ; I think that's the same
thing, right?  Or are you saying really we need to do a separate commit
(via HTTP) to update the admin console's view?

Yes separate commit is needed.
   


Re: How to combine Deduplication and Elevation

2011-05-02 Thread Chris Hostetter

: Hi I have a question. How to combine the Deduplication and Elevation
: implementations in Solr. Currently , I managed to implement either one only.

can you elaborate a bit more on what exactly you've tried and what problem 
you are facing?

the SignatureUpdateProcessorFactory (which is used for Deduplication) and 
the QueryElevation component should work just fine together -- in fact: 
one is used at index time and the other at query time, so there shouldn't 
be any interaction at all...

http://wiki.apache.org/solr/Deduplication
http://wiki.apache.org/solr/QueryElevationComponent
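For reference, a sketch of the two pieces side by side, adapted from those wiki pages (the field names, the chain name, and elevate.xml are the wiki's example values, not required ones) -- the dedupe chain runs in the update handler while the elevator runs as a search component, so they never touch the same code path:

```xml
<!-- solrconfig.xml: index-time deduplication (per the Deduplication wiki page) -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signatureField</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- solrconfig.xml: query-time elevation (per the QueryElevationComponent wiki page) -->
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>
```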

-Hoss


Re: Too many open files exception related to solrj getServer too often?

2011-05-02 Thread Chris Hostetter

Off the top of my head, i don't know the answers to some of your 
questions, but as to the core cause of the exception...

: 3. server.query(solrQuery) throws SolrServerException.  How can concurrent
: solr queries triggers Too many open file exception?

...bear in mind that (as i understand it) the limit on open files is 
actually a limit on open file *descriptors* which includes network 
sockets.

a google search for "java.net.SocketException: Too many open files" will 
give you loads of results -- it's not specific to solr.

-Hoss


RE: Question concerning the updating of my solr index

2011-05-02 Thread Greg Georges
Ok, I had seen this in the wiki; performance has gone from 19 seconds to 13. I 
have configured it like this. I wonder what the best settings would be with 
20,000 docs to update? Higher or lower queue value? Higher or lower thread 
value? Thanks

Greg

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: 2 mai 2011 13:59
To: solr-user@lucene.apache.org
Subject: Re: Question concerning the updating of my solr index

Greg,

You could use StreamingUpdateSolrServer instead of that UpdateRequest class - 
http://search-lucene.com/?q=StreamingUpdateSolrServer+fc_project=Solr
Your index won't be locked in the sense that you could have multiple apps or 
threads adding docs to the same index simultaneously and that searches can be 
executed against the index concurrently.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Greg Georges greg.geor...@biztree.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Mon, May 2, 2011 1:33:30 PM
 Subject: Question concerning the updating of my solr index
 
 Hello all,
 
 I have integrated Solr into my project with success. I use a  
 dataimporthandler 
to first import the data mapping the fields to my schema.xml.  I use Solrj to 
query the data and also use faceting. Works great.
 
 The question I have now is a general one on updating the index and how it 
works. Right now, I have a thread which runs a couple of times a day to 
update the index. My index is composed of about 20,000 documents, and when 
this thread runs it takes the data of the 20,000 documents in the db, I 
create a solrdocument for each, and I then use this line of code to update 
the index.
 
 SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/apache-solr-1.4.1/");
 Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
 
 for (Iterator iterator = documents.iterator(); iterator.hasNext();) {
     Document document = (Document) iterator.next();
     SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
     docs.add(solrDoc);
 }
 
 UpdateRequest req = new UpdateRequest();
 req.setAction(UpdateRequest.ACTION.COMMIT, false, false);
 req.add(docs);
 UpdateResponse rsp = req.process(server);
 
 server.optimize();
 
 This process takes 19 seconds, which is 10 seconds faster than my older 
solution using Compass (another open-source search project we used). Is this 
the best way to update the index? If I understand correctly, an update is 
actually a delete in the index followed by an add. During the 19 seconds, 
will my index be locked only on the document being updated, or could the 
whole index be locked? I am not in production yet with this solution, so I 
want to make sure my update process makes sense. Thanks
 
 Greg
 


RE: Question concerning the updating of my solr index

2011-05-02 Thread Greg Georges
Oops, here is the code

SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/apache-solr-1.4.1/", 1000, 4);
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();

for (Iterator iterator = documents.iterator(); iterator.hasNext();) {
    Document document = (Document) iterator.next();
    SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
    docs.add(solrDoc);
}

server.add(docs);
server.commit();
server.optimize();

Greg

-Original Message-
From: Greg Georges [mailto:greg.geor...@biztree.com] 
Sent: 2 mai 2011 14:44
To: solr-user@lucene.apache.org
Subject: RE: Question concerning the updating of my solr index

Ok I had seen this in the wiki, performance has gone from 19 seconds to 13. I 
have configured it like this, I wonder what would the best settings be with 
20,000 docs to update? Higher or lower queue value? Higher or lower thread 
value? Thanks

Greg

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: 2 mai 2011 13:59
To: solr-user@lucene.apache.org
Subject: Re: Question concerning the updating of my solr index

Greg,

You could use StreamingUpdateSolrServer instead of that UpdateRequest class - 
http://search-lucene.com/?q=StreamingUpdateSolrServer+fc_project=Solr
Your index won't be locked in the sense that you could have multiple apps or 
threads adding docs to the same index simultaneously and that searches can be 
executed against the index concurrently.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Greg Georges greg.geor...@biztree.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Mon, May 2, 2011 1:33:30 PM
 Subject: Question concerning the updating of my solr index
 
 Hello all,
 
 I have integrated Solr into my project with success. I use a  
 dataimporthandler 
to first import the data mapping the fields to my schema.xml.  I use Solrj to 
query the data and also use faceting. Works great.
 
 The  question I have now is a general one on updating the index and how it 
works.  Right now, I have a thread which runs a couple of times a day to 
update 
the  index. My index is composed of about 2 documents, and when this 
thread 
is  run it takes the data of the 2 documents in the db, I create a 
solrdocument  for each and I then use this line of code to index the index.
 
 SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/apache-solr-1.4.1/");
 Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
 
 for (Iterator iterator = documents.iterator(); iterator.hasNext();) {
     Document document = (Document) iterator.next();
     SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
     docs.add(solrDoc);
 }
 
 UpdateRequest req = new UpdateRequest();
 req.setAction(UpdateRequest.ACTION.COMMIT, false, false);
 req.add(docs);
 UpdateResponse rsp = req.process(server);
 
 server.optimize();
 
 This process takes 19  seconds, which is 10 seconds faster than my older 
solution using compass  (another opensource search project we used). Is this 
the 
best was to update the  index? If I understand correctly, an update is 
actually 
a delete in the index  then an add. During the 19 seconds, will my index be 
locked only on the document  being updated or the whole index could be locked? 
I 
am not in production yet  with this solution, so I want to make sure my update 
process makes sense.  Thanks
 
 Greg
 


Re: when to change rows param?

2011-05-02 Thread Chris Hostetter

: I thought that injecting the rows param in the query-component would 
: have been enough (from the limits param my client is giving). But it 
: seems not to be the case.

As i tried to explain before: the details matter.  exactly where in the 
code you tried to do this and how you went about it is important to 
understanding why it might not have affected the results in the way you 
expect.

SearchComponents are ordered, and multiple passes are made over each 
component in order, and each component has the opportunity to access the 
request params in a variety of ways, etc...

So w/o knowing exactly what you changed, we can't really speculate why 
some other code isn't using the new value (particularly since i don't 
think you ever actually told us *which* other code isn't getting the new 
value)
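For what it's worth, the pattern that usually works for this (a sketch against the Solr 1.4-era component API, untested here; "limit" is the client parameter name from the earlier mail) is to swap in a modifiable copy of the params during prepare(), before QueryComponent reads rows:

```java
import java.io.IOException;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

// Sketch: map the client's "limit" param onto Solr's "rows" before
// QueryComponent (and any later components) read it.
public class LimitAwareQueryComponent extends QueryComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
        String limit = params.get("limit");
        if (limit != null) {
            params.set(CommonParams.ROWS, limit);
        }
        rb.req.setParams(params); // later components now see the new rows value
        super.prepare(rb);
    }
}
```

Components later in the chain see the request's current params, so the rewrite has to happen (and be set back on the request) before whatever component consumes rows runs.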


: 
: paul
: 
: 
: Le 12 avr. 2011 à 02:07, Chris Hostetter a écrit :
: 
:  
:  Paul: can you elaborate a little bit on what exactly your problem is?
:  
:  - what is the full component list you are using?
:  - how are you changing the param value (ie: what does the code look like)
:  - what isn't working the way you expect?
:  
:  : I've been using my own QueryComponent (that extends the search one) 
:  : successfully to rewrite web-received parameters that are sent from the 
:  : (ExtJS-based) javascript client. This allows an amount of 
:  : query-rewriting, that's good. I tried to change the rows parameter there 
:  : (which is limit in the query, as per the underpinnings of ExtJS) but 
:  : it seems that this is not enough.
:  : 
:  : Which component should I subclass to change the rows parameter?
:  
:  -Hoss
: 
: 

-Hoss

Re: Question concerning the updating of my solr index

2011-05-02 Thread Otis Gospodnetic
Greg,

I believe the point of SUSS is that you can just add docs to it one by one, so 
that SUSS can asynchronously send them to the backend Solr instead of you 
batching the docs.
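A sketch of that usage, based on Greg's code above (same URL/queue/thread values; SolrUtils.createDocsSolrDocument is Greg's own helper, not a SolrJ API):

```java
SolrServer server = new StreamingUpdateSolrServer(
        "http://localhost:8080/apache-solr-1.4.1/", 1000, 4);

for (Iterator iterator = documents.iterator(); iterator.hasNext();) {
    Document document = (Document) iterator.next();
    // Each add is queued right away; the 4 background threads stream
    // batches of up to 1000 docs to Solr while the loop keeps running.
    server.add(SolrUtils.createDocsSolrDocument(document));
}
server.commit();
```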

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Greg Georges greg.geor...@biztree.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Mon, May 2, 2011 2:45:40 PM
 Subject: RE: Question concerning the updating of my solr index
 
 Oops, here is the code
 
 SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/apache-solr-1.4.1/", 1000, 4);
 Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
 
 for (Iterator iterator = documents.iterator(); iterator.hasNext();) {
     Document document = (Document) iterator.next();
     SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
     docs.add(solrDoc);
 }
 
 server.add(docs);
 server.commit();
 server.optimize();
 
 Greg
 
 -Original Message-
 From: Greg  Georges [mailto:greg.geor...@biztree.com] 
 Sent: 2  mai 2011 14:44
 To: solr-user@lucene.apache.org
 Subject:  RE: Question concerning the updating of my solr index
 
 Ok I had seen this  in the wiki, performance has gone from 19 seconds to 13. 
 I 
have configured it  like this, I wonder what would the best settings be with 
20,000 docs to update?  Higher or lower queue value? Higher or lower thread 
value?  Thanks
 
 Greg
 
 -Original Message-
 From: Otis Gospodnetic  [mailto:otis_gospodne...@yahoo.com] 
 Sent: 2 mai 2011 13:59
 To: solr-user@lucene.apache.org
 Subject:  Re: Question concerning the updating of my solr index
 
 Greg,
 
 You  could use StreamingUpdateSolrServer instead of that UpdateRequest class 
 - 

 http://search-lucene.com/?q=StreamingUpdateSolrServer+fc_project=Solr
 Your  index won't be locked in the sense that you could have multiple apps or 
 threads adding docs to the same index simultaneously and that searches can  
 be 

 executed against the index concurrently.
 
 Otis
 
 Sematext  :: http://sematext.com/ ::  Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
 - Original Message  
  From: Greg Georges greg.geor...@biztree.com
   To: solr-user@lucene.apache.org  solr-user@lucene.apache.org
   Sent: Mon, May 2, 2011 1:33:30 PM
  Subject: Question concerning the  updating of my solr index
  
  Hello all,
  
  I have  integrated Solr into my project with success. I use a  
dataimporthandler 

 to first import the data mapping the fields to my schema.xml.  I  use Solrj 
 to 

 query the data and also use faceting. Works great.
  
  The  question I have now is a general one on updating the index  and how it 
 works.  Right now, I have a thread which runs a couple  of times a day to 
update 

 the  index. My index is composed of about  2 documents, and when this 
thread 

 is  run it takes the data of  the 2 documents in the db, I create a 
 solrdocument  for each  and I then use this line of code to index the index.
  
   SolrServer  server = new 
 CommonsHttpSolrServer(http://localhost:8080/apache-solr-1.4.1/;);
   CollectionSolrInputDocument  docs = new  ArrayListSolrInputDocument();
  
  for (Iterator  iterator  = documents.iterator(); iterator.hasNext();) {
  Document  document = (Document)  iterator.next();
  SolrInputDocument solrDoc  =  SolrUtils.createDocsSolrDocument(document);
   docs.add(solrDoc);
  }
  
  UpdateRequest req = new  UpdateRequest();
   req.setAction(UpdateRequest.ACTION.COMMIT, false,  false);
   req.add(docs);
  UpdateResponse rsp =  req.process(server);
  
  server.optimize();
  
  This process takes 19   seconds, which is 10 seconds faster than my older 
 solution using  compass  (another opensource search project we used). Is 
 this 
the 

 best was to update the  index? If I understand correctly, an update  is 
actually 

 a delete in the index  then an add. During the 19  seconds, will my index be 
 locked only on the document  being  updated or the whole index could be 
locked? I 

 am not in production  yet  with this solution, so I want to make sure my 
update 

 process  makes sense.  Thanks
  
  Greg
  
 


Re: Why are they different?

2011-05-02 Thread Chris Hostetter

: The above code, if I start the server(tomcat) inside eclipse, it throws 
: SolrException : Internal Server Error; but if I start the server 
: outside eclipse, for instance, run startup.bat in tomcat's bin 
: directory, it runs successfully. I really don't understand Why they are 
: different.

Have you checked the logs?

My guess is that when you use eclipse, the server is not starting up 
properly at all.  Possibly not finding the Solr Home directory?


-Hoss


RE: Question concerning the updating of my solr index

2011-05-02 Thread Greg Georges
Yeah you are right, I have changed that to add a document and not a list of 
documents. Still works pretty fast, I will continue to test settings to see if 
I can tweak it further. Thanks

Greg

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: 2 mai 2011 14:56
To: solr-user@lucene.apache.org
Subject: Re: Question concerning the updating of my solr index

Greg,

I believe the point of SUSS is that you can just add docs to it one by one, so 
that SUSS can asynchronously send them to the backend Solr instead of you 
batching the docs.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Greg Georges greg.geor...@biztree.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Mon, May 2, 2011 2:45:40 PM
 Subject: RE: Question concerning the updating of my solr index
 
 Oops, here is the code
 
 SolrServer server = new StreamingUpdateSolrServer("http://localhost:8080/apache-solr-1.4.1/", 1000, 4);
 Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
 
 for (Iterator iterator = documents.iterator(); iterator.hasNext();) {
     Document document = (Document) iterator.next();
     SolrInputDocument solrDoc = SolrUtils.createDocsSolrDocument(document);
     docs.add(solrDoc);
 }
 
 server.add(docs);
 server.commit();
 server.optimize();
 
 Greg
 
 -Original Message-
 From: Greg  Georges [mailto:greg.geor...@biztree.com] 
 Sent: 2  mai 2011 14:44
 To: solr-user@lucene.apache.org
 Subject:  RE: Question concerning the updating of my solr index
 
 Ok I had seen this  in the wiki, performance has gone from 19 seconds to 13. 
 I 
have configured it  like this, I wonder what would the best settings be with 
20,000 docs to update?  Higher or lower queue value? Higher or lower thread 
value?  Thanks
 
 Greg
 
 -Original Message-
 From: Otis Gospodnetic  [mailto:otis_gospodne...@yahoo.com] 
 Sent: 2 mai 2011 13:59
 To: solr-user@lucene.apache.org
 Subject:  Re: Question concerning the updating of my solr index
 
 Greg,
 
 You  could use StreamingUpdateSolrServer instead of that UpdateRequest class 
 - 

 http://search-lucene.com/?q=StreamingUpdateSolrServer+fc_project=Solr
 Your  index won't be locked in the sense that you could have multiple apps or 
 threads adding docs to the same index simultaneously and that searches can  
 be 

 executed against the index concurrently.
 
 Otis
 
 Sematext  :: http://sematext.com/ ::  Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
 - Original Message  
  From: Greg Georges greg.geor...@biztree.com
   To: solr-user@lucene.apache.org  solr-user@lucene.apache.org
   Sent: Mon, May 2, 2011 1:33:30 PM
  Subject: Question concerning the  updating of my solr index
  
  Hello all,
  
  I have  integrated Solr into my project with success. I use a  
dataimporthandler 

 to first import the data mapping the fields to my schema.xml.  I  use Solrj 
 to 

 query the data and also use faceting. Works great.
  
  The  question I have now is a general one on updating the index  and how it 
 works.  Right now, I have a thread which runs a couple  of times a day to 
update 

 the  index. My index is composed of about  2 documents, and when this 
thread 

 is  run it takes the data of  the 2 documents in the db, I create a 
 solrdocument  for each  and I then use this line of code to index the index.
  
   SolrServer  server = new 
 CommonsHttpSolrServer(http://localhost:8080/apache-solr-1.4.1/;);
   CollectionSolrInputDocument  docs = new  ArrayListSolrInputDocument();
  
  for (Iterator  iterator  = documents.iterator(); iterator.hasNext();) {
  Document  document = (Document)  iterator.next();
  SolrInputDocument solrDoc  =  SolrUtils.createDocsSolrDocument(document);
   docs.add(solrDoc);
  }
  
  UpdateRequest req = new  UpdateRequest();
   req.setAction(UpdateRequest.ACTION.COMMIT, false,  false);
   req.add(docs);
  UpdateResponse rsp =  req.process(server);
  
  server.optimize();
  
  This process takes 19   seconds, which is 10 seconds faster than my older 
 solution using  compass  (another opensource search project we used). Is 
 this 
the 

 best was to update the  index? If I understand correctly, an update  is 
actually 

 a delete in the index  then an add. During the 19  seconds, will my index be 
 locked only on the document  being  updated or the whole index could be 
locked? I 

 am not in production  yet  with this solution, so I want to make sure my 
update 

 process  makes sense.  Thanks
  
  Greg
  
 


DataImportHandler on 2 tables

2011-05-02 Thread Greg Georges
Hello all,

I have a system where I have a dataimporthandler defined for one table in my 
database. I need to also index data from another table, so I will need 
another index to search on. Does this mean I must configure another Solr 
instance (another schema.xml file, dataimporthandler SQL file, etc.)? Do I 
need another Solr core for this? Thanks

Greg


Re: DataImportHandler on 2 tables

2011-05-02 Thread lboutros
Do you want to search on the data from the tables together or separately?
Is there a join between the two tables?

Ludovic.

2011/5/2 Greg Georges [via Lucene] 
ml-node+2891256-222073995-383...@n3.nabble.com

 Hello all,

 I have a system where I have a dataimporthandler defined for one table in
 my database. I need to also index data from another table, so therefore I
 will need another index to search on. Does this mean I must configure
 another solr instance (another schema.xml file, dataimporthandler sql file,
 etc)? Do I need another solr core for this? Thanks

 Greg






-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-on-2-tables-tp2891256p2891272.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: DataImportHandler on 2 tables

2011-05-02 Thread Greg Georges
No, the data has no relationship between each other; they are both independent 
with no joins. I want to search separately

Greg

-Original Message-
From: lboutros [mailto:boutr...@gmail.com] 
Sent: 2 mai 2011 16:29
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler on 2 tables

Do you want to search on the data from the tables together or separately?
Is there a join between the two tables?

Ludovic.

2011/5/2 Greg Georges [via Lucene] 
ml-node+2891256-222073995-383...@n3.nabble.com

 Hello all,

 I have a system where I have a dataimporthandler defined for one table in
 my database. I need to also index data from another table, so therefore I
 will need another index to search on. Does this mean I must configure
 another solr instance (another schema.xml file, dataimporthandler sql file,
 etc)? Do I need another solr core for this? Thanks

 Greg






-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-on-2-tables-tp2891256p2891272.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler on 2 tables

2011-05-02 Thread lboutros
Ok, so it seems you should create a new index and core, as you said.

see here for the management :

http://wiki.apache.org/solr/CoreAdmin

But it seems that is a problem for you. Is it ?

Ludovic.


2011/5/2 Greg Georges [via Lucene] 
ml-node+2891277-472183207-383...@n3.nabble.com

  No, the data has no relationship between each other, they are both
  independent with no joins. I want to search separately

 Greg




-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-on-2-tables-tp2891256p2891316.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: DataImportHandler on 2 tables

2011-05-02 Thread Greg Georges
No it is not a problem, just wanted to confirm my question before looking into 
solr cores more closely. Thanks for your advice and confirmation

Greg

-Original Message-
From: lboutros [mailto:boutr...@gmail.com] 
Sent: 2 mai 2011 16:43
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler on 2 tables

ok, so It seems you should create a new index and core as you said.

see here for the management :

http://wiki.apache.org/solr/CoreAdmin

But it seems that is a problem for you. Is it ?

Ludovic.


2011/5/2 Greg Georges [via Lucene] 
ml-node+2891277-472183207-383...@n3.nabble.com

  No, the data has no relationship between each other, they are both
  independent with no joins. I want to search separately

 Greg




-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-on-2-tables-tp2891256p2891316.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fq parameter with partial value

2011-05-02 Thread Jonathan Rochkind
So if you have a field that IS tokenized, regardless of what it's 
called, then when you send "My Great Restaurant" to it for _indexing_, 
it gets _tokenized upon indexing_ to separate tokens: "My", "Great", 
"Restaurant".  Depending on what other analysis you have, it may get 
further analyzed, perhaps to: "my", "great", "restaurant".


You don't need to separate into tokens yourself before sending it to 
Solr for indexing, if you define the field using a tokenizer, Solr will 
do that when you index.  Because this is a VERY common thing to do with 
Solr; pretty much any field that you want to be effectively searchable 
you have Solr tokenize like this.


Because Solr pretty much always matches on individual tokens, that's the 
fundamental way Solr works.
Those separate tokens are what allow you to SEARCH on the field, and get 
a match on "my" or on "restaurant".  If the field were non-tokenized, 
you'd ONLY get a hit if the user entered "My Great Restaurant" (and 
really not even then unless you take other actions, because of the way 
Solr query parsers work you'll have trouble getting ANY hits to a 
user-entered search with the 'lucene' or 'dismax' query parsers if you 
don't tokenize).


That tokenized filed won't facet very well though -- if you facetted on 
a tokenized field with that example entered in it, you'll get a facet 
my with that item in it, and another facet great with that item in 
it, and another facet restuarant with that item in it.


Which is why you likely want to use a seperate _untokenized_ field for 
facetting. Which is why you end up wanting/needing two seperate fields 
-- one that is tokenized for searching, and one that is not tokenized 
(and usually not analyzed at all) for facetting.


Hope this helps.
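The two-field setup described above can be sketched in schema.xml (type names are illustrative, assuming a "string" type for the untokenized field and a tokenized "text" type):

```xml
<!-- Untokenized field for faceting: each value stays one term. -->
<field name="CATEGORY" type="string" indexed="true" stored="true"
       multiValued="true"/>

<!-- Tokenized field for searching: values are split into words. -->
<field name="CATEGORY_TOKENIZED" type="text" indexed="true" stored="false"
       multiValued="true"/>

<!-- Copy every CATEGORY value into CATEGORY_TOKENIZED at index time. -->
<copyField source="CATEGORY" dest="CATEGORY_TOKENIZED"/>
```

You would then facet with facet.field=CATEGORY and filter with fq=CATEGORY_TOKENIZED:Restaurant.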

On 5/2/2011 2:43 AM, elisabeth benoit wrote:

I'm a bit confused here.

What is the difference between CATEGORY and CATEGORY_TOKENIZED if I just do
a copyField from one field to another? And how can I search only for
Restaurant (fq=CATEGORY_TOKENIZED:Restaurant)? Shouldn't I have something
like
<field name="CATEGORY_TOKENIZED">Hotel</field> if I want this to work? And
from what I understand, this means I should do more than just copy
<field name="CATEGORY">Restaurant Hotel</field>
to CATEGORY_TOKENIZED.

Thanks,
Elisabeth


2011/4/28 Erick Erickson erickerick...@gmail.com


See below:


On Thu, Apr 28, 2011 at 9:03 AM, elisabeth benoit
elisaelisael...@gmail.com  wrote:

yes, the multivalued field is not broken up into tokens.

so, if I understand well what you mean, I could have

a field CATEGORY with  multiValued=true
a field CATEGORY_TOKENIZED with  multiValued= true

and then some POI

<field name="NAME">POI_Name</field>
...
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant</field>
<field name="CATEGORY_TOKENIZED">Hotel</field>

[EOE] If the above is the document you're sending, then no. The
document would be indexed with
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant Hotel</field>


Or even just:
<field name="CATEGORY">Restaurant Hotel</field>

and set up a copyField to copy the value from CATEGORY to
CATEGORY_TOKENIZED.

The multiValued part comes from "And a single POI might have different
categories", so your document could have several entries, which would look
like:
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY">Health Spa</field>
<field name="CATEGORY">Dance Hall</field>

and your document would be counted for each of those entries, while searches
against CATEGORY_TOKENIZED would match things like dance, spa, etc.

But do notice: if you did NOT want a search for restaurant hall
(no quotes)
to match, you could do proximity searches for less than your
increment gap, e.g.
(this time with the quotes) "restaurant hall"~50, which would then
NOT match if your increment gap were 100.

Best
Erick
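The "increment gap" Erick mentions is the positionIncrementGap attribute on the field type in schema.xml; a minimal sketch (type name and analyzer choice are illustrative):

```xml
<!-- Token positions jump by 100 between successive values of a
     multiValued field, so entries are "far apart" positionally. -->
<fieldType name="text_gap" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a gap of 100, a proximity query such as "restaurant hall"~50 cannot match across two different values of the field, since their terms sit at least 100 positions apart.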



do faceting on CATEGORY and fq on CATEGORY_TOKENIZED.

But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED?

Best regards
Elisabeth


2011/4/28 Erick Erickson erickerick...@gmail.com


So, I assume your CATEGORY field is multiValued but each value is not
broken up into tokens, right? If that's the case, would it work to have

a

second field CATEGORY_TOKENIZED and run your fq against that
field instead?

You could have this be a multiValued field with an increment gap if you
wanted
to prevent matches across separate entries and have your fq do a

proximity

search where the proximity was less than the increment gap

Best
Erick

On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit
elisaelisael...@gmail.com  wrote:

Hi Stefan,

Thanks for answering.

In more details, my problem is the following. I'm working on searching
points of interest (POIs), which can be hotels, restaurants, plumbers,
psychologists, etc.

Those POIs can be identified among other things  by categories or by

brand.

And a single POI might have different categories (no maximum number).

User

might enter a query like


McDonald’s Paris


or


Restaurant Paris


or



Indexing multiple languages

2011-05-02 Thread PeterKerk
I have category facets, currently in only 1 language, but since my site is
multilanguage, I need to index them in multiple languages.

My table looks like this:

[music_categories]
id        int           Unchecked
title     nvarchar(50)  Unchecked
title_en  nvarchar(50)  Unchecked
title_nl  nvarchar(50)  Unchecked




In my data-config.xml I have this, only for 1 language:

<entity name="artist_category" query="select categoryid from
    artist_categories where objectid=${artist.id}">
  <entity name="category" query="select title from music_categories where id
      = '${artist_category.categoryid}'">
    <field name="categories" column="title" />
  </entity>
</entity>


Now, the only way I can imagine indexing multiple languages is by
duplicating these lines:

<entity name="artist_category_en" query="select categoryid from
    artist_categories where objectid=${artist.id}">
  <entity name="category_en" query="select title_en from music_categories
      where id = '${artist_category_en.categoryid}'">
    <field name="categories_en" column="title_en" />
  </entity>
</entity>

<entity name="artist_category_nl" query="select categoryid from
    artist_categories where objectid=${artist.id}">
  <entity name="category_nl" query="select title_nl from music_categories
      where id = '${artist_category_nl.categoryid}'">
    <field name="categories_nl" column="title_nl" />
  </entity>
</entity>


Is there a better way, e.g. where I can do some sort of parameterizing, like a
${lang} placeholder or something?
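One possible approach, assuming your DataImportHandler supports request-time variables (${dataimporter.request.*}, filled in from the import URL, e.g. /dataimport?command=full-import&lang=en); treat this as a sketch to verify, since substitution inside the field name attribute may not work in every version:

```xml
<entity name="artist_category" query="select categoryid from
    artist_categories where objectid=${artist.id}">
  <entity name="category" query="select title_${dataimporter.request.lang}
      from music_categories where id = '${artist_category.categoryid}'">
    <field name="categories_${dataimporter.request.lang}"
           column="title_${dataimporter.request.lang}" />
  </entity>
</entity>
```

You would then run one import per language instead of duplicating the entity definitions.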

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-multiple-languages-tp2891546p2891546.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Formatted date/time in long field and javabinRW exception

2011-05-02 Thread Chris Hostetter

: Any thoughts on this one? Why does Solr output a string in a long field with 
: XMLResponseWriter but fails doing so (as it should) with the javabin format?

performance.

the XML Response writer doesn't make any attempt to validate data from 
the index on the way out, the stored value in the index is a string, and 
the stream it's writing to accepts strings, so it writes the value as a 
string wrapped in <long> tags.

The Binary response writer on the other hand streams strings differently 
than longs, so it does care when it encounters the incorrect type, and it 
errors.

Bottom line: if your schema.xml is not consistent with your index, all 
bets are off as to what the behavior will be.



-Hoss


Re: Formatted date/time in long field and javabinRW exception

2011-05-02 Thread Markus Jelsma

 : Any thoughts on this one? Why does Solr output a string in a long field
 : with XMLResponseWriter but fails doing so (as it should) with the
 : javabin format?
 
 performance.
 
 the XML Response writer doesn't make any attempt to validate data from
 the index on the way out, the stored value in the index is a string, and
 the stream it's writing to accepts strings, so it writes the value as a
 string wrapped in <long> tags.
 
 The Binary response writer on the other hand streams strings differently
 than longs, so it does care when it encounters the incorrect type, and it
 errors.
 
 Bottom line: if your schema.xml is not consistent with your index, all
 bets are off as to what the behavior will be.

That sounds quite reasonable indeed. But I don't understand why Solr doesn't 
throw an exception when I actually index a string in a long fieldType, while I 
do remember getting a number formatting exception when pushing strings to 
an integer fieldType.

With the current setup I can send a properly formatted date to a long 
fieldType, which should, in my opinion, punish me with an exception.

 
 
 
 -Hoss


Re: Avoiding corrupted index

2011-05-02 Thread Chris Hostetter

: First, I tried the scripts provided in the Solr distribution without success
...
: And that's true : there is no /opt/apache-solr-1.4.1/src/bin/scripts-util
: but a /opt/apache-solr-1.4.1/src/scripts/scripts-util
: Is this normal to distribute the scripts with a bad path ?

it looks like you are trying to run the scripts from the src directory 
of the distro ... they are meant to be installed in a bin directory 
in your solr home dir (so they can locate the default data dir, etc...)

If you haven't seen them already...

http://wiki.apache.org/solr/CollectionDistribution
http://wiki.apache.org/solr/SolrCollectionDistributionScripts

: Then I discovered that these utility scripts were not distributed anymore
: with the version 3.1.0 : were they not reliable ? can we get corrupted
: backups with this scripts ?

no, as far as i know they work great.

they were not included in the *binary* distributions of Solr, but they 
were most certainly included in the *source* distributions ... I think 
that was actually an oversight ... 3.1 is the first time we had a binary 
distribution, and there's no reason I know of why they shouldn't have been 
in both.

(in general, these scripts have fallen out of favor because they aren't as 
portable or as easy to test as the java based replication, so they are 
easy to forget)


-Hoss


Re: DataImportHandler on 2 tables

2011-05-02 Thread Otis Gospodnetic
Greg,

1 instance with 2 cores, each with their own schema, solrconfig, etc. (the conf 
dir stuff).

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
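The one-instance, two-core layout Otis describes can be sketched in solr.xml (core names and directories are illustrative):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- Each core gets its own conf/ directory with schema.xml,
         solrconfig.xml, and DIH configuration. -->
    <core name="table1" instanceDir="table1"/>
    <core name="table2" instanceDir="table2"/>
  </cores>
</solr>
```

Queries and imports then address each index separately, e.g. /solr/table1/select and /solr/table2/dataimport.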



- Original Message 
 From: Greg Georges greg.geor...@biztree.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Mon, May 2, 2011 4:23:43 PM
 Subject: DataImportHandler on 2 tables
 
 Hello all,
 
 I have a system where I have a dataimporthandler defined for one table in my 
database. I need to also index data from another table, so therefore I will 
need another index to search on. Does this mean I must configure another solr 
instance (another schema.xml file, dataimporthandler sql file, etc)? Do I need 
another solr core for this? Thanks
 
 Greg
 


Re: Dismax Minimum Match/Stopwords Bug

2011-05-02 Thread Chris Hostetter

: However, is there an actual fix in the 3.1 eDisMax parser which solves 
: the problem for real? Cannot find a JIRA issue for it.

edismax uses the same query structure as dismax, which means it's not 
possible to fix anything here ... it's how the query parsers work.

each word from the query string is analyzed by each field in the qf, 
and the result is used as a query on the word in the field.  The 
individual clauses for each word are aggregated into a 
DisjunctionMaxQuery, and the set of DisjunctionMaxQueries are then 
combined into a BooleanQuery (with the appropriate minNrShouldMatch set)

if a word from the input produces no output from the analyzers of *any* 
of the qf fields, then the resulting DisjunctionMaxQuery is empty and 
dropped from the final BooleanQuery ... so if a word in the query string 
is a stop word for *every* field in the qf, there is no clause.  but if 
*any* field in the qf produces a term for it, then there is a 
DisjunctionMaxQuery for that word added to the main BooleanQuery.

As I've said many times: this isn't a bug, it's a fundamental point of the 
parser and the structure of the query.
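Schematically, for the query "the cat" with qf=title body (illustrative field names), where "the" is a stop word in title's analysis but not in body's, the structure described above is roughly:

```
+(
  DisjunctionMaxQuery( title:cat | body:cat )
  DisjunctionMaxQuery( body:the )   <- kept, because body produced a term;
                                       it would be dropped only if EVERY
                                       qf field stopped the word
)~minNrShouldMatch
```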

The best solution for people who get bit by this (in my opinion) is not 
to give up on stop words -- if you want to use stop words, by all means 
use stop words.  BUT! You must use them in all the fields of your qf ... 
even fields where you think "why in god's name would I need stopwords on 
this field, those terms will never exist in this field!" ... you may know 
that, and it may be true, but it doesn't change the fact that people will 
be *querying* for stop words against those fields, and you want to ignore 
them when they do.



-Hoss


Re: Indexing relations for sorting

2011-05-02 Thread Chris Hostetter

: Every product-to-category relation has its own sorting order which we would
: like to index in solr. 
...
: We want all products of subcat1 (no matter what the parent category is)
: ordered by their sorting order
:  
: We want all products of cat2_subcat1 ordered by their sorting order

the best suggestion i can think of is to create a field per category and 
use it to index the sort order for that category -- you haven't said what 
type of cardinality you are dealing with in your categorization, so if 
it's relatively small this should work well ... if it's unbounded it will 
have some serious problems however.
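If the category count is small and bounded, the field-per-category suggestion could be sketched with a dynamic field (names are illustrative; "sint" is the sortable-int type in Solr 1.3 example schemas):

```xml
<!-- One sort-order field per category, e.g. sortorder_cat2_subcat1 -->
<dynamicField name="sortorder_*" type="sint" indexed="true" stored="false"/>
```

A product at position 3 in cat2_subcat1 would then be indexed with sortorder_cat2_subcat1=3 and retrieved with q=category:cat2_subcat1&sort=sortorder_cat2_subcat1 asc.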

: Our solr version is 1.3.0

Please, Please, Please consider upgrading when you are working on this 
project of yours.  There have been too many bug fixes and performance 
enhancements since then to even remotely dream of listing them all in this 
email (that's what the CHANGES.txt file is for).


-Hoss


How to debug if termsComponent is used

2011-05-02 Thread cyang2010
Hi, I defined a searchHandler just for the sake of autosuggest, using
TermsComponent.

  <searchComponent name="terms"
    class="org.apache.solr.handler.component.TermsComponent" >
  </searchComponent>


  <requestHandler name="/terms"
    class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>

    <arr name="components">
      <str>terms</str>
      <str>debug</str>
    </arr>
  </requestHandler>

This configuration might not even make sense, configuring the terms and
debug components together.  Must the debug component be wired up with the
query component?  I just need a requestHandler where I can run the
TermsComponent and debug on it.  How do I achieve that?

Thanks,

cy

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-debug-if-termsComponent-is-used-tp2891735p2891735.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: updates not reflected in solr admin

2011-05-02 Thread Chris Hostetter

:  Thanks - we are issuing a commit via SolrJ; I think that's the same
:  thing, right?  Or are you saying really we need to do a separate commit
:  (via HTTP) to update the admin console's view?
...
:  Yes separate commit is needed.

Huh?

No ... that's not true at all.

A commit using SolrJ is no different than a commit via HTTP ... especially 
since that's all SolrJ is doing when you ask it to commit.


-Hoss


Re: updates not reflected in solr admin

2011-05-02 Thread Chris Hostetter

: I saw a comment recently (from Lance) that there is (annoying) HTTP caching
: enabled by default in solrconfig.xml.  Does this sound like something that
: would be caused by that cache?  If so, I'd probably want to disable it.   Does

the HTTP caching that tends to bite people in the ass is actually your 
*browser* caching the responses from solr based on the headers solr sets 
in the response

http://wiki.apache.org/solr/SolrConfigXml#HTTP_Caching
 
In most browsers a Shift-Reload tells it to ignore its cache and force a 
new request.

: that affect performance of queries run via SolrJ?  Also: why isn't that cache
: flushed by a commit?  Seems weird...

if you use the example configs that came with Solr 1.4.1, then solr would 
generate Last-Modified and ETag headers that *would* tell your browser 
that the results had changed after commit.

If you use the example configs that came with Solr 3.1, then solr sets the 
headers in such a way that the browser shouldn't cache at all.
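The behavior described above is controlled by the <httpCaching> element inside <requestDispatcher> in solrconfig.xml; a minimal sketch of the "never cache" setup:

```xml
<requestDispatcher handleSelect="true">
  <!-- never304="true": Solr sends no cache-validation headers, so
       browsers should not serve stale results after a commit. -->
  <httpCaching never304="true"/>
</requestDispatcher>
```

The alternative style ties the validators to the index, e.g. <httpCaching lastModifiedFrom="openTime" etagSeed="Solr"/>, so the headers change whenever a new searcher is opened.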



-Hoss


Re: XS DateTime format

2011-05-02 Thread Chris Hostetter

: negative-signed numeral that represents the year. Is it intentional that 
: Solr strips leading zeros for the first four digits?

No, it's a really stupid bug, due to some really stupid date formatting i 
haven't had a chance to refactor out of existence...

https://issues.apache.org/jira/browse/SOLR-1899




-Hoss


Re: updates not reflected in solr admin

2011-05-02 Thread Jonathan Rochkind

On 5/2/2011 8:02 PM, Chris Hostetter wrote:

Huh?

No ... that's not true at all.

A commit using SolrJ is no different than a commit via HTTP ... especially
since that's all SolrJ is doing when you ask it to commit.


Unless you're using the 'embedded' solr server?   Wonder if the OP is.

Jonathan


Re: Has NRT been abandoned?

2011-05-02 Thread Nagendra Nagarajayya

Thanks Andy!

Everything should work as before. So faceting, function queries, query 
boosting should still work.


For eg:
q=name:efghij^2.2 name:abcd^3.2

returns all docs with name efghij and abcd but ranking documents named 
abcd above efghij


Regards,
- NN

On 5/1/2011 7:15 PM, Andy wrote:

Nagendra,

This looks interesting. Does Solr-RA support:

1) facet
2) Boost query such as {!boost b=log(popularity)}foo

Thanks
Andy

--- On Sun, 5/1/11, Nagendra Nagarajayyannagaraja...@transaxtions.com  wrote:


From: Nagendra Nagarajayyannagaraja...@transaxtions.com
Subject: Re: Has NRT been abandoned?
To: solr-user@lucene.apache.org
Date: Sunday, May 1, 2011, 12:01 PM
Hi Andy:

I have a solution for NRT with Solr 1.4.1. The solution
uses the RankingAlgorithm as the search library. The NRT
functionality allows you to add documents without the
IndexSearchers being closed or caches being cleared. A
commit is not needed with the document update. Searches can
run concurrently with document updates. No changes are
needed except for enabling the NRT through solrconfig.xml.
The performance is about  262 TPS (document adds) on a
dual core intel system with 2GB heap with searches in
parallel. The performance at the moment is limited by how
fast IndexWriter.getReader() performs.

I have a white paper that describes NRT in details, allows
you to download the tweets, schema and solrconfig.xml files.
You can access the white paper from here:

http://solr-ra.tgels.com/papers/solr-ra_real_time_search.pdf

You can download Solr with RankingAlgorithm (Solr-RA) from
here:

http://solr-ra.tgels.com

I still have not yet integrated the NRT with Solr 3.1 (the
new release) and plan to do so very soon.

Please let me know if you need any more info.

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.com

On 5/1/2011 8:28 AM, Andy wrote:

Hi,

I read on this mailing list previously that NRT was

implemented in 4.0, it just  wasn't ready for
production yet. Then I looked at the wiki 
(http://wiki.apache.org/solr/NearRealtimeSearch). It
listed 2 jira issues related to NRT: SOLR-1308 and
SOLR-1278. Both issues have their resolutions set to "Won't
Fix" recently.

Does that mean NRT is no longer going to happen?

What's the state of NRT in Solr?

Thanks

Andy











Re: updates not reflected in solr admin

2011-05-02 Thread Michael Sokolov
No - this is all running against an external tomcat-based solr.  I'm 
back to being mystified now. Maybe  I'll see if I can isolate this a bit 
more.  I'll post back if I do, although I'm beginning to wonder if we 
should just move to 3.1 and not worry about it.


-Mike

On 5/2/2011 8:39 PM, Jonathan Rochkind wrote:

On 5/2/2011 8:02 PM, Chris Hostetter wrote:

Huh?

No ... that's not true at all.

A commit using SolrJ is no different than a commit via HTTP ... 
especially

since that's all SolrJ is doing when you ask it to commit.


Unless you're using the 'embedded' solr server?   Wonder if the OP is.

Jonathan




Re: How to debug if termsComponent is used

2011-05-02 Thread Otis Gospodnetic
Hi,

That looks about right, but I don't know without checking around if debug 
component really needs query component, or if it can work with just terms 
component.
Have you tried it?  Did it not work?

You may save yourself a lot of work and get something better than terms 
component with http://sematext.com/products/autocomplete/index.html btw.  Or if 
you are using Solr trunk, with Suggester.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: cyang2010 ysxsu...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Mon, May 2, 2011 6:57:49 PM
 Subject: How to debug if termsComponent is used
 
 Hi, I defined a searchHandler just for the sake of autosuggest, using
 TermsComponent.
 
   <searchComponent name="terms"
     class="org.apache.solr.handler.component.TermsComponent" >
   </searchComponent>
 
 
   <requestHandler name="/terms"
     class="org.apache.solr.handler.component.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
     </lst>
 
     <arr name="components">
       <str>terms</str>
       <str>debug</str>
     </arr>
   </requestHandler>
 
 This configuration might not even make sense, configuring the terms and
 debug components together.  Must the debug component be wired up with the
 query component?  I just need a requestHandler where I can run the
 TermsComponent and debug on it.  How do I achieve that?
 
 Thanks,
 
 cy
 
 --
 View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-debug-if-termsComponent-is-used-tp2891735p2891735.html

 Sent  from the Solr - User mailing list archive at Nabble.com.
 


Re: updates not reflected in solr admin

2011-05-02 Thread Michael Sokolov
Right I read those comments in the config, and it all sounds reasonable 
- presumably a new Searcher is opened when (or shortly after) we commit, 
from whatever source.  That was my operating assumption, and the reason 
I was so confused when I saw different result in two different clients.  
I don't want to pursue this probable user error beyond eliminating the 
obvious for the moment.  I'll post back if I get more info.  Thanks 
again everyone.


-Mike

On 5/2/2011 8:09 PM, Chris Hostetter wrote:

: I saw a comment recently (from Lance) that there is (annoying) HTTP caching
: enabled by default in solrconfig.xml.  Does this sound like something that
: would be caused by that cache?  If so, I'd probably want to disable it.   Does

the HTTP caching that tends to bite people in the ass is actually your
*browser* caching the responses from solr based on the headers solr sets
in the response

http://wiki.apache.org/solr/SolrConfigXml#HTTP_Caching

In most browsers a Shift-Reload tells it to ignore its cache and force a
new request.

: that affect performance of queries run via SolrJ?  Also: why isn't that cache
: flushed by a commit?  Seems weird...

if you use the example configs that came with Solr 1.4.1, then solr would
generate Last-Modified and ETag headers that *would* tell your browser
that the results had changed after commit.

If you use the example configs that came with Solr 3.1, then solr sets the
headers in such a way that the browser shouldn't cache at all.



-Hoss




How to take differential backup of Solr Index

2011-05-02 Thread Gaurav Shingala

Hi,

Is there any way to take differential backup of Solr Index?

Thanks,
Gaurav

  

Re: How to take differential backup of Solr Index

2011-05-02 Thread Lance Norskog
The Replication feature does this. If you configure a query server as
a 'backup' server, it downloads changes but does not read them.
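A sketch of the replication setup Lance describes in solrconfig.xml (host name and poll interval are illustrative); the backup machine polls the master and downloads only the index files that changed since its last poll:

```xml
<!-- On the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- On the backup machine -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:10:00</str>
  </lst>
</requestHandler>
```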

On Mon, May 2, 2011 at 9:56 PM, Gaurav Shingala
gaurav.shing...@hotmail.com wrote:

 Hi,

 Is there any way to take differential backup of Solr Index?

 Thanks,
 Gaurav





-- 
Lance Norskog
goks...@gmail.com


Re: Has NRT been abandoned?

2011-05-02 Thread Andy

 Everything should work as before. So faceting, function queries, query
 boosting should still work.
 
 For eg:
 q=name:efghij^2.2 name:abcd^3.2
 
 returns all docs with name efghij and abcd but ranking documents named
 abcd above efghij
 

Thanks Nagendra.

But I wasn't talking about field boost. The kind of boosting I need:

{!boost b=log(popularity)}foo

requires BoostQParserPlugin 
(http://search-lucene.com/jd/solr/org/apache/solr/search/BoostQParserPlugin.html
 )

Does Solr-RA come with BoostQParserPlugin?

Thanks.