Re: Dynamically loading xml files from webapplication to index

2011-04-28 Thread Grijesh
You need to write a script using SolrJ or some other connector to parse
your data file and post it to Solr for indexing.
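As a rough sketch, the XML update message such a script would build and post to Solr's /update handler looks like the following (the field names here are invented for illustration, not taken from the original post):

```xml
<!-- example update message; field names are hypothetical -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="title">Some title parsed from your data file</field>
  </doc>
</add>
```

With SolrJ you would build the same document as a SolrInputDocument and send it through SolrServer#add, followed by a commit.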

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dynamically-loading-xml-files-from-webapplication-to-index-tp2865890p2873608.html
Sent from the Solr - User mailing list archive at Nabble.com.


XS DateTime format

2011-04-28 Thread Jens Flaaris
Hi,

I just have a small question regarding the output format of fields of type 
TrieDateField. If a document containing the date 0001-01-01T01:01:01Z is passed 
to Solr and I then try to search for that document, the output of the date field 
has the format Y-MM-DDThh:mm:ssZ; the first three zeros of the year are missing. 
According to the XML Schema specification found on w3.org, the year in xs:dateTime 
is a four-or-more-digit, optionally negative-signed numeral. Is it intentional 
that Solr strips the leading zeros?

Thanks
Jens Jørgen Flaaris
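For reference, the xs:dateTime lexical form does pad the year to at least four digits; a quick stdlib check of that formatting, independent of Solr (the class name is my own):

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class XsDateTimeYearPadding {
    public static void main(String[] args) {
        // Year 1 must render as "0001" in the xs:dateTime lexical form
        OffsetDateTime dt = OffsetDateTime.of(1, 1, 1, 1, 1, 1, 0, ZoneOffset.UTC);
        // "uuuu" pads the year to a minimum of four digits
        String lexical = dt.format(DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss'Z'"));
        System.out.println(lexical); // 0001-01-01T01:01:01Z
    }
}
```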


fq parameter with partial value

2011-04-28 Thread elisabeth benoit
Hello,

I would like to know if there is a way to use the fq parameter with a
partial value.

For instance, if I have a request with fq=NAME:Joe, I would like to
retrieve all answers where NAME contains Joe, including those with NAME =
Joe Smith.

Thanks,
Elisabeth


Re: fq parameter with partial value

2011-04-28 Thread Stefan Matheis
Hi Elisabeth,

that's not what filter queries are made for :) What speaks against using
that criterion in the query itself?
Perhaps you want to describe your use case and we'll see if there's
another way to solve it?

Regards
Stefan

On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit
elisaelisael...@gmail.com wrote:
 Hello,

 I would like to know if there is a way to use the fq parameter with a
 partial value.

 For instance, if I have a request with fq=NAME:Joe, and I would like to
 retrieve all answers where NAME contains Joe, including those with NAME =
 Joe Smith.

 Thanks,
 Elisabeth



Spatial Search

2011-04-28 Thread Jonas Lanzendörfer
Dear list :)

I am new to Solr and am trying to use the spatial search feature which was added in 
3.1. In my schema.xml I have two double fields for latitude and longitude. How 
can I get them into the location field type? I use SolrJ to fill the index with 
data. If I used a location field instead of two double fields, how could I 
fill it with SolrJ? I use annotations to link the data from my DTOs to the 
index fields...

Hope you got my problem...

best regards, Jonas

Re: fq parameter with partial value

2011-04-28 Thread elisabeth benoit
Hi Stefan,

Thanks for answering.

In more details, my problem is the following. I'm working on searching
points of interest (POIs), which can be hotels, restaurants, plumbers,
psychologists, etc.

Those POIs can be identified, among other things, by category or by brand.
And a single POI might have several categories (no maximum number). A user
might enter a query like


McDonald’s Paris


or


Restaurant Paris


or


many other possible queries


First I want to do a facet search on brand and categories, to find out which
case is the current case.


http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY

and get an answer like

<lst name="facet_fields">

<lst name="CATEGORY">

<int name="Restaurant">598</int>

<int name="Restaurant Hotel">451</int>



Then I want to send a request with fq=CATEGORY:Restaurant and still get
answers with CATEGORY = Restaurant Hotel.



One solution would be to modify the data and add a new document every time we
have a new category, so a POI with three different categories would be indexed
three times, each time with a different category.


But I was wondering if there was another way around.



Thanks again,

Elisabeth


2011/4/28 Stefan Matheis matheis.ste...@googlemail.com

 Hi Elisabeth,

 that's not what FilterQueries are made for :) What against using that
 Criteria in the Query?
 Perhaps you want to describe your UseCase and we'll see if there's
 another way to solve it?

 Regards
 Stefan

 On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit
 elisaelisael...@gmail.com wrote:
  Hello,
 
  I would like to know if there is a way to use the fq parameter with a
  partial value.
 
  For instance, if I have a request with fq=NAME:Joe, and I would like to
  retrieve all answers where NAME contains Joe, including those with NAME =
  Joe Smith.
 
  Thanks,
  Elisabeth
 



how to update database record after indexing

2011-04-28 Thread vrpar...@gmail.com
Hello,

I am using the DataImportHandler to import data from a SQL Server database.

My requirement is: when Solr has finished indexing a particular database record,
I want to update that record in the database.

Or, after indexing all records, if I can get all their ids I can update them all at once.

How can I achieve this?

Thanks

Vishal Parekh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-update-database-record-after-indexing-tp2874171p2874171.html
Sent from the Solr - User mailing list archive at Nabble.com.


manual background re-indexing

2011-04-28 Thread Paul Libbrecht

Hello list,

I am planning to implement a setup, driven by Unix scripts, that performs a 
full pull-and-reindex on a background server and then deploys that index. All 
of this should happen on the same machine.

I thought the replication methods would help me, but they seem to solve the 
problem of distribution, while what I need is only the ability to:

- suspend the queries
- swap the directories with the new index
- close all searchers
- reload and warm-up the searcher on the new index

Is there a part of the replication utilities (http or unix) that I could use to 
perform the above tasks?
I intend to do this on occasion... maybe once a month or even less.
Is reload the right term to be used?

paul

Re: Formatted date/time in long field and javabinRW exception

2011-04-28 Thread Markus Jelsma
Any thoughts on this one? Why does Solr output a string in a long field with the 
XMLResponseWriter but fail (as it arguably should) with the javabin format?

On Tuesday 19 April 2011 10:52:33 Markus Jelsma wrote:
 Hi,
 
 Nutch 1.3-dev seems to have changed its tstamp field from a long to a
 properly formatted, Solr-readable date/time, but the example Solr schema for
 Nutch still configures the tstamp field as a long. This results in a
 formatted date/time in a long field, which I think should not be allowed
 by Solr in the first place.
 
 <long name="tstamp">2011-04-19T08:16:31.675Z</long>
 
 While the above is strange enough, I only found out it's all wrong when
 using the javabin format. The following query throws an exception,
 while the XML response writer works fine and returns the tstamp as a long
 but formatted as a proper date/time.
 
 javabin:
 
 curl
 http://localhost:8983/solr/select?fl=id,boost,tstamp,digest&start=0&q=id:\[*+TO+*\]&wt=javabin&rows=2&version=1
 
 Apr 19, 2011 10:34:50 AM
 org.apache.solr.request.BinaryResponseWriter$Resolver getDoc
 WARNING: Error reading a field from document :
 SolrDocument[{digest=7ff92a31c58e43a34fd45bc6d87cda03}]
 java.lang.NumberFormatException: For input string:
 2011-04-19T08:16:31.675Z at
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:4
 8) at java.lang.Long.parseLong(Long.java:419)
 at java.lang.Long.valueOf(Long.java:525)
 at org.apache.solr.schema.LongField.toObject(LongField.java:82)
 at org.apache.solr.schema.LongField.toObject(LongField.java:33)
 at
 org.apache.solr.request.BinaryResponseWriter$Resolver.getDoc(BinaryResponse
 Writer.java:148) at
 org.apache.solr.request.BinaryResponseWriter$Resolver.writeDocList(BinaryRe
 sponseWriter.java:124) at
 org.apache.solr.request.BinaryResponseWriter$Resolver.resolve(BinaryRespons
 eWriter.java:88) at
 org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:143)
 at
 org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:1
 33) at
 org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:2
 21) at
 org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:138)
 at
 org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:87)
 at
 org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.jav
 a:48) at
 org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter
 .java:322) at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java
 :254) more trace from Jetty
 
 Here's the wt=xml working fine and showing output for the tstamp field:
 
 markus@midas:~$ curl
 http://localhost:8983/solr/select?fl=id,boost,tstamp,digest&start=0&q=id:\[*+TO+*\]&wt=xml&rows=2&version=1
 
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <responseHeader><status>0</status><QTime>17</QTime>
 <lst name="params">
 <str name="fl">id,boost,tstamp,digest</str>
 <str name="start">0</str>
 <str name="q">id:[* TO *]</str>
 <str name="wt">xml</str>
 <str name="rows">2</str>
 <str name="version">1</str>
 </lst></responseHeader>
 <result name="response" numFound="2" start="0">
 <doc>
 <str name="digest">478e77f99f7005ae71aa92a879be2fd4</str>
 <str name="id">idfield</str>
 <long name="tstamp">2011-04-19T08:16:31.689Z</long>
 </doc>
 <doc>
 <str name="digest">7ff92a31c58e43a34fd45bc6d87cda03</str>
 <str name="id">idfield</str>
 <long name="tstamp">2011-04-19T08:16:31.675Z</long>
 </doc>
 </result>
 
 
 Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: manual background re-indexing

2011-04-28 Thread Shaun Campbell
Hi Paul

Would a multi-core setup and the swap command do what you want it to do?

http://wiki.apache.org/solr/CoreAdmin

Shaun
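A minimal multi-core solr.xml along the lines Shaun suggests might look like this (the core names are examples); after reindexing into the spare core, a CoreAdmin request such as /admin/cores?action=SWAP&core=live&other=rebuild exchanges the two cores without restarting Solr:

```xml
<!-- sketch of a two-core solr.xml; "live" serves queries, "rebuild" is reindexed -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="live" instanceDir="live"/>
    <core name="rebuild" instanceDir="rebuild"/>
  </cores>
</solr>
```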

On 28 April 2011 12:49, Paul Libbrecht p...@hoplahup.net wrote:


 Hello list,

 I am planning to implement a setup, to be run on unix scripts, that should
 perform a full pull-and-reindex in a background server and index then deploy
 that index. All should happen on the same machine.

 I thought the replication methods would help me but they seem to rather
 solve the issues of distribution while, what I need, is only the ability to:

 - suspend the queries
 - swap the directories with the new index
 - close all searchers
 - reload and warm-up the searcher on the new index

 Is there a part of the replication utilities (http or unix) that I could
 use to perform the above tasks?
 I intend to do this on occasion... maybe once a month or even less.
 Is reload the right term to be used?

 paul


Re: fq parameter with partial value

2011-04-28 Thread Erick Erickson
So, I assume your CATEGORY field is multiValued but each value is not
broken up into tokens, right? If that's the case, would it work to have a
second field CATEGORY_TOKENIZED and run your fq against that
field instead?

You could make this a multiValued field with an increment gap if you wanted
to prevent matches across separate entries, and have your fq do a proximity
search where the proximity is less than the increment gap.

Best
Erick
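A minimal schema.xml sketch of the two-field idea (the types and attributes are assumptions, not from the thread):

```xml
<!-- CATEGORY holds the exact values used for faceting;
     CATEGORY_TOKENIZED is the analyzed copy that fq matches against -->
<field name="CATEGORY" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="CATEGORY_TOKENIZED" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="CATEGORY" dest="CATEGORY_TOKENIZED"/>
```

A filter such as fq=CATEGORY_TOKENIZED:Restaurant would then also match documents whose CATEGORY value is "Restaurant Hotel".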

On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit
elisaelisael...@gmail.com wrote:
 Hi Stefan,

 Thanks for answering.

 In more details, my problem is the following. I'm working on searching
 points of interest (POIs), which can be hotels, restaurants, plumbers,
 psychologists, etc.

 Those POIs can be identified among other things  by categories or by brand.
 And a single POIs might have different categories (no maximum number). User
 might enter a query like


 McDonald’s Paris


 or


 Restaurant Paris


 or


 many other possible queries


 First I want to do a facet search on brand and categories, to find out which
 case is the current case.


 http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY

 and get an answer like

 <lst name="facet_fields">

 <lst name="CATEGORY">

 <int name="Restaurant">598</int>

 <int name="Restaurant Hotel">451</int>



 Then I want to send a request with fq= CATEGORY: Restaurant and still get
 answers with CATEGORY= Restaurant Hotel.



 One solution would be to modify the data to add a new document every time we
 have a new category, so a POI with three different categories would be index
 three times, each time with a different category.


 But I was wondering if there was another way around.



 Thanks again,

 Elisabeth


 2011/4/28 Stefan Matheis matheis.ste...@googlemail.com

 Hi Elisabeth,

 that's not what FilterQueries are made for :) What against using that
 Criteria in the Query?
 Perhaps you want to describe your UseCase and we'll see if there's
 another way to solve it?

 Regards
 Stefan

 On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit
 elisaelisael...@gmail.com wrote:
  Hello,
 
  I would like to know if there is a way to use the fq parameter with a
  partial value.
 
  For instance, if I have a request with fq=NAME:Joe, and I would like to
  retrieve all answers where NAME contains Joe, including those with NAME =
  Joe Smith.
 
  Thanks,
  Elisabeth
 




Re: how to update database record after indexing

2011-04-28 Thread Erick Erickson
I don't think you can do this through DIH, you'll probably have to write a
separate process that queries the Solr index and updates your table.

You'll have to be a bit careful to coordinate the commits, that
is, wait for the DIH to complete and commit before running your separate
DB update process.

Best
Erick

On Thu, Apr 28, 2011 at 6:59 AM, vrpar...@gmail.com vrpar...@gmail.com wrote:
 Hello,

 i am using dataimporthandler to import data from sql server database.

 my requirement is when solr completed indexing on particular database record
 i want to update that record in database

 or after indexing all records if i can get all ids and update all records

 how to achieve same ?

 Thanks

 Vishal Parekh

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-update-database-record-after-indexing-tp2874171p2874171.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Spatial Search

2011-04-28 Thread Yonik Seeley
On Thu, Apr 28, 2011 at 5:15 AM, Jonas Lanzendörfer
jonas.lanzendoer...@affinitas.de wrote:
 I am new to solr and try to use the spatial search feature which was added in 
 3.1. In my schema.xml I have 2 double fields for latitude and longitude. How 
 can I get them into the location field type? I use solrj to fill the index 
 with data. If I would use a location field instead of two double fields, how 
 could I fill this with solrj? I use annotations to link the data from my 
 dto´s to the index fields...


I've not used the annotation stuff in SolrJ, but since the value sent
in must be of the form "10.3,20.4", I guess one would have to have a
String field with this value on your object.


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco
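As a sketch of that, a tiny helper that turns the two existing double fields into the "lat,lon" string before setting it on the bean (class and method names are invented):

```java
public class LatLon {
    // Solr's LatLonType expects a single "lat,lon" string, e.g. "10.3,20.4"
    static String toLocation(double lat, double lon) {
        return lat + "," + lon;
    }

    public static void main(String[] args) {
        System.out.println(toLocation(10.3, 20.4)); // 10.3,20.4
    }
}
```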


Re: manual background re-indexing

2011-04-28 Thread Paul Libbrecht
Just where do I put the new index data with such a command? Simply replacing 
the segment files appears dangerous to me.

Also, what is the best practice to move from single-core to multi-core?
My current set-up is single-core, do I simply need to add a solr.xml in my 
solr-home and one core1 directory with the data that was there previously?

paul


Le 28 avr. 2011 à 14:04, Shaun Campbell a écrit :

 Hi Paul
 
 Would a multi-core set up and the swap command do what you want it to do?
 
 http://wiki.apache.org/solr/CoreAdmin
 
 Shaun
 
 On 28 April 2011 12:49, Paul Libbrecht p...@hoplahup.net wrote:
 
 
 Hello list,
 
 I am planning to implement a setup, to be run on unix scripts, that should
 perform a full pull-and-reindex in a background server and index then deploy
 that index. All should happen on the same machine.
 
 I thought the replication methods would help me but they seem to rather
 solve the issues of distribution while, what I need, is only the ability to:
 
 - suspend the queries
 - swap the directories with the new index
 - close all searchers
 - reload and warm-up the searcher on the new index
 
 Is there a part of the replication utilities (http or unix) that I could
 use to perform the above tasks?
 I intend to do this on occasion... maybe once a month or even less.
 Is reload the right term to be used?
 
 paul



Re: manual background re-indexing

2011-04-28 Thread Erick Erickson
It would probably be safest just to set up a separate system as
multi-core from the start, get the process working, and then either use
the new machine or copy the whole setup to the production machine.

Best
Erick

On Thu, Apr 28, 2011 at 8:49 AM, Paul Libbrecht p...@hoplahup.net wrote:
 Just where do I put the new index data with such a command? Simply 
 replacing the segment files appears dangerous to me.

 Also, what is the best practice to move from single-core to multi-core?
 My current set-up is single-core, do I simply need to add a solr.xml in my 
 solr-home and one core1 directory with the data that was there previously?

 paul


 Le 28 avr. 2011 à 14:04, Shaun Campbell a écrit :

 Hi Paul

 Would a multi-core set up and the swap command do what you want it to do?

 http://wiki.apache.org/solr/CoreAdmin

 Shaun

 On 28 April 2011 12:49, Paul Libbrecht p...@hoplahup.net wrote:


 Hello list,

 I am planning to implement a setup, to be run on unix scripts, that should
 perform a full pull-and-reindex in a background server and index then deploy
 that index. All should happen on the same machine.

 I thought the replication methods would help me but they seem to rather
 solve the issues of distribution while, what I need, is only the ability to:

 - suspend the queries
 - swap the directories with the new index
 - close all searchers
 - reload and warm-up the searcher on the new index

 Is there a part of the replication utilities (http or unix) that I could
 use to perform the above tasks?
 I intend to do this on occasion... maybe once a month or even less.
 Is reload the right term to be used?

 paul




Re: fq parameter with partial value

2011-04-28 Thread elisabeth benoit
yes, the multivalued field is not broken up into tokens.

so, if I understand well what you mean, I could have

a field CATEGORY with multiValued="true"
a field CATEGORY_TOKENIZED with multiValued="true"

and then some POI

<field name="NAME">POI_Name</field>
...
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant</field>
<field name="CATEGORY_TOKENIZED">Hotel</field>

do faceting on CATEGORY and fq on CATEGORY_TOKENIZED.

But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED?

Best regards
Elisabeth


2011/4/28 Erick Erickson erickerick...@gmail.com

 So, I assume your CATEGORY field is multiValued but each value is not
 broken up into tokens, right? If that's the case, would it work to have a
 second field CATEGORY_TOKENIZED and run your fq against that
 field instead?

 You could have this be a multiValued field with an increment gap if you
 wanted
 to prevent matches across separate entries and have your fq do a proximity
 search where the proximity was less than the increment gap

 Best
 Erick

 On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit
 elisaelisael...@gmail.com wrote:
  Hi Stefan,
 
  Thanks for answering.
 
  In more details, my problem is the following. I'm working on searching
  points of interest (POIs), which can be hotels, restaurants, plumbers,
  psychologists, etc.
 
  Those POIs can be identified among other things  by categories or by
 brand.
  And a single POIs might have different categories (no maximum number).
 User
  might enter a query like
 
 
  McDonald’s Paris
 
 
  or
 
 
  Restaurant Paris
 
 
  or
 
 
  many other possible queries
 
 
  First I want to do a facet search on brand and categories, to find out
 which
  case is the current case.
 
 
  http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY
 
  and get an answer like
 
  <lst name="facet_fields">

  <lst name="CATEGORY">

  <int name="Restaurant">598</int>

  <int name="Restaurant Hotel">451</int>
 
 
 
  Then I want to send a request with fq= CATEGORY: Restaurant and still get
  answers with CATEGORY= Restaurant Hotel.
 
 
 
  One solution would be to modify the data to add a new document every time
 we
  have a new category, so a POI with three different categories would be
 index
  three times, each time with a different category.
 
 
  But I was wondering if there was another way around.
 
 
 
  Thanks again,
 
  Elisabeth
 
 
  2011/4/28 Stefan Matheis matheis.ste...@googlemail.com
 
  Hi Elisabeth,
 
  that's not what FilterQueries are made for :) What against using that
  Criteria in the Query?
  Perhaps you want to describe your UseCase and we'll see if there's
  another way to solve it?
 
  Regards
  Stefan
 
  On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit
  elisaelisael...@gmail.com wrote:
   Hello,
  
   I would like to know if there is a way to use the fq parameter with a
   partial value.
  
   For instance, if I have a request with fq=NAME:Joe, and I would like
 to
   retrieve all answers where NAME contains Joe, including those with
 NAME =
   Joe Smith.
  
   Thanks,
   Elisabeth
  
 
 



RE: fq parameter with partial value

2011-04-28 Thread Jonathan Rochkind
Yep, what you describe is what I do in similar situations; it works fine. 

It is certainly possible to facet on a tokenized field... but your individual 
facet values will be the _tokens_, not the complete values. And they'll be the 
post-analyzed tokens at that, which is rarely what you want. Thus the use of 
two fields: one tokenized and analyzed, one not tokenized and minimally 
analyzed (for instance, not stemmed).

From: elisabeth benoit [elisaelisael...@gmail.com]
Sent: Thursday, April 28, 2011 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: fq parameter with partial value

yes, the multivalued field is not broken up into tokens.

so, if I understand well what you mean, I could have

a field CATEGORY with multiValued="true"
a field CATEGORY_TOKENIZED with multiValued="true"

and then some POI

<field name="NAME">POI_Name</field>
...
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant</field>
<field name="CATEGORY_TOKENIZED">Hotel</field>

do faceting on CATEGORY and fq on CATEGORY_TOKENIZED.

But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED?

Best regards
Elisabeth


2011/4/28 Erick Erickson erickerick...@gmail.com

 So, I assume your CATEGORY field is multiValued but each value is not
 broken up into tokens, right? If that's the case, would it work to have a
 second field CATEGORY_TOKENIZED and run your fq against that
 field instead?

 You could have this be a multiValued field with an increment gap if you
 wanted
 to prevent matches across separate entries and have your fq do a proximity
 search where the proximity was less than the increment gap

 Best
 Erick

 On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit
 elisaelisael...@gmail.com wrote:
  Hi Stefan,
 
  Thanks for answering.
 
  In more details, my problem is the following. I'm working on searching
  points of interest (POIs), which can be hotels, restaurants, plumbers,
  psychologists, etc.
 
  Those POIs can be identified among other things  by categories or by
 brand.
  And a single POIs might have different categories (no maximum number).
 User
  might enter a query like
 
 
  McDonald’s Paris
 
 
  or
 
 
  Restaurant Paris
 
 
  or
 
 
  many other possible queries
 
 
  First I want to do a facet search on brand and categories, to find out
 which
  case is the current case.
 
 
  http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY
 
  and get an answer like
 
  <lst name="facet_fields">

  <lst name="CATEGORY">

  <int name="Restaurant">598</int>

  <int name="Restaurant Hotel">451</int>
 
 
 
  Then I want to send a request with fq= CATEGORY: Restaurant and still get
  answers with CATEGORY= Restaurant Hotel.
 
 
 
  One solution would be to modify the data to add a new document every time
 we
  have a new category, so a POI with three different categories would be
 index
  three times, each time with a different category.
 
 
  But I was wondering if there was another way around.
 
 
 
  Thanks again,
 
  Elisabeth
 
 
  2011/4/28 Stefan Matheis matheis.ste...@googlemail.com
 
  Hi Elisabeth,
 
  that's not what FilterQueries are made for :) What against using that
  Criteria in the Query?
  Perhaps you want to describe your UseCase and we'll see if there's
  another way to solve it?
 
  Regards
  Stefan
 
  On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit
  elisaelisael...@gmail.com wrote:
   Hello,
  
   I would like to know if there is a way to use the fq parameter with a
   partial value.
  
   For instance, if I have a request with fq=NAME:Joe, and I would like
 to
   retrieve all answers where NAME contains Joe, including those with
 NAME =
   Joe Smith.
  
   Thanks,
   Elisabeth
  
 
 



boost fields which have value

2011-04-28 Thread Zoltán Altfatter
Hi,

How can I achieve that documents which don't have field1 and field2 filled
in are returned at the end of the search results?

I have tried the *bf* parameter, which seems to work, but only with one
field.

Is there a function query I can use in the bf value to boost on two fields?

Thank you.

Regards,
Zoltan


Boost newer documents only if date is different from timestamp

2011-04-28 Thread Dietrich
I am trying to boost newer documents in Solr queries. The ms function
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
seems to be the right way to go, but I need to add an additional
condition:
I am using the last-Modified-Date from crawled web pages as the date
to consider, and that does not always provide a meaningful date.
Therefore I would like the function to only boost documents where the
date (not time) found in the last-Modified-Date is different from the
timestamp, eliminating results that just return the current date as
the last-Modified-Date. Suggestions are appreciated!
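The base form of the recency boost from that FAQ page is a boost-function parameter like the one below (the field name is an assumption). The extra "date differs from timestamp" condition is not directly expressible in Solr 3.x function queries, so one hedged option is to compute a boolean flag at index time and use it to gate the boost:

```
bf=recip(ms(NOW,last_modified),3.16e-11,1,1)
```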


Searching for escaped characters

2011-04-28 Thread Paul
I'm trying to create a test to make sure that character sequences like
&egrave; are successfully converted to their equivalent UTF
character (that is, in this case, è).

So, I'd like to search my Solr index using the equivalent of the
following regular expression:

&\w{1,6};

To find any escaped sequences that might have slipped through.

Is this possible? I have indexed these fields with text_lu, which
looks like this:

    <fieldtype name="text_lu" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

Thanks,
Paul


Re: Concatenate multivalued DIH fields

2011-04-28 Thread jimtronic
I solved this problem using the flatten=true attribute.

Given this schema:

<people>
 <person>
  <names>
   <name>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
   </name>
  </names>
 </person>
</people>

<field column="attr_names" xpath="/people/person/names/name" flatten="true" />

attr_names is a multiValued field in my schema.xml. The flatten attribute
tells Solr to take all the text from the specified node and below.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Concatenate-multivalued-DIH-fields-tp2749988p2875435.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: boost fields which have value

2011-04-28 Thread Robert Petersen
I believe the sortMissingLast fieldType attribute is what you want: 
<fieldType ... sortMissingLast="true" ... />


http://wiki.apache.org/solr/SchemaXml
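For example, the stock Solr example schema declares its string type this way; sorting on such a field then puts documents missing the value last:

```xml
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
```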


-Original Message-
From: Zoltán Altfatter [mailto:altfatt...@gmail.com] 
Sent: Thursday, April 28, 2011 6:11 AM
To: solr-user@lucene.apache.org
Subject: boost fields which have value

Hi,

How can I achieve that documents which don't have field1 and field2 filled
in, are returned in the end of the search result.

I have tried with *bf* parameter, which seems to work but just with one
field.

Is there any function query which I can use in bf value to boost two fields?

Thank you.

Regards,
Zoltan


Re: Searching for escaped characters

2011-04-28 Thread Mike Sokolov
StandardTokenizer will have stripped the punctuation, I think. You might try 
searching for all the entity names, though:


(agrave | egrave | omacron | etc... )

The names are pretty distinctive.  Although you might have problems with 
greek letters.


-Mike
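Outside of Solr, the check Paul describes is easy to run over the stored source text with plain Java (the sample string is invented):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityLeftovers {
    public static void main(String[] args) {
        // Matches unconverted entity references such as "&egrave;"
        Pattern leftover = Pattern.compile("&\\w{1,6};");
        Matcher m = leftover.matcher("caf&egrave; au lait");
        System.out.println(m.find()); // true: "&egrave;" slipped through
    }
}
```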

On 04/28/2011 12:10 PM, Paul wrote:

I'm trying to create a test to make sure that character sequences like
&egrave; are successfully converted to their equivalent UTF
character (that is, in this case, è).

So, I'd like to search my solr index using the equivalent of the
following regular expression:

&\w{1,6};

To find any escaped sequences that might have slipped through.

Is this possible? I have indexed these fields with text_lu, which
looks like this:

<fieldtype name="text_lu" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StandardFilterFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldtype>

Thanks,
Paul
   


Re: SolrQuery#setStart(Integer) ???

2011-04-28 Thread Leonardo Souza
Hi Erick,

Correct, I cut some zeros while reading the javadocs. Thanks for the heads-up!


[ ]'s
Leonardo da S. Souza
 °v°   Linux user #375225
 /(_)\   http://counter.li.org/
 ^ ^



On Wed, Apr 27, 2011 at 8:13 PM, Erick Erickson erickerick...@gmail.comwrote:

 Well, the native Java int format is 32 bits, so unless you're returning
 over 2 billion documents you should be OK. But you'll run into other
 issues long before you get to that range.

 Best
 Erick

 On Wed, Apr 27, 2011 at 5:25 PM, Leonardo Souza leonardo...@gmail.com
 wrote:
  Hi Guys,
 
  We have an index with more than 3 millions documents, we use the
 pagination
  feature through SolrQuery#setStart and SolrQuery#setRows
  methods. Some queries can return a huge amount of documents and i'm worry
  about the integer parameter of  the setStart method, this parameter
  should be a long don't you think? For now i'm considering to use the
  ModifiableSolrParams class. Any suggestion is welcome!
 
  thanks!
 
 
  [ ]'s
  Leonardo Souza
   °v°   Linux user #375225
   /(_)\   http://counter.li.org/
   ^ ^
 



Re: Replication Fails with Unreachable error when master host is responding.

2011-04-28 Thread Jed Glazner


Anybody?

On 04/27/2011 01:51 PM, Jed Glazner wrote:

  Hello All,

I'm having a very strange problem that I just can't figure out. The
slave is not able to replicate from the master, even though the master
is reachable from the slave machine.  I can telnet to the port it's
running on, I can use text based browsers to navigate the master from
the slave. I just don't understand why it won't replicate.  The admin
screen gives me an Unreachable in the status, and in the log there is an
exception thrown.  Details below:

BACKGROUND:

OS: Arch Linux
Solr Version: svn revision 1096983 from
https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/
No custom plugins, just whatever came with the version above.
Java Setup:

java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10) (ArchLinux-6.b22_1.10-1-x86_64)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

We have 3 cores running, all 3 cores are not able to replicate.

The admin on the slave shows the master as
http://solr-master-01_dev.la.bo:8983/solr/music/replication - *Unreachable*.
The replication config on the slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="${slave:slave}">
    <str name="masterUrl">http://solr-master-01_dev.la.bo:8983/solr/music/replication</str>
    <str name="pollInterval">00:15:00</str>
  </lst>
</requestHandler>

The replication config on the master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="${master:master}">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

Below is the log start to finish for replication attempts, note that it
says connection refused, however, I can telnet to 8983 from the slave to
the master, so I know it's up and reachable from the slave:

telnet solr-master-01_dev.la.bo 8983
Trying 172.12.65.58...
Connected to solr-master-01_dev.la.bo.
Escape character is '^]'.

I double checked the master to make sure that it didn't have replication
turned off, and it's not.  So I should be able to replicate but it
can't.  I just don't know what else to check.  The log from the slave is
below.
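One more sanity check worth running from the slave host is to hit the master's replication handler directly over HTTP (the host, port, and core name here are taken from the config above; command=indexversion and command=details are standard ReplicationHandler commands):

```shell
curl "http://solr-master-01_dev.la.bo:8983/solr/music/replication?command=indexversion"
curl "http://solr-master-01_dev.la.bo:8983/solr/music/replication?command=details"
```

If these return XML while the admin page still shows Unreachable, the problem is likely in how the slave resolves or connects to the masterUrl (e.g. a proxy setting in the JVM), not in the master itself.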

Apr 27, 2011 7:39:45 PM org.apache.solr.request.SolrQueryResponse <init>
WARNING: org.apache.solr.request.SolrQueryResponse is deprecated. Please
use the corresponding class in org.apache.solr.response
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing
request: Connection refused
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: Retrying request
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing
request: Connection refused
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: Retrying request
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: I/O exception (java.net.ConnectException) caught when processing
request: Connection refused
Apr 27, 2011 7:39:45 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: Retrying request
Apr 27, 2011 7:39:45 PM org.apache.solr.handler.ReplicationHandler
getReplicationDetails
WARNING: Exception while invoking 'details' method for replication on
master
java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
 at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
 at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
 at java.net.Socket.connect(Socket.java:546)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at
org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140)
 at
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125)
 at
org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
 at
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
 at
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
 at
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
 at

Re: Replication Fails with Unreachable error when master host is responding.

2011-04-28 Thread Mike Sokolov

No clue. Try wireshark to gather more data?

On 04/28/2011 02:53 PM, Jed Glazner wrote:

Anybody?

On 04/27/2011 01:51 PM, Jed Glazner wrote:

[rest of quoted message trimmed]

Re: fq parameter with partial value

2011-04-28 Thread Erick Erickson
See below:


On Thu, Apr 28, 2011 at 9:03 AM, elisabeth benoit
elisaelisael...@gmail.com wrote:
 yes, the multivalued field is not broken up into tokens.

 so, if I understand well what you mean, I could have

 a field CATEGORY with  multiValued=true
 a field CATEGORY_TOKENIZED with  multiValued= true

 and then some POI

 <field name="NAME">POI_Name</field>
 ...
 <field name="CATEGORY">Restaurant Hotel</field>
 <field name="CATEGORY_TOKENIZED">Restaurant</field>
 <field name="CATEGORY_TOKENIZED">Hotel</field>

[EOE] If the above is the document you're sending, then no. The
document would be indexed with
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY_TOKENIZED">Restaurant Hotel</field>


Or even just:
<field name="CATEGORY">Restaurant Hotel</field>

and set up a copyField to copy the value from CATEGORY to CATEGORY_TOKENIZED.

The multiValued part comes from:
"And a single POI might have different categories", so your document could
look like:
<field name="CATEGORY">Restaurant Hotel</field>
<field name="CATEGORY">Health Spa</field>
<field name="CATEGORY">Dance Hall</field>

and your document would be counted for each of those entries, while searches
against CATEGORY_TOKENIZED would match things like dance, spa, etc.

But do notice that if you did NOT want a search for restaurant hall
(no quotes) to match, you could do proximity searches with a distance
less than your increment gap, e.g. (this time with the quotes)
"restaurant hall"~50, which would then NOT match if your increment gap
were 100.

Best
Erick
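A schema sketch of the two-field setup described above might look like this (field names come from the thread; field types and attribute values are illustrative assumptions, not a verified configuration):

```xml
<!-- exact, untokenized values for faceting -->
<field name="CATEGORY" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- tokenized copy for single-word filter queries -->
<field name="CATEGORY_TOKENIZED" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="CATEGORY" dest="CATEGORY_TOKENIZED"/>
```

With that in place, a filter on a single word could be fq=CATEGORY_TOKENIZED:Restaurant, while faceting stays on facet.field=CATEGORY so the counts still show the full category strings.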



 do faceting on CATEGORY and fq on CATEGORY_TOKENIZED.

 But then, wouldn't it be possible to do faceting on CATEGORY_TOKENIZED?

 Best regards
 Elisabeth


 2011/4/28 Erick Erickson erickerick...@gmail.com

 So, I assume your CATEGORY field is multiValued but each value is not
 broken up into tokens, right? If that's the case, would it work to have a
 second field CATEGORY_TOKENIZED and run your fq against that
 field instead?

  You could have this be a multiValued field with an increment gap if you
  wanted to prevent matches across separate entries, and have your fq do a
  proximity search where the proximity was less than the increment gap.

 Best
 Erick

 On Thu, Apr 28, 2011 at 6:03 AM, elisabeth benoit
 elisaelisael...@gmail.com wrote:
  Hi Stefan,
 
  Thanks for answering.
 
  In more details, my problem is the following. I'm working on searching
  points of interest (POIs), which can be hotels, restaurants, plumbers,
  psychologists, etc.
 
  Those POIs can be identified among other things  by categories or by
 brand.
  And a single POIs might have different categories (no maximum number).
 User
  might enter a query like
 
 
  McDonald’s Paris
 
 
  or
 
 
  Restaurant Paris
 
 
  or
 
 
  many other possible queries
 
 
  First I want to do a facet search on brand and categories, to find out
 which
  case is the current case.
 
 
  http://localhost:8080/solr/select?q=restaurant paris&facet=true&facet.field=BRAND&facet.field=CATEGORY
 
  and get an answer like
 
  <lst name="facet_fields">
    <lst name="CATEGORY">
      <int name="Restaurant">598</int>
      <int name="Restaurant Hotel">451</int>
 
 
 
  Then I want to send a request with fq=CATEGORY:Restaurant and still get
  answers with CATEGORY = Restaurant Hotel.
 
 
 
  One solution would be to modify the data to add a new document every time
  we have a new category, so a POI with three different categories would be
  indexed three times, each time with a different category.
 
 
  But I was wondering if there was another way around.
 
 
 
  Thanks again,
 
  Elisabeth
 
 
  2011/4/28 Stefan Matheis matheis.ste...@googlemail.com
 
  Hi Elisabeth,
 
  that's not what FilterQueries are made for :) What speaks against using that
  criteria in the query?
  Perhaps you want to describe your UseCase and we'll see if there's
  another way to solve it?
 
  Regards
  Stefan
 
  On Thu, Apr 28, 2011 at 9:09 AM, elisabeth benoit
  elisaelisael...@gmail.com wrote:
   Hello,
  
   I would like to know if there is a way to use the fq parameter with a
   partial value.
  
   For instance, if I have a request with fq=NAME:Joe, and I would like
 to
   retrieve all answers where NAME contains Joe, including those with
 NAME =
   Joe Smith.
  
   Thanks,
   Elisabeth
  
 
 




Re: Extra facet query from within a custom search component

2011-04-28 Thread Erick Erickson
Have you looked at: http://wiki.apache.org/solr/TermsComponent?

Best
Erick
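For reference, with the /terms handler from the example solrconfig enabled, a request like the following lists the indexed values of a field (the field name here is illustrative; terms.limit=-1 means no cap):

```shell
curl "http://localhost:8983/solr/terms?terms=true&terms.fl=myfield&terms.limit=-1"
```

Note that TermsComponent returns raw indexed terms, so the values reflect whatever analysis the field uses.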

On Thu, Apr 28, 2011 at 2:44 PM, Frederik Kraus
frederik.kr...@gmail.com wrote:
 Hi Guys,

 I'm currently working on a custom search component and need to fetch a list 
 of all possible values within a certain field.
 An internal facet (wildcard) query first came to mind, but I'm not quite sure 
 how to best create and then execute such a query ...

 What would be the best way to do this?

 Can anyone please point me in the right direction?

 Thanks,

 Fred.


Problem with autogeneratePhraseQueries=false

2011-04-28 Thread solr_beginner
Hi,
 
I'm new to solr. My solr instance version is:
 
Solr Specification Version: 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll - 2011-03-26 
18:00:07
Lucene Specification Version: 3.1.0
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Current Time: Tue Apr 26 08:01:09 CEST 2011
Server Start Time:Tue Apr 26 07:59:05 CEST 2011
 
I have following definition for textgen type:
 
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100"
    autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
        enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"
        side="front" preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
        enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"
        preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
 
 
I'm using this type for the name field in my index. As you can see I'm
using autoGeneratePhraseQueries="false", but for the query sony vaio 4gb I'm
getting the following parsed query in the debug output:
 
<lst name="debug">
  <str name="rawquerystring">sony vaio 4gb</str>
  <str name="querystring">sony vaio 4gb</str>
  <str name="parsedquery">+name:sony +name:vaio +MultiPhraseQuery(name:"(4gb 4) gb")</str>
  <str name="parsedquery_toString">+name:sony +name:vaio +name:"(4gb 4) gb"</str>
 
Do you have any idea how can I avoid this MultiPhraseQuery?
 
Best Regards,
solr_beginner

Re: Problem with autogeneratePhraseQueries

2011-04-28 Thread Marcin Kostuch
Thank you very much for answer.

You were right. There was no luceneMatchVersion in solrconfig.xml of our dev
core. We thought that values not present in core configuration are copied
from main solrconfig.xml. I will investigate if our administrators did
something wrong during upgrade to 3.1.

On Tue, Apr 26, 2011 at 1:35 PM, Robert Muir rcm...@gmail.com wrote:

 What do you have in solrconfig.xml for luceneMatchVersion?

 If you don't set this, then its going to default to Lucene 2.9
 emulation so that old solr 1.4 configs work the same way. I tried your
 example and it worked fine here, and I'm guessing this is probably
 whats happening.

 the default in the example/solrconfig.xml looks like this:

  <!-- Controls what version of Lucene various components of Solr
       adhere to.  Generally, you want to use the latest version to
       get all bug fixes and improvements. It is highly recommended
       that you fully re-index after changing this setting as it can
       affect both how text is indexed and queried.
    -->
  <luceneMatchVersion>LUCENE_31</luceneMatchVersion>

 On Tue, Apr 26, 2011 at 6:51 AM, Solr Beginner solr_begin...@onet.pl
 wrote:
   [quoted message trimmed]



Dynamically loading xml files from webapplication to index

2011-04-28 Thread sankar
In our webapp, we need to upload an XML data file from the UI (dialogue box)
for indexing.
We are not able to find the solution in the documentation. Please suggest how
to implement this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dynamically-loading-xml-files-from-webapplication-to-index-tp2865890p2865890.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fieldCache only on stats page

2011-04-28 Thread Marcin Kostuch
Solr version:

Solr Specification Version: 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll -
2011-03-26 18:00:07
Lucene Specification Version: 3.1.0
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Current Time: Wed Apr 27 14:28:34 CEST 2011
Server Start Time:Wed Apr 27 11:07:00 CEST 2011

On the stats page I can see only the following cache information:

CACHE

name:fieldCache
class:   org.apache.solr.search.SolrFieldCacheMBean
version: 1.0
description: Provides introspection of the Lucene FieldCache, this
is **NOT** a cache that is managed by Solr.
sourceid:$Id: SolrFieldCacheMBean.java 984594 2010-08-11 21:42:04Z 
yonik $
source:  $URL:
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/solr/src/java/org/apache/solr/search/SolrFieldCacheMBean.java
$
name:fieldValueCache
class:   org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10,
minSize=9000, acceptableSize=9500, cleanupThread=false)
sourceid:$Id: FastLRUCache.java 1065312 2011-01-30 16:08:25Z rmuir $
source:  $URL:
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/solr/src/java/org/apache/solr/search/FastLRUCache.java
$

Nothing about filterCache or documentCache ;/

Best Regards,
Solr Beginner

On Wed, Apr 27, 2011 at 2:00 PM, Erick Erickson erickerick...@gmail.com wrote:
 There's nothing special you need to do to be able to view the various
 stats from admin/stats.jsp. If another look doesn't show them, could you
 post a screenshot?

 And please include the version of Solr you're using, I checked with 1.4.1.

 Best
 Erick

 On Wed, Apr 27, 2011 at 1:44 AM, Solr Beginner solr_begin...@onet.pl wrote:
 Hi,

 I can see only fieldCache (nothing about filter, query or document
 cache) on stats page. What I'm doing wrong? We have two servers with
 replication. There are two cores(prod, dev) on each server. Maybe I
 have to add something to solrconfig.xml of cores?

 Best Regards,
 Solr Beginner




Re: Extra facet query from within a custom search component

2011-04-28 Thread Frederik Kraus
Haaa fantastic! 

Thanks a lot!

Fred.
On Donnerstag, 28. April 2011 at 22:21, Erick Erickson wrote: 
 Have you looked at: http://wiki.apache.org/solr/TermsComponent?
 
 Best
 Erick
 
 On Thu, Apr 28, 2011 at 2:44 PM, Frederik Kraus
 frederik.kr...@gmail.com wrote:
  Hi Guys,
  
  I'm currently working on a custom search component and need to fetch a list 
  of all possible values within a certain field.
  An internal facet (wildcard) query first came to mind, but I'm not quite 
  sure how to best create and then execute such a query ...
  
  What would be the best way to do this?
  
  Can anyone please point me in the right direction?
  
  Thanks,
  
  Fred.
 


Re: AlternateDistributedMLT.patch not working (SOLR-788)

2011-04-28 Thread Shawn Heisey

On 2/23/2011 11:53 AM, Otis Gospodnetic wrote:

Hi Isha,

The patch is out of date.  You need to look at the patch and rejection and
update your local copy of the code to match the logic from the patch, if it's
still applicable to the version of Solr source code you have.


We have a need for distributed More Like This.  We're gearing up for a 
deployment of 3.1, so a patch against 1.4.1 is not very useful for us.


I've spent the last couple of days trying to rework both the original 
and the alternate patches on SOLR-788 to work against 3.1.  I don't 
understand enough about the code to know how to fix it.  I knew I had to 
change the value of PURPOSE_GET_MLT_RESULTS  to 0x800 because of the 
conflict with PURPOSE_GET_TERMS, but the changes in 
MoreLikeThisComponent.java are beyond me.


Thanks,
Shawn



Re: Spatial Search

2011-04-28 Thread Jan Høydahl
1) Create an extra String field on your bean as Yonik suggests or
2) Write an UpdateRequestHandler which reads the doubles and creates the LatLon 
from that

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 28. apr. 2011, at 14.44, Yonik Seeley wrote:

 On Thu, Apr 28, 2011 at 5:15 AM, Jonas Lanzendörfer
 jonas.lanzendoer...@affinitas.de wrote:
 I am new to solr and try to use the spatial search feature which was added 
 in 3.1. In my schema.xml I have 2 double fields for latitude and longitude. 
 How can I get them into the location field type? I use solrj to fill the 
 index with data. If I would use a location field instead of two double 
 fields, how could I fill this with solrj? I use annotations to link the data
 from my DTOs to the index fields...
 
 
 I've not used the annotation stuff in SolrJ, but since the value sent
 in must be of the form "10.3,20.4", then
 I guess one would have to have a String field with this value on your object.
 
 
 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco



Rép : Re: manual background re-indexing

2011-04-28 Thread Paul Libbrecht
 It would probably be safest just to set up a separate system as
 multi-core from the start, get the process working, and then either use
 the new machine or copy the whole setup to the production machine.
 On Thu, Apr 28, 2011 at 8:49 AM, Paul Libbrecht p...@hoplahup.net wrote:
 Just where do I put the new index data with such a command? Simply replacing
 the segment files appears dangerous to me.
 Any idea where I should put the data directory before calling the reload
 command?
 paul

Re: Rép : Re: manual background re-indexing

2011-04-28 Thread Erick Erickson
You simply create two cores, one in solr/cores/core1 and another in
solr/cores/core2.
They each have a separate conf and data directory, and the index is in
core#/data/index.

Really, it's just introducing one more level. You can experiment just
by configuring a core
and copying your index to solr/cores/yourcore/data/index. After, of
course, configuring
solr.xml to understand cores.


Best
Erick
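A minimal solr.xml for the two-core layout described above might look like this (paths and core names are illustrative; each instanceDir needs its own conf/ and data/ directories):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="cores/core1"/>
    <core name="core2" instanceDir="cores/core2"/>
  </cores>
</solr>
```

Once a core has been re-indexed in the background, swapping it into production is a single CoreAdmin call, e.g. /solr/admin/cores?action=SWAP&core=core1&other=core2, which avoids copying segment files by hand.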

On Thu, Apr 28, 2011 at 7:27 PM, Paul Libbrecht p...@hoplahup.net wrote:
 It would probably be safest just to set up a separate system as
 multi-core from the start, get the process working, and then either use
 the new machine or copy the whole setup to the production machine.

 On Thu, Apr 28, 2011 at 8:49 AM, Paul Libbrecht p...@hoplahup.net wrote:
 Just where do I put the new index data with such a command? Simply
 replacing the segment files appears dangerous to me.


 Any idea where I should put the data directory before calling the reload
 command?
 paul


Location of Solr Logs

2011-04-28 Thread Geeta Subramanian
Hi,

I am a newbie to Solr.
Can you please help me find where the logs written by Solr can be seen?
Is there any configuration required to see the Solr logs?

Thanks for your time and help,
Geeta
**Legal Disclaimer***
This communication may contain confidential and privileged
material for the sole use of the intended recipient. Any
unauthorized review, use or distribution by others is strictly
prohibited. If you have received the message in error, please
advise the sender by reply email and delete the message. Thank
you.
*


Can the Suggester be updated incrementally?

2011-04-28 Thread Andy
I'm interested in using Suggester (http://wiki.apache.org/solr/Suggester) for 
auto-complete on the field Document Title.

Does Suggester (either FST, TST or Jaspell) support incremental updates? Say I 
want to add a new document title to the Suggester, or to change the weight of 
an existing document title, would I need to rebuild the entire tree for every 
update?

Also, can the Suggester be sharded? If the size of the tree gets bigger than 
the RAM size, is it possible to shard the Suggester across multiple machines?

Thanks
Andy


Re: Can the Suggester be updated incrementally?

2011-04-28 Thread Jason Rutherglen
It's answered on the wiki site:

TSTLookup - ternary tree based representation, capable of immediate
data structure updates

Although the EdgeNGram technique is probably more widely adopted, eg,
it's closer to what Google has implemented.

http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
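The EdgeNGram technique from that post boils down to a field type along these lines (a sketch; gram sizes and the choice of KeywordTokenizer for whole-phrase suggestions are illustrative assumptions):

```xml
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index every prefix of the title, e.g. "h", "ho", "hot", ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because this is just an ordinary indexed field, it updates incrementally with normal commits and shards like any other field, which addresses both of the original questions at the cost of a larger index.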

On Thu, Apr 28, 2011 at 9:37 PM, Andy angelf...@yahoo.com wrote:
 I'm interested in using Suggester (http://wiki.apache.org/solr/Suggester) for 
 auto-complete on the field Document Title.

 Does Suggester (either FST, TST or Jaspell) support incremental updates? Say 
 I want to add a new document title to the Suggester, or to change the weight 
 of an existing document title, would I need to rebuild the entire tree for 
 every update?

 Also, can the Suggester be sharded? If the size of the tree gets bigger than 
 the RAM size, is it possible to shard the Suggester across multiple machines?

 Thanks
 Andy



Re: Question on Batch process

2011-04-28 Thread Otis Gospodnetic
Charles,

Maybe the question to ask is why you are committing at all?  Do you need 
somebody to see index changes while you are indexing?  If not, commit just at 
the end.  And optimize if you won't touch the index for a while.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Charles Wardell charles.ward...@bcsolution.com
 To: solr-user@lucene.apache.org
 Sent: Wed, April 27, 2011 7:51:20 PM
 Subject: Re: Question on Batch process
 
 Thank you for your response. I did not make the StreamingUpdate application
 yet, but I did change the other settings that you mentioned. It gave me a huge
 boost in indexing speed. (I am still using post.sh but hope to change that
 soon.)

 One thing I noticed is that the indexing speed was incredibly fast last night,
 but today the commits are taking so long. Is this to be expected?

 --
 Best Regards,

 Charles Wardell
 Blue Chips Technology, Inc.
 www.bcsolution.com

 On Wednesday, April 27, 2011 at 6:15 PM, Otis Gospodnetic wrote:
  Hi Charles,

  Yes, the threads I was referring to are in the context of the
  client/indexer, so one of the params for StreamingUpdateSolrServer.
  post.sh/jar are just there because they are handy. Don't use them for
  production.

  It's impossible to tell how long indexing of 100M documents may take. They
  could be very big or very small. You could perform very light or no
  analysis or heavy analysis. They could contain 1 or 100 fields. :)

  Otis

  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/

  - Original Message 
   From: Charles Wardell charles.ward...@bcsolution.com
   To: solr-user@lucene.apache.org
   Sent: Tue, April 26, 2011 8:01:28 PM
   Subject: Re: Question on Batch process

   Thank you Otis.
   Without trying to appear too stupid: when you refer to having the params
   matching your # of CPU cores, you are talking about the # of threads I
   can spawn with the StreamingUpdateSolrServer object?
   Up until now, I have been just utilizing post.sh or post.jar. Are these
   capable of that, or do I need to write some code to collect a bunch of
   files into the buffer and send it off?

   Also, do you have a sense for how long it should take to index 100,000
   files, or in my case 100,000,000 documents?
   StreamingUpdateSolrServer
   public StreamingUpdateSolrServer(String solrServerUrl, int queueSize,
   int threadCount) throws MalformedURLException

   Thanks again,
   Charlie

   --
   Best Regards,

   Charles Wardell
   Blue Chips Technology, Inc.
   www.bcsolution.com

   On Tuesday, April 26, 2011 at 5:12 PM, Otis Gospodnetic wrote:
    Charlie,

    How's this:
    * -Xmx2g
    * ramBufferSizeMB 512
    * mergeFactor 10 (default, but you could up it to 20, 30, if ulimit -n
      allows)
    * ignore/delete maxBufferedDocs - not used if you set ramBufferSizeMB
    * use StreamingUpdateSolrServer (with params matching your number of
      CPU cores)

    or send batches of say 1000 docs with the other SolrServer impl using N
    threads (N = # of your CPU cores)

    Otis

    Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
    Lucene ecosystem search :: http://search-lucene.com/

    - Original Message 
     From: Charles Wardell charles.ward...@bcsolution.com
     To: solr-user@lucene.apache.org
     Sent: Tue, April 26, 2011 2:32:29 PM
     Subject: Question on Batch process

     I am sure that this question has been asked a few times, but I can't
     seem to find the sweet spot for indexing.

     I have about 100,000 files each containing 1,000 xml documents ready
     to be posted to Solr. My desire is to have it index as quickly as
     possible, and then once completed the daily stream of ADDs will be
     small in comparison.

     The individual documents are small. Essentially web postings from the
     net. Title, postPostContent, date.

     What would be the ideal configuration for RamBufferSize, mergeFactor,
     MaxBufferedDocs, etc.?

     My machine is a quad core hyper-threaded, so it shows up as 8 CPUs in
     TOP. I have 16GB of available RAM.

     Thanks in advance.
     Charlie
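The StreamingUpdateSolrServer setup recommended in this thread can be sketched roughly as follows (a sketch, not a definitive implementation: the URL is an assumption, and the queue size and thread count should be tuned to your hardware, e.g. threadCount near the number of CPU cores as Otis suggests):

```java
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // queueSize=20 buffered requests, threadCount=8 to match 8 logical CPUs
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 8);
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));        // illustrative fields
            doc.addField("title", "doc " + i);
            server.add(doc);    // queued and sent by background threads
        }
        server.commit();        // commit once at the end, per the advice above
    }
}
```

Adding documents one at a time is fine here because the server batches them internally; committing only at the end avoids the slow-commit problem described earlier in the thread.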