Re: Hotel Searches

2013-01-09 Thread Upayavira
It seems to me like you want to use result grouping by hotel. You'll
have to add up the tariffs for each hotel, but that isn't hard.
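
A minimal sketch of such a grouping request (hotel_id, date and tariff are
assumed field names, not taken from your schema):

q=*:*&fq=date:[2013-01-10T00:00:00Z TO 2013-01-15T00:00:00Z]
  &group=true&group.field=hotel_id&group.limit=31&fl=hotel_id,date,tariff

The client then sums the tariff values inside each returned group, and can
drop any hotel whose group has fewer documents than nights requested.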

Upayavira

On Wed, Jan 9, 2013, at 06:08 AM, Harshvardhan Ojha wrote:
 Hi Alex,
 
 Thanks for your reply.
 I saw prices based on date range using multipoints, but this is not my
 problem. Instead, the problem statement for me is pretty simple.
 
 Say I have 100 documents each having tariff as field.
 Doc1
 <doc>
   <double name="tariff">2400.0</double>
 </doc>
 
 Doc2
 <doc>
   <double name="tariff">2500.0</double>
 </doc>
 
 Now a user's search should give me a total tariff.
 
 Desired result
 <doc>
   <double name="tariff">4900.0</double>
 </doc>
 
 And this could be any combination of consecutive nights; for 100 docs that is
 (100*101)/2, i.e. N*(N+1)/2 possible ranges.
 
 How can I get these combinations of documents already indexed?
 Or is there any way to do calculations at runtime?
 
 How can I place the constraint that if any one doc is missing in a range, no
 result is returned? (If a user asked for a hotel tariff from the 11th to the
 13th and I don't have a tariff for the 12th, I shouldn't just add the 11th and
 the 13th.)
 
 Hope I made my problem very simple.
 
 Regards
 Harshvardhan Ojha
 
 -Original Message-
 From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
 Sent: Tuesday, January 08, 2013 6:12 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Hotel Searches
 
 Did you look at a conversation thread from 12 Dec 2012 on this list? Just
 go to the archives and search for 'hotel'. Hopefully that will give you
 something to work with.
 
 If you have any questions after that, come back with more specifics.
 
 Regards,
Alex.
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
 On Tue, Jan 8, 2013 at 7:18 AM, Harshvardhan Ojha 
 harshvardhan.o...@makemytrip.com wrote:
 
  Sorry for that, we just spoiled that thread so posted my question in a 
  fresh thread.
 
  Problem is indeed very simple.
  I have solr documents, which has all the required fields(from db).
  Say DOC1,DOC2,DOC3.DOCn.
 
  Every document has a 1-night tariff, and I have tariffs for 180 nights.
  So a person can search for any combination within these 180 nights.
 
  Say a request came to me to give the total tariff for the 10th to the 15th of Jan 2013.
  Now I need to get a sum of the tariff field of 6 docs.
 
  So how can I keep this data indexed, to avoid search-time calculation?
  There are also other dimensions to this data besides tariff.
  Hope this makes sense.
 
  Regards
  Harshvardhan Ojha
 
  -Original Message-
  From: Gora Mohanty [mailto:g...@mimirtech.com]
  Sent: Tuesday, January 08, 2013 5:37 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Hotel Searches
 
  On 8 January 2013 17:10, Harshvardhan Ojha  
  harshvardhan.o...@makemytrip.com wrote:
   Hi All,
  
   Looking into finding a solution for hotel searches based on the 
   below criteria
  [...]
 
  Didn't you just post this on a separate thread, complete with some 
  nonsensical follow-up from a colleague of yours? Please do not repost 
  the same message over and over again.
 
  It is not clear what you are trying to achieve.
  What is the difference between a city and a hotel in your data? How is 
  a person represented in your documents? Is it by the ID field?
 
  Are you looking to cache all possible combinations of ID, city, and 
  startdate? If so, to what end?  This smells like a XY problem:
  http://people.apache.org/~hossman/#xyproblem
 
  Regards,
  Gora
 


Re: Hotel Searches

2013-01-09 Thread Uwe Reh

Hi,

maybe I'm thinking too simple again. Nevertheless, here an idea to solve 
the question. The basic thought is to get rid of the range query.


Have:
- a text field 'vacant_days': instead of ISO dates, just simple dates in
the form MMdd
- a dynamic field 'price_*': you can add the tariff for Jan. 31st into
'price_0131'


To get the total, e.g. Feb. 1st to Feb. 3rd, you could query for the days
0201, 0202 and 0203 and calculate the sum of the corresponding
price fields:

q=vacant_days:0201 AND vacant_days:0202 AND
vacant_days:0203&fl=_val_:sum(price_0201, price_0202, price_0203)

(not tested)
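
For reference, the field definitions behind this idea might look roughly like
this (types are the ones from the stock example schema; attributes are
assumptions, also not tested):

<field name="vacant_days" type="text_ws" indexed="true" stored="false"/>
<dynamicField name="price_*" type="tfloat" indexed="true" stored="true"/>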

Uwe


On 09.01.2013 07:08, Harshvardhan Ojha wrote:

Hi Alex,

Thanks for your reply.
I saw prices based on daterange using multipoints . But this is not my 
problem. Instead the problem statement for me is pretty simple.

Say I have 100 documents each having tariff as field.
Doc1
<doc>
  <double name="tariff">2400.0</double>
</doc>

Doc2
<doc>
  <double name="tariff">2500.0</double>
</doc>

Now a user's search should give me a total tariff.

Desired result
<doc>
  <double name="tariff">4900.0</double>
</doc>

And this could be any combination for 100 docs it is (100+101)/2. (N*N+1)/2.

How can I get these combination of documents already indexed ?
Or is there any way to do calculations at runtime?

How can I place this constraint that if there is any 1 doc missing in a range 
don’t give me any result.(if a user asked for hotel tariff from 11th to 13th, 
and I don’t have tariff for 12th, I shouldn't add 11th and 13th only).

Hope I made my problem very simple.

Regards
Harshvardhan Ojha

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Tuesday, January 08, 2013 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Hotel Searches

Did you look at a conversation thread from 12 Dec 2012 on this list? Just go to 
the archives and search for 'hotel'. Hopefully that will give you something to 
work with.

If you have any questions after that, come back with more specifics.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jan 8, 2013 at 7:18 AM, Harshvardhan Ojha  
harshvardhan.o...@makemytrip.com wrote:


Sorry for that, we just spoiled that thread so posted my question in a
fresh thread.

Problem is indeed very simple.
I have solr documents, which has all the required fields(from db).
Say DOC1,DOC2,DOC3.DOCn.

Every document has 1 night tariff and I have 180 nights tariff.
So a person can search for any combination in these 180 nights.

Say a request came to me to give total tariff for 10th to 15th of jan 2013.
Now I need to get a sum of tariff field of 6 docs.

So how can I keep this data indexed, to avoid search time calculation,
and there are other dimensions of this data also beside tariff.
Hope this makes sense.

Regards
Harshvardhan Ojha

-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Tuesday, January 08, 2013 5:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Hotel Searches

On 8 January 2013 17:10, Harshvardhan Ojha 
harshvardhan.o...@makemytrip.com wrote:

Hi All,

Looking into a finding solution for Hotel searches based on the
below criteria's

[...]

Didn't you just post this on a separate thread, complete with some
nonsensical follow-up from a colleague of yours? Please do not repost
the same message over and over again.

It is not clear what you are trying to achieve.
What is the difference between a city and a hotel in your data? How is
a person represented in your documents? Is it by the ID field?

Are you looking to cache all possible combinations of ID, city, and
startdate? If so, to what end?  This smells like a XY problem:
http://people.apache.org/~hossman/#xyproblem

Regards,
Gora





Solr + Munin, a good plugin?

2013-01-09 Thread Bruno Mannina

Dear Solr Users,

Does anyone have a plugin to track the number of requests (/select) by 
hour/day/week/month/year?


I try to use the plugin solr_qps but it's not really good.
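
One way to feed such a plugin (a sketch; the URL path, core name and handler
name are assumptions for a default Solr 4 setup) is to poll the request
counter exposed by the MBeans stats handler and let Munin derive the rates:

curl 'http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=QUERYHANDLER&key=/select&wt=json'

The "requests" value in the stats section is a cumulative counter (reset when
the core restarts), so sampling it per hour/day gives the request rate.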

Thanks a lot,
Bruno


[OFFER] Consulting job with search specialists based in Cambridge UK

2013-01-09 Thread Charlie Hull

Hi all,

Hope you don't mind me cluttering up the list with a job offer. We're a 
team of search specialists based in the UK and we're hiring:

http://www.flax.co.uk/hiring/

We're ideally looking for someone with experience of Apache Lucene/Solr 
development, able to work on a flexible contract basis, probably mainly 
remotely. We work on search and related applications for a wide variety 
of clients in the UK and abroad including major newspapers, recruitment 
firms, governments and startups. If you're in the UK that's great but if 
not it's still worth you contacting us. Examples of past work on 
Lucene/Solr projects would be useful.


Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


performance improvements on ip look up query

2013-01-09 Thread Lee Carroll
Hi

We are doing a lat/lon lookup query using an IP address.

We have a 6.5 million document core with the following structure:
start ip block
end ip block
location id
location_lat_lon

the field defs are
<types>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
</types>
<fields>
  <field name="startIp" type="string" indexed="true" stored="false" required="true"/>
  <field name="startIpNum" type="tlong" indexed="true" stored="false" required="true"/>
  <field name="endIpNum" type="tlong" indexed="true" stored="false" required="true"/>
  <field name="locId" type="string" indexed="true" stored="true" required="true"/>
  <field name="countryCode" type="string" indexed="true" stored="true" required="false"/>
  <field name="cityName" type="string" indexed="false" stored="true" required="false"/>
  <field name="latLon" type="location" indexed="true" stored="true" required="true"/>
  <field name="latitude" type="string" indexed="false" stored="true" required="true"/>
  <field name="longitude" type="string" indexed="false" stored="true" required="true"/>
  <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
</fields>

The query at the moment is simply a range query:

q=startIpNum:[* TO 180891652] AND endIpNum:[180891652 TO *]

We are seeing a full query cache with a low hit rate (0.2) and a high
eviction rate, which makes sense given the use of the IP address in the query.

The mean query time is 120 ms.

Is there a better way of structuring the core for this use case?
I suspect our heap memory settings are conservative (1 GB), but we will need to
convince our sys admins to change this (they are not ringing any resource
alarm bells); it's just that the query is a little slow.
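
One possible reformulation (a sketch, not benchmarked) is to move the
per-request ranges out of q and into filter queries that are marked as
non-cacheable, so they no longer churn the query/filter caches (field names
are from the schema above; assuming each IP falls in exactly one block):

q=*:*&fq={!cache=false}startIpNum:[* TO 180891652]&fq={!cache=false}endIpNum:[180891652 TO *]&fl=locId,latLon&rows=1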


Re: fieldtype for name

2013-01-09 Thread Michael Jones
Thanks. It isn't necessarily the need to match 'dick' to 'robert' but to
search for:
'name surname'
'name, surname'
'surname name'
'surname, name'

And nothing else, I don't need to worry about nick names or abbreviations
of a name, just the above variations. I think I might use text_ws.


On Tue, Jan 8, 2013 at 9:39 PM, Uwe Reh r...@hebis.uni-frankfurt.de wrote:

 Hi Michael,

 in our index of bibliographic metadata, we see the need for at least three
 fields:
 - name_facet: String as type, because the facet should represent
 the original inverted format from our data.
 - name: TextField for searching. This field is heavily analyzed to match
 different orders, to match synonyms, phonetic similarity, German umlauts
 and other European stuff.
 - name_lc: TextField. This field is just mapped to lower case. It's used
 to boost docs with the same style of writing as the user's input.

 Uwe

 On 08.01.2013 15:30, Michael Jones wrote:

  Hi,

 What would be the best fieldtype for a person's name? At the moment I'm
 using text_general but, if I search for bob smith, some results I get back
 might be rob thomas, in that it matched 'ob'.

 But I only really want results that are either

 'bob smith'
 'bob, smith'
 'smith, bob'
 'smith bob'

 Thanks





Re: fieldtype for name

2013-01-09 Thread Michael Jones
Also, I'm allowing users to enter a name with quotes to search for an
exact name. So at the moment only "smith, robert" will return any results,
whereas *robert smith* will return all variations including 'smith, herbert'.


On Wed, Jan 9, 2013 at 11:09 AM, Michael Jones michaelj...@gmail.com wrote:

 Thanks. It isn't necessarily the need to match 'dick' to 'robert' but to
 search for:
 'name surname'
 name, surname'
 'surname name'
 'surname, name'

 And nothing else, I don't need to worry about nick names or abbreviations
 of a name, just the above variations. I think I might use text_ws.


 On Tue, Jan 8, 2013 at 9:39 PM, Uwe Reh r...@hebis.uni-frankfurt.de wrote:

 Hi Michael,

 in our index ob bibliographic metadata, we see the need for at least tree
 fields:
 - name_facet: String as type, because the facet should should represent
 the original inverted format from our data.
 - name: TextField for searching. This field is heavily analyzed to match
 different orders, to match synonyms, phonetic similarity, German umlauts
 and other European stuff.
 - name_lc: TextField. This field is just mapped to lower case. It's used
 to boost docs with the same style of writing like the users input.

 Uwe

 On 08.01.2013 15:30, Michael Jones wrote:

  Hi,

 What would be the best fieldtype for a persons name? at the moment I'm
 using text_general but, if I search for bob smith, some results I get
 back
 might be rob thomas. In that it's matched 'ob'.

 But I only really want results that are either

 'bob smith'
 'bob, smith'
 'smith, bob'
 'smith bob'

 Thanks






Highlighting: When alternateField does not exist

2013-01-09 Thread Jan Høydahl
Hi,

The alternateField and maxAlternateFieldLength params work well, but only as 
long as the alternate field actually exists for the document. If it does not, 
highlighting returns nothing.

We would like this behavior
1. Highlighting in body if matches
2. Fallback to verbatim teaser if it exists
3. If fallback field does not exist, look for a secondary fallback field

To support this behaviour in a back-compat way, how about allowing a 
comma-separated list of alternate fields to consider: 
hl.alternateField=field1,field2,field3.. where the first existing one is 
selected

Or do you have other workarounds for this problem on the solr side? In this 
case we cannot control the source DB to make sure the teaser exists.
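
One index-time workaround that could be tried until then (a sketch; teaser and
summary are assumed field names) is to merge the fallback candidates into a
single stored field and point the highlighter at that:

<field name="teaser_fallback" type="text_general" indexed="false" stored="true" multiValued="true"/>
<copyField source="teaser" dest="teaser_fallback"/>
<copyField source="summary" dest="teaser_fallback"/>

...&hl.alternateField=teaser_fallback&hl.maxAlternateFieldLength=200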

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com



RE: Hotel Searches

2013-01-09 Thread Harshvardhan Ojha
Hi Uwe,

Thanks for your reply. I think this will solve my problem.

Regards
Harshvardhan Ojha

-Original Message-
From: Uwe Reh [mailto:r...@hebis.uni-frankfurt.de] 
Sent: Wednesday, January 09, 2013 2:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Hotel Searches

Hi,

maybe I'm thinking too simple again. Nevertheless, here an idea to solve the 
question. The basic thought is to get rid of the range query.

Have:
- a textfield 'vacant_days'. Instead of ISO-Dates just simple dates in the form 
mmdd
- a dynamic field 'price_*', You can add the tariff for Jan. 31th into 
'price_0131'

To get the total,  eg. Feb. 1st to Feb. 3th you could query for the days 0201, 
0202 and 0203. You can calculate the sum of the corresponding price fields
 q=vacant_days:0201 AND vacant_days:0202 AND 
 vacant_days:0203&fl=_val_:sum(price_0201, price_0202, price_0203)
(not tested)

Uwe


On 09.01.2013 07:08, Harshvardhan Ojha wrote:
 Hi Alex,

 Thanks for your reply.
 I saw prices based on daterange using multipoints . But this is not my 
 problem. Instead the problem statement for me is pretty simple.

 Say I have 100 documents each having tariff as field.
 Doc1
 <doc>
   <double name="tariff">2400.0</double>
 </doc>
 
 Doc2
 <doc>
   <double name="tariff">2500.0</double>
 </doc>

 Now a user's search should give me a total tariff.

 Desired result
 <doc>
   <double name="tariff">4900.0</double>
 </doc>

 And this could be any combination for 100 docs it is (100+101)/2. (N*N+1)/2.

 How can I get these combination of documents already indexed ?
 Or is there any way to do calculations at runtime?

 How can I place this constraint that if there is any 1 doc missing in a range 
 don’t give me any result.(if a user asked for hotel tariff from 11th to 13th, 
 and I don’t have tariff for 12th, I shouldn't add 11th and 13th only).

 Hope I made my problem very simple.

 Regards
 Harshvardhan Ojha

 -Original Message-
 From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
 Sent: Tuesday, January 08, 2013 6:12 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Hotel Searches

 Did you look at a conversation thread from 12 Dec 2012 on this list? Just go 
 to the archives and search for 'hotel'. Hopefully that will give you 
 something to work with.

 If you have any questions after that, come back with more specifics.

 Regards,
 Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all 
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD 
 book)


 On Tue, Jan 8, 2013 at 7:18 AM, Harshvardhan Ojha  
 harshvardhan.o...@makemytrip.com wrote:

 Sorry for that, we just spoiled that thread so posted my question in 
 a fresh thread.

 Problem is indeed very simple.
 I have solr documents, which has all the required fields(from db).
 Say DOC1,DOC2,DOC3.DOCn.

 Every document has 1 night tariff and I have 180 nights tariff.
 So a person can search for any combination in these 180 nights.

 Say a request came to me to give total tariff for 10th to 15th of jan 2013.
 Now I need to get a sum of tariff field of 6 docs.

 So how can I keep this data indexed, to avoid search time 
 calculation, and there are other dimensions of this data also beside tariff.
 Hope this makes sense.

 Regards
 Harshvardhan Ojha

 -Original Message-
 From: Gora Mohanty [mailto:g...@mimirtech.com]
 Sent: Tuesday, January 08, 2013 5:37 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Hotel Searches

 On 8 January 2013 17:10, Harshvardhan Ojha  
 harshvardhan.o...@makemytrip.com wrote:
 Hi All,

 Looking into a finding solution for Hotel searches based on the 
 below criteria's
 [...]

 Didn't you just post this on a separate thread, complete with some 
 nonsensical follow-up from a colleague of yours? Please do not repost 
 the same message over and over again.

 It is not clear what you are trying to achieve.
 What is the difference between a city and a hotel in your data? How 
 is a person represented in your documents? Is it by the ID field?

 Are you looking to cache all possible combinations of ID, city, and 
 startdate? If so, to what end?  This smells like a XY problem:
 http://people.apache.org/~hossman/#xyproblem

 Regards,
 Gora




Re: fieldtype for name

2013-01-09 Thread Otis Gospodnetic
Hi,

Without seeing the configs I would guess default query operator might be OR
(and check docs for mm parameter on the Wiki) or there are ngrams involved.
Former is more likely.
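
For example, with the (e)dismax parser all terms can be required via mm (a
sketch; the field name is an assumption):

q=bob smith&defType=edismax&qf=name&mm=100%25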

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 9, 2013 6:16 AM, Michael Jones michaelj...@gmail.com wrote:

 Also. I'm allowing users to do enter a name with quotes to search for an
 exact name. So at the moment only smith, robert will return any results
 where *robert smith* will return all variations including 'smith, herbert'


 On Wed, Jan 9, 2013 at 11:09 AM, Michael Jones michaelj...@gmail.com
 wrote:

  Thanks. It isn't necessarily the need to match 'dick' to 'robert' but to
  search for:
  'name surname'
  name, surname'
  'surname name'
  'surname, name'
 
  And nothing else, I don't need to worry about nick names or abbreviations
  of a name, just the above variations. I think I might use text_ws.
 
 
  On Tue, Jan 8, 2013 at 9:39 PM, Uwe Reh r...@hebis.uni-frankfurt.de
 wrote:
 
  Hi Michael,
 
  in our index ob bibliographic metadata, we see the need for at least
 tree
  fields:
  - name_facet: String as type, because the facet should should represent
  the original inverted format from our data.
  - name: TextField for searching. This field is heavily analyzed to match
  different orders, to match synonyms, phonetic similarity, German umlauts
  and other European stuff.
  - name_lc: TextField. This field is just mapped to lower case. It's used
  to boost docs with the same style of writing like the users input.
 
  Uwe
 
  On 08.01.2013 15:30, Michael Jones wrote:
 
   Hi,
 
  What would be the best fieldtype for a persons name? at the moment I'm
  using text_general but, if I search for bob smith, some results I get
  back
  might be rob thomas. In that it's matched 'ob'.
 
  But I only really want results that are either
 
  'bob smith'
  'bob, smith'
  'smith, bob'
  'smith bob'
 
  Thanks
 
 
 
 



Re: wildcard faceting in solr cloud

2013-01-09 Thread jmozah
I am testing it, and I will upload it after that.


./Zahoor
HBase Musings


On 09-Jan-2013, at 2:55 AM, Upayavira u...@odoko.co.uk wrote:

 Have you uploaded a patch to JIRA???
 
 Upayavira
 
 On Tue, Jan 8, 2013, at 07:57 PM, jmozah wrote:
 Hmm. Fixed it.
 
 Did similar thing as SOLR-247 for distributed search.
 Basically modified the FacetInfo method of the FacetComponent.java to
 make it work.. :-)
 
 ./zahoor
 
 
 On 08-Jan-2013, at 9:35 PM, jmozah jmo...@gmail.com wrote:
 
 
 I can try to bump it for distributed search... 
 Some pointers on where to start would be helpful...
 Can SOLR-2894 be a good start to look at this?
 
 ./Zahoor
 
 On 08-Jan-2013, at 9:27 PM, Michael Ryan mr...@moreover.com wrote:
 
 I'd guess that the patch simply doesn't implement it for distributed 
 searches. The code for distributed facets is quite a bit more complicated, 
 and I don't see it touched in this patch.
 
 -Michael
 
 -Original Message-
 From: jmozah [mailto:jmo...@gmail.com] 
 Sent: Tuesday, January 08, 2013 10:51 AM
 To: solr-user@lucene.apache.org
 Subject: wildcard faceting in solr cloud
 
 Hi
 
 I am performing wildcard faceting using the patch in SOLR-247 on solr 4.0.
 
 It works like a charm in a single instance...
 But it does not work in a distributed mode...
 
 Am i missing something?
 
 ./zahoor
 
 
 
 
 
 



RE: Highlighting: When alternateField does not exist

2013-01-09 Thread Markus Jelsma
Hi,

That should be fairly easy to do in alternateField() in 
DefaultSolrHighlighter. We made a small change there to support globs in 
alternateField.

Cheers,
 
-Original message-
 From:Jan Høydahl jan@cominvent.com
 Sent: Wed 09-Jan-2013 12:44
 To: solr-user@lucene.apache.org
 Subject: Highlighting: When alternateField does not exist
 
 Hi,
 
 The alternateField and maxAlternateFieldLength params work well, but only as 
 long as the alternate field actually exists for the document. If it does not, 
 highlighting returns nothing.
 
 We would like this behavior
 1. Highlighting in body if matches
 2. Fallback to verbatim teaser if it exists
 3. If fallback field does not exist, look for a secondary fallback field
 
 To support this behaviour in a back-compat way, how about allowing a 
 comma-separated list of alternate fields to consider: 
 hl.alternateField=field1,field2,field3.. where the first existing one is 
 selected
 
 Or do you have other workarounds for this problem on the solr side? In this 
 case we cannot control the source DB to make sure the teaser exists.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com
 
 


Re: fieldtype for name

2013-01-09 Thread Michael Jones
Hi,

My schema file is here http://pastebin.com/ArY7xVUJ

Query (name:'ian paisley') returns ~3000 results.
Query (name:'paisley, ian') returns ~250 results - that is how the name is
stored, so it returns just the results for that person.

I need all variations to return 250 results

Query (name:*ian paisley*) returns ~ 8000 results - but acceptable as I
know it has a wild card.

Thanks


On Wed, Jan 9, 2013 at 12:56 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi,

 Without seeing the configs I would guess default query operator might be OR
 (and check docs for mm parameter on the Wiki) or there are ngrams involved.
 Former is more likely.

 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Jan 9, 2013 6:16 AM, Michael Jones michaelj...@gmail.com wrote:

  Also. I'm allowing users to do enter a name with quotes to search for an
  exact name. So at the moment only smith, robert will return any results
  where *robert smith* will return all variations including 'smith,
 herbert'
 
 
  On Wed, Jan 9, 2013 at 11:09 AM, Michael Jones michaelj...@gmail.com
  wrote:
 
   Thanks. It isn't necessarily the need to match 'dick' to 'robert' but
 to
   search for:
   'name surname'
   name, surname'
   'surname name'
   'surname, name'
  
   And nothing else, I don't need to worry about nick names or
 abbreviations
   of a name, just the above variations. I think I might use text_ws.
  
  
   On Tue, Jan 8, 2013 at 9:39 PM, Uwe Reh r...@hebis.uni-frankfurt.de
  wrote:
  
   Hi Michael,
  
   in our index ob bibliographic metadata, we see the need for at least
  tree
   fields:
   - name_facet: String as type, because the facet should should
 represent
   the original inverted format from our data.
   - name: TextField for searching. This field is heavily analyzed to
 match
   different orders, to match synonyms, phonetic similarity, German
 umlauts
   and other European stuff.
   - name_lc: TextField. This field is just mapped to lower case. It's
 used
   to boost docs with the same style of writing like the users input.
  
   Uwe
  
   On 08.01.2013 15:30, Michael Jones wrote:
  
Hi,
  
   What would be the best fieldtype for a persons name? at the moment
 I'm
   using text_general but, if I search for bob smith, some results I get
   back
   might be rob thomas. In that it's matched 'ob'.
  
   But I only really want results that are either
  
   'bob smith'
   'bob, smith'
   'smith, bob'
   'smith bob'
  
   Thanks
  
  
  
  
 



Re: fieldtype for name

2013-01-09 Thread Upayavira
Try q=name:(ian paisley)&q.op=AND

Does that work better for you?

It would also match Ian James Paisley, but not Ian Jackson.

Upayavira

On Wed, Jan 9, 2013, at 01:30 PM, Michael Jones wrote:
 Hi,
 
 My schema file is here http://pastebin.com/ArY7xVUJ
 
 Query (name:'ian paisley') returns ~ 3000 results
 Query (name:'paisley, ian') returns ~ 250 results - That is how the name
 is
 stored, so is returning just the results with that person.
 
 I need all variations to return 250 results
 
 Query (name:*ian paisley*) returns ~ 8000 results - but acceptable as I
 know it has a wild card.
 
 Thanks
 
 
 On Wed, Jan 9, 2013 at 12:56 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:
 
  Hi,
 
  Without seeing the configs I would guess default query operator might be OR
  (and check docs for mm parameter on the Wiki) or there are ngrams involved.
  Former is more likely.
 
  Otis
  Solr  ElasticSearch Support
  http://sematext.com/
  On Jan 9, 2013 6:16 AM, Michael Jones michaelj...@gmail.com wrote:
 
   Also. I'm allowing users to do enter a name with quotes to search for an
   exact name. So at the moment only smith, robert will return any results
   where *robert smith* will return all variations including 'smith,
  herbert'
  
  
   On Wed, Jan 9, 2013 at 11:09 AM, Michael Jones michaelj...@gmail.com
   wrote:
  
Thanks. It isn't necessarily the need to match 'dick' to 'robert' but
  to
search for:
'name surname'
name, surname'
'surname name'
'surname, name'
   
And nothing else, I don't need to worry about nick names or
  abbreviations
of a name, just the above variations. I think I might use text_ws.
   
   
On Tue, Jan 8, 2013 at 9:39 PM, Uwe Reh r...@hebis.uni-frankfurt.de
   wrote:
   
Hi Michael,
   
in our index ob bibliographic metadata, we see the need for at least
   tree
fields:
- name_facet: String as type, because the facet should should
  represent
the original inverted format from our data.
- name: TextField for searching. This field is heavily analyzed to
  match
different orders, to match synonyms, phonetic similarity, German
  umlauts
and other European stuff.
- name_lc: TextField. This field is just mapped to lower case. It's
  used
to boost docs with the same style of writing like the users input.
   
Uwe
   
On 08.01.2013 15:30, Michael Jones wrote:
   
 Hi,
   
What would be the best fieldtype for a persons name? at the moment
  I'm
using text_general but, if I search for bob smith, some results I get
back
might be rob thomas. In that it's matched 'ob'.
   
But I only really want results that are either
   
'bob smith'
'bob, smith'
'smith, bob'
'smith bob'
   
Thanks
   
   
   
   
  
 


Restore hot backup

2013-01-09 Thread marotosg
Hi,

Is it possible to restore an old backup without shutting down Solr?

Regards,
Sergio



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Restore-hot-backup-tp4031866.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fieldtype for name

2013-01-09 Thread Michael Jones
Brilliant! Thank you!

On Wed, Jan 9, 2013 at 1:37 PM, Upayavira u...@odoko.co.uk wrote:

 q=name:(ian paisley)&q.op=AND


Performance issue with group.ngroups=true

2013-01-09 Thread Mickael Magniez
Hi,

I have a performance issue with the group.ngroups=true parameter.

I have an index with 100k documents (small documents, 1-10 documents per
group, grouped on a string field). If I make a q=*:*...group.ngroups=true query
I get a 4s response time vs 50ms without the ngroups parameter.

Is there a workaround for this problem?



Mickael



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-issue-with-group-ngroups-true-tp4031888.html
Sent from the Solr - User mailing list archive at Nabble.com.


CoreAdmin STATUS performance

2013-01-09 Thread Shahar Davidson
Hi All,

I have a client app that uses SolrJ and which needs to collect the names 
(and just the names) of all loaded cores.
I have about 380 Solr cores on a single Solr server (net index size is about 
220GB).

Running the STATUS action takes about 800ms - that seems a bit too long, given 
my requirements.

So here are my questions:
1) Is there any way to get _only_ the core name of all cores?
2) Why does the STATUS request take such a long time, and is there a way to 
improve its performance?

Thanks,

Shahar.


massive memory consumption of grouping feature

2013-01-09 Thread clawu01
Hello,

we are upgrading solr from 1.3 to 4.0.
In solr 1.3 we used the SOLR-236 patch to realize grouping/ field
collapsing.
We did not have a memory issue with the field collapsing feature in our 1.3
version.
However, we do now. The query looks something like this:

http://localhost:8983/solr/select?fl=*,scoregroup.ngroups=truegroup.limit=-1group.field=someGroupingFieldgroup=truefq=someField:someValuefq=anotherField:anotherValuewt=xmlfq=thirdField:[0+TO+1]rows=3

as you can see the q parameter is empty, but it does not make a difference
if I query for q=someValue+anotherValue

The result returns:
<int name="matches">3772</int>
<int name="ngroups">2175</int>

We have a memory consumption of about 4G. What causes this massive memory
consumption? How can it be reduced?

Regards,

Claas 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/massive-memory-consumption-of-grouping-feature-tp4031895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Clean Up Aged Index Using DeletionPolicy

2013-01-09 Thread hyrax
Hey Shawn,

Thanks a lot for your detailed explanation of deletionPolicy.
Although it's frustrating that Solr doesn't support the function I need, I'm
really glad that you pointed it out so that I can move on.
What I'm thinking now is adding a new field for the time a document is
indexed, so a simple range query can delete the aged documents I want to
remove to maintain my disk space.
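
For example (a sketch; the field name indexed_at is an assumption), the
timestamp can be filled automatically at index time and old documents removed
with a delete-by-query using date math:

<field name="indexed_at" type="date" indexed="true" stored="true" default="NOW"/>

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>indexed_at:[* TO NOW-30DAYS]</query></delete>'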

Thanks again,
Hao



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clean-Up-Aged-Index-Using-DeletionPolicy-tp4031704p4031896.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud - shard distribution

2013-01-09 Thread James Thomas
Hi,

Simple question, I hope.

Using the nightly build of 4.1 from yesterday (Jan 8, 2013), I started 6 Solr 
nodes.
I issued the following command to create a collection with 3 shards, and a 
replication factor=2.  So a total of 6 shards.
 curl 
'http://localhost:11000/solr/admin/collections?action=CREATE&name=consumer1&numShards=3&replicationFactor=2'
The end result was the following shard distribution:
shard1 - node #13, #15  (with #13 as leader)
   shard2 - node #15, #16  (with #15 as leader)
   shard3 - node #11, #16  (with #11 as leader)

Since I am using the default value of 1 for 'maxShardsPerNode', I was surprised 
to see that Solr created two shards on instance #16.  I expected that each Solr 
node (there are 6) would each be assigned one shard from the collection.  Is 
this a bug or expected behavior?

Thanks,
James


Re: performance improvements on ip look up query

2013-01-09 Thread Lee Carroll
Hi Otis

The cache was modest (4096) with a hit rate of 0.23 after a 24hr period.
We doubled it and the hit rate went to 0.25. Our interpretation is that the IP
is pretty much a cache-busting value and that cache size is not at play here.

The q param is just startIpNum:[* TO 180891652] AND endIpNum:[180891652 TO *],
so again our interpretation is that it gets little reuse.

Could we re-formulate the query to be more performant?


On 9 January 2013 12:56, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

 Hi,

 Maybe your cache is too small?  How big is it and does the hit rate change
 if you make it bigger?

 Do any parts of the query repeat a lot? Maybe there is room for fq.

 Otis
 Solr & ElasticSearch Support
 http://sematext.com/
 On Jan 9, 2013 6:08 AM, Lee Carroll lee.a.carr...@googlemail.com
 wrote:

  Hi
 
  We are doing a lat/lon look up query using ip address.
 
  We have a 6.5 million document core of the following structure
  start ip block
  end ip block
  location id
  location_lat_lon
 
  the field defs are
  types
  fieldType name=string class=solr.StrField sortMissingLast=true
  omitNorms=true/
  fieldType name=tlong class=solr.TrieLongField precisionStep=8
  omitNorms=true positionIncrementGap=0/
  fieldType name=tfloat class=solr.TrieFloatField precisionStep=8
  omitNorms=true positionIncrementGap=0/
  fieldType name=tdouble class=solr.TrieDoubleField precisionStep=8
  omitNorms=true positionIncrementGap=0/
  fieldType name=location class=solr.LatLonType subFieldSuffix=
  _coordinate/
  /types
  fields
  field name=startIp type=string indexed=true stored=false
  required=
  true/
  field name=startIpNum type=tlong indexed=true stored=false
  required
  =true/
  field name=endIpNum type=tlong indexed=true stored=false
  required=
  true/
  field name=locId type=string indexed=true stored=true required=
  true/
  field name=countryCode type=string indexed=true stored=true
  required=false/
  field name=cityName type=string indexed=false stored=true
 required
  =false/
  field name=latLon type=location indexed=true stored=true
  required=
  true/
  field name=latitude type=string indexed=false stored=true
 required
  =true/
  field name=longitude type=string indexed=false stored=true
  required
  =true/
  dynamicField name=*_coordinate type=tdouble indexed=true stored=
  false/
  /fields
 
  the query at the moment is simply a range query
 
  q=startIpNum:[* TO 180891652]AND endIpNum:[180891652 TO *]
 
  we are seeing a full query cache with a low hit rate 0.2 and a high
  eviction rate which makes sense given the use of ip address in the query.
 
  query time mean is 120.
 
  Is their a better way of structuring the core for this usecase ?
  I suspect our heap memory settings are conservative 1g but will need to
  convince our sys admins to change this (they are not ringing any resource
  alarm bells) just the query is a little slow
 



Re: Restore hot backup

2013-01-09 Thread Upayavira
If you are in multicore mode, you can stop a core, move the backed up
files into place, and restart/recreate the core. That would have the
effect you desire.

You may well be able to get away with swapping out the files and
reloading the core, but the above would be safer. Best make sure you're
not indexing or committing to the core at the time you do this.

Upayavira

On Wed, Jan 9, 2013, at 01:48 PM, marotosg wrote:
 Hi,
 
 Is possible to restore an old backup without shutting down Solr?
 
 Regards,
 Sergio
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Restore-hot-backup-tp4031866.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud - shard distribution

2013-01-09 Thread Mark Miller
I just tried this. I started 6 nodes with collection1 spread across two shards. 
Looked at the admin-cloud-graph view and everything looked right and green.

Next, I copy and pasted your command and refreshed the graph cloud view.

I see a new collection called consumer1 - all of its nodes are green and the 
collection consists of 3 shards. Each shard has 1 leader and 1 replica, each 
hosted by a different Solr instance.

In other words, it seemed to work for me.

- Mark

On Jan 9, 2013, at 10:58 AM, James Thomas jtho...@camstar.com wrote:

 Hi,
 
 Simple question, I hope.
 
 Using the nightly build of 4.1 from yesterday (Jan 8, 2013), I started 6 Solr 
 nodes.
 I issued the following command to create a collection with 3 shards, and a 
 replication factor=2.  So a total of 6 shards.
 curl 
 'http://localhost:11000/solr/admin/collections?action=CREATEname=consumer1numShards=3replicationFactor=2'
 The end result was the following shard distribution:
shard1 - node #13, #15  (with #13 as leader)
   shard2 - node #15, #16  (with #15 as leader)
   shard3 - node #11, #16  (with #11 as leader)
 
 Since I am using the default value of 1 for 'maxShardsPerNode', I was 
 surprised to see that Solr created two shards on instance #16.  I expected 
 that each Solr node (there are 6) would each be assigned one shard from the 
 collection.  Is this a bug or expected behavior?
 
 Thanks,
 James



Re: DIH fails after processing roughly 10million records

2013-01-09 Thread Shawn Heisey

On 1/8/2013 11:19 PM, vijeshnair wrote:

Yes Shawn, the batchSize is -1 only and I also have the mergeScheduler
exactly same as you mentioned.  When I had this problem in SOLR 3.4, I did
an extensive googling and gathered much of the tweaks and tuning from
different blogs and forums and configured the 4.0 instance. My next full run
is scheduled for this weekend, I will try with a higher mysql wait_timeout
value and update you the outcome.


With maxThreadCount at 1 and maxMergeCount at 6, I was able to complete 
full-import with no problems.  All mysql (5.1.61) server-side timeouts 
are at their defaults - they don't show up in my.cnf and I haven't 
tweaked them anywhere else either.


A full import for me consists of six simultaneous imports into six Solr 
cores, each of which is over 12 million rows.  It takes three hours, and 
each of those six imports creates a 16GB index on Solr 4.1-SNAPSHOT, 
22GB on Solr 3.5.0.  There is a seventh import as well, but it only does 
a few hundred thousand rows.  That one finishes before any major merging 
takes place.


Thanks,
Shawn



Re: Performance issue with group.ngroups=true

2013-01-09 Thread Jack Krupansky
group.ngroups=true is always going to be somewhat expensive, but in your 
case it seems more expensive than I would expect. You should check to see 
that you have enough Java JVM heap to hold more of the index and to avoid 
any excessive GCs.
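
For example, the heap ceiling is set on the JVM command line when starting
Solr (the sizes below are only placeholders, tune them for your index):

java -Xms2g -Xmx4g -jar start.jar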


-- Jack Krupansky

-Original Message- 
From: Mickael Magniez

Sent: Wednesday, January 09, 2013 10:09 AM
To: solr-user@lucene.apache.org
Subject: Performance issue with group.ngroups=true

Hi,

I have a performance issue with group.ngroups=true parameters.

I have an index with 100k documents (small documents, 1-10 documents per
group, group on string field), if i make a q=*:*...group.ngroups=true i
have 4s responsetime vs 50ms without the ngroups parameters.

Is it a workaround for this problem?



Mickael



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-issue-with-group-ngroups-true-tp4031888.html
Sent from the Solr - User mailing list archive at Nabble.com. 



RE: SolrJ DirectXmlRequest

2013-01-09 Thread Ryan Josal
I also don't know what's creating them.  Maybe Solr, but also maybe Tomcat, 
maybe apache commons.  I could change java.io.tmpdir to one with more space, 
but the problem is that many of the temp files end up permanent, so eventually 
it would still run out of space.  I also considered setting the tmpdir to 
/dev/null, but that would defeat the purpose of whatever is writing those log 
files in the first place.  I could periodically clean up the tmpdir myself, but 
that feels the hackiest.

Is it fairly common to send XML to Solr this way from a remote host?  If it is, 
then that would lead me to believe Solr and any of its libraries aren't 
causing it, and I should inspect Tomcat.  I'm using Tomcat 7.

Ryan

From: Otis Gospodnetic [otis.gospodne...@gmail.com]
Sent: Tuesday, January 08, 2013 7:29 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ DirectXmlRequest

Hi Ryan,

I'm not sure what is creating those upload files something in Solr? Or
Tomcat?

Why not specify a different temp dir via system property command line
parameter?

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 8, 2013 12:17 PM, Ryan Josal rjo...@rim.com wrote:

 I have encountered an issue where using DirectXmlRequest to index data on
 a remote host results in eventually running out have temp disk space in the
 java.io.tmpdir directory.  This occurs when I process a sufficiently large
 batch of files.  About 30% of the temporary files end up permanent.  The
 filenames look like: upload__2341cdae_13c02829b77__7ffd_00029003.tmp.  Has
 anyone else had this happen before?  The relevant code is:

 DirectXmlRequest up = new DirectXmlRequest( "/update", xml );
 up.process(solr);

 where `xml` is a String containing Solr formatted XML, and `solr` is the
 SolrServer.  When disk space is eventually exhausted, this is the error
 message that is repeatedly seen on the master host:

 2013-01-07 19:22:16,911 [http-bio-8090-exec-2657] [] ERROR
 org.apache.solr.servlet.SolrDispatchFilter  [] -
 org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
 Processing of multipart/form-data request failed. No space left on device
 at
 org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
 at
 org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
 at
 org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:344)
 at
 org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:397)
 at
 org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 ... truncated stack trace

 I am running Solr 3.6 on an Ubuntu 12.04 server.  I am considering working
 around this by pulling out as much as I can from XMLLoader into my client,
 and processing the XML myself into SolrInputDocuments for indexing, but
 this is certainly not ideal.

 Ryan
 -
 This transmission (including any attachments) may contain confidential
 information, privileged material (including material protected by the
 solicitor-client or other applicable privileges), or constitute non-public
 information. Any use of this information by anyone other than the intended
 recipient is prohibited. If you have received this transmission in error,
 please immediately reply to the sender and delete this information from
 your system. Use, dissemination, distribution, or reproduction of this
 transmission by unintended recipients is not authorized and may be unlawful.


-
This transmission (including any attachments) may contain confidential 
information, privileged material (including material protected by the 
solicitor-client or other applicable privileges), or constitute non-public 
information. Any use of this information by anyone other than the intended 
recipient is prohibited. If you have received this transmission in error, 
please immediately reply to the sender and delete this information from your 
system. Use, dissemination, distribution, or reproduction of this transmission 
by unintended recipients is not authorized and may be unlawful.


Re: DIH fails after processing roughly 10million records

2013-01-09 Thread Shawn Heisey

On 1/9/2013 9:41 AM, Shawn Heisey wrote:

With maxThreadCount at 1 and maxMergeCount at 6, I was able to complete
full-import with no problems.  All mysql (5.1.61) server-side timeouts
are at their defaults - they don't show up in my.cnf and I haven't
tweaked them anywhere else either.

A full import for me consists of six simultaneous imports into six Solr
cores, each of which is over 12 million rows.  It takes three hours, and
each of those six imports creates a 16GB index on Solr 4.1-SNAPSHOT,
22GB on Solr 3.5.0.  There is a seventh import as well, but it only does
a few hundred thousand rows.  That one finishes before any major merging
takes place.


Full timeout info:

mysql> SHOW SESSION VARIABLES LIKE '%timeout%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| connect_timeout            | 10    |
| delayed_insert_timeout     | 300   |
| innodb_lock_wait_timeout   | 50    |
| innodb_rollback_on_timeout | OFF   |
| interactive_timeout        | 28800 |
| net_read_timeout           | 30    |
| net_write_timeout          | 60    |
| slave_net_timeout          | 3600  |
| table_lock_wait_timeout    | 50    |
| wait_timeout               | 28800 |
+----------------------------+-------+
10 rows in set (0.00 sec)



Re: Is there faceting with Solr 4 spatial?

2013-01-09 Thread Smiley, David W.
Erick,
  Alex asked about Solr 4 spatial, and his use-case requires it because
he's got multi-value spatial fields (multiple business office locations
per document).  So the Solr 3 spatial solution you posted won't cut it.

Alex,
  You can do this in Solr 4.0.  Use one facet.query per circle (I.e.
Distance ring away from center).  Here's an example with just one
facet.query:
http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml&facet=true&facet.query=geo:%22Intersects%28Circle%2845.15,-93.85%20d=0.045%29%29%22
That facet.query without URL escaping is:
geo:"Intersects(Circle(45.15,-93.85 d=0.045))"
That's a 5km ring.  Repeat such facet queries for larger rings.  Each
bigger circle will of course encompass the smaller circle(s) before it.  I
suspect it's more useful to the user to see a facet count based on all
businesses within each threshold distance, versus having counts exclude
the one before basically.  But if you really want to do that, you'll have
to do that part yourself by simply subtracting one facet count from the
previous smaller ring. And to generate the filter query if they click it,
you'd then have to have a NOT clause for the smaller ring.  Ex:
  fq=geo:"Intersects(Circle(45.15,-93.85 d=0.09))" NOT
geo:"Intersects(Circle(45.15,-93.85 d=0.045))"

I'm aware it's a bit verbose.  In Solr 4.1 I've already committed a change
to allow use of {!geofilt} which will make the syntax shorter, allowing
sharing of the pt reference, and kilometer based distances instead of
degrees.  I'm collaborating with Ryan McKinley on
https://issues.apache.org/jira/browse/SOLR-4242 A better spatial query
parser including conversations off-list but feel free to participate via
commenting.

~ David Smiley

On 1/8/13 7:33 PM, Erick Erickson erickerick...@gmail.com wrote:

For facets, doesn't
http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*
&facet=on
&facet.query={!frange l=0 u=3}geodist(store,45.15,-93.85)
&facet.query={!frange l=3.001 u=4}geodist(store,45.15,-93.85)
&facet.query={!frange l=4.001 u=5}geodist(store,45.15,-93.85)

work (from
http://wiki.apache.org/solr/SpatialSearch#How_to_facet_by_distance)

Although I also confess to being really unfamiliar with all things
geodist...

Best
Erick


On Tue, Jan 8, 2013 at 4:02 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 Hello,

 I am trying to understand the new Solr 4 spatial type and what it can
do. I
 sort of understand the old implementation, though also far from well.

 The use case is to have companies that has multiple offices, for which I
 indexed locations. I then want to do a 'radar' style ranges/facets, so I
 can say show me everything in 100k, in 300k, etc. The wiki page for
old
 implementation shows how to do it, but I am having troubles figuring
this
 out for new implementation.

 Regards,
Alex.
 P.s. Not yet possible, wait till 4.1/5, etc are perfectly valid
 shortest answers for me, at this stage.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)




Re: Convert Complex Lucene Query to SolrQuery

2013-01-09 Thread Jagdish Nomula
Thanks Otis and Jack for your responses.

We are trying to use embeddedsolr server with a solr query as follows:

EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, );
SolrQuery solrQuery = new  SolrQuery(luceneQuery.toString()); //
Here luceneQuery is a dismax query with additional filters
QueryResponse rsp = server.query(solrQuery);

The toString method does not give us good results and
server.query(solrQuery) fails.

As otis has suggested, we are going to take a look at LuceneQueryParser
more closely.

Thanks,

Jagdish

On Tue, Jan 8, 2013 at 9:41 PM, Jack Krupansky j...@basetechnology.comwrote:

 How complex? Does it use any of the more advanced Query Types or detailed
 options that are not supported in the Solr query syntax?

 What specific problems did you have.

 -- Jack Krupansky

 -Original Message- From: Jagdish Nomula
 Sent: Tuesday, January 08, 2013 9:13 PM
 To: solr-user@lucene.apache.org
 Subject: Convert Complex Lucene Query to SolrQuery


 Hello Solr Users,

 I am trying to convert a complex lucene query to solrquery to use it
 in a embeddedsolrserver instance.

 I have tried the regular toString method without success. Is there any
 suggested method to do this ?.

 Greatly appreciate the response.


 Thanks,




 --
 Jagadish Nomula - Senior Manager Search
 Simply Hired, Inc.
 370 San Aleso Ave, Ste. 200
 Sunnyvale, CA 94085

 simplyhired.com




-- 
*Jagadish Nomula - Senior Manager Search*
*Simply Hired, Inc.*
370 San Aleso Ave, Ste. 200
Sunnyvale, CA 94085

simplyhired.com


newbie questions about cache stats query perf

2013-01-09 Thread AJ Weber
Sorry, I did search for an answer, but didn't find an applicable one.  
I'm currently stuck on 1.4.1 (running in Tomcat 6 on 64bit Linux) for 
the time being...


When I see stats like this:
name:  documentCache
class:  org.apache.solr.search.LRUCache
version:  1.0
description:  LRU Cache(maxSize=512, initialSize=512)
lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 8158
cumulative_hits : 685
cumulative_hitratio : 0.08
cumulative_inserts : 7473
cumulative_evictions : 3023

I don't understand lookups vs. cumulative_lookups, etc.  I _do_ 
understand that a hit-ratio of 0.08 isn't a very good one.


Something I definitely find strange is that I've allocated 4G of RAM to 
the java heap, but solr consistently remains around 1.7G.  I'm trying to 
give it all the RAM I can spare (I could go higher, but it's not even 
using what I'm giving it) to make it faster.


The index takes up roughly 25GB on disk, and indexing is very fast 
(well, nothing we're complaining about anyway).  We're trying to figure 
out why queries against the default field (document content) are slow (15-30 
seconds for only a few million total documents).  Mergefactor=3, if that helps.


So if anyone could point me to someplace that defines what these stats 
mean, and if anyone has any immediate tips/tricks/recommendations as to 
increasing query performance (and whether this documentCache is a good 
candidate to be increased substantially), I would very much appreciate it.


-AJ



RE: SolrCloud - shard distribution

2013-01-09 Thread James Thomas
Thanks for the quick reply Mark.
I tried all kinds of variations, I could not get all 6 nodes to participate.
So I downloaded the source code and took a look at 
OverseerCollectionProcessor.java
I think my result is as-coded.

Line 251 has this loop:
  for (int i = 1; i <= numSlices; i++) {
    for (int j = 1; j <= repFactor; j++) {
      String nodeName = nodeList.get(((i - 1) + (j - 1)) % nodeList.size());

So for my inputs, numSlices=3 and repFactor=2.
And the logic here will choose the same node for these two slices:
--- slice1, rep2 (i=2,j=1)  == chooses node[1]
--- slice2, rep1 (i=1,j=2)  == chooses node[1]

BTW, I did notice the comment in the code:
  // we need to look at every node and see how many cores it serves
  // add our new cores to existing nodes serving the least number of cores
  // but (for now) require that each core goes on a distinct node.
  
  // TODO: add smarter options that look at the current number of cores per
  // node?
  // for now we just go random

Thanks,
James

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, January 09, 2013 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud - shard distribution

I just tried this. I started 6 nodes with collection1 spread across two shards. 
Looked at the admin-cloud-graph view and everything looked right and green.

Next, I copy and pasted your command and refreshed the graph cloud view.

I see a new collection called consumer1 - all of it's nodes are green and the 
collection consists of 3 shards. Each shard has 1 leader and 1 replica, each 
hosted by a different Solr instance.

In other words, it seemed to work for me.

- Mark

On Jan 9, 2013, at 10:58 AM, James Thomas jtho...@camstar.com wrote:

 Hi,
 
 Simple question, I hope.
 
 Using the nightly build of 4.1 from yesterday (Jan 8, 2013), I started 6 Solr 
 nodes.
 I issued the following command to create a collection with 3 shards, and a 
 replication factor=2.  So a total of 6 shards.
 curl 
 'http://localhost:11000/solr/admin/collections?action=CREATEname=consumer1numShards=3replicationFactor=2'
 The end result was the following shard distribution:
shard1 - node #13, #15  (with #13 as leader)
   shard2 - node #15, #16  (with #15 as leader)
   shard3 - node #11, #16  (with #11 as leader)
 
 Since I am using the default value of 1 for 'maxShardsPerNode', I was 
 surprised to see that Solr created two shards on instance #16.  I expected 
 that each Solr node (there are 6) would each be assigned one shard from the 
 collection.  Is this a bug or expected behavior?
 
 Thanks,
 James





Re: SOLR '0' Status: Communication Error

2013-01-09 Thread ddineshkumar
I forgot to mention: when I add documents to SOLR, I add them in batches of 50.
Because my table has a lot of records, I have to do it in batches due to memory
constraints. The 'Communication error' occurs only for some batches. For
other batches, documents get added properly.  And also, I am including the
stack trace just in case it helps.


'0' Status: Communication Error#0
C:\wamp\www\nist\application\library\SolrPhpClient\Apache\Solr\Service.php(672):
Apache_Solr_Service->_sendRawPost('http://129.107', '<add allowDups=...')
#1
C:\wamp\www\nist\application\library\SolrPhpClient\Apache\Solr\Service.php(736):
Apache_Solr_Service->add('<add allowDups=...')
#2 C:\wamp\www\nist\application\library\Nist\Console\NistSolrIndex.php(106):
Apache_Solr_Service->addDocuments(Array)
#3 C:\wamp\www\nist\application\library\Nist\Console\CrawlUNT.php(346):
Nist_Console_NistSolrIndex->createIndex()
#4 C:\wamp\www\nist\application\library\Nist\Console\CrawlUNT.php(89):
Nist_Console_CrawlUNT->CrawlParseAndIndexProfiles()
#5 C:\wamp\www\nist\application\Bootstrap.php(107):
Nist_Console_CrawlUNT->run(Object(Zend_Console_Getopt))
#6 C:\wamp\www\nist\application\Bootstrap.php(78):
Bootstrap->_runConsoleApp()
#7 C:\wamp\www\dkumar\mentis-libs\Zend\Application.php(366):
Bootstrap->run()
#8 C:\wamp\www\nist\index.php(37): Zend_Application->run()



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-0-Status-Communication-Error-tp4031698p4031949.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR '0' Status: Communication Error

2013-01-09 Thread Shawn Heisey

On 1/9/2013 11:48 AM, ddineshkumar wrote:

I forgot to mention.When I add documents to SOLR, I add it in batches of 50.
Because my table has a lot of records, I have to do in batches due to memory
constraints. The 'Communication error' occurs only for some batches. For
other batches, documents get added properly.  And also, I am including the
stack trace just in case if it helps.


If it sometimes works and sometimes you get the communications error, 
then I would guess that you are running into long garbage collection 
pauses on your Solr server that make Solr unresponsive long enough for 
the next update to time out.  Garbage collection tuning is an art form 
with a million different styles.  You could try increasing the php 
client timeouts.


Thanks,
Shawn



RE: SolrCloud - shard distribution

2013-01-09 Thread James Thomas
Oops, small copy-paste error.  Had my i's and j's backwards.
Should be:
--- slice1, rep2 (i=1,j=2)  == chooses node[1]
--- slice2, rep1 (i=2,j=1)  == chooses node[1]
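
To make the collision easier to see, here is a small standalone sketch of the same
assignment arithmetic (the node names are invented; the formula is the one from the
loop quoted below):

  import java.util.Arrays;
  import java.util.List;

  public class AssignmentSketch {
    public static void main(String[] args) {
      // Six hypothetical live nodes, three slices, replication factor 2.
      List<String> nodeList = Arrays.asList("node0", "node1", "node2", "node3", "node4", "node5");
      int numSlices = 3;
      int repFactor = 2;
      for (int i = 1; i <= numSlices; i++) {
        for (int j = 1; j <= repFactor; j++) {
          // Slice offset and replica offset are simply added, so (i=1,j=2) and
          // (i=2,j=1) both land on index 1 and node1 ends up with two cores.
          String nodeName = nodeList.get(((i - 1) + (j - 1)) % nodeList.size());
          System.out.println("slice" + i + ", rep" + j + " -> " + nodeName);
        }
      }
    }
  }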

-Original Message-
From: James Thomas [mailto:jtho...@camstar.com] 
Sent: Wednesday, January 09, 2013 1:39 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrCloud - shard distribution

Thanks for the quick reply Mark.
I tried all kinds of variations, I could not get all 6 nodes to participate.
So I downloaded the source code and took a look at 
OverseerCollectionProcessor.java I think my result is as-coded.

Line 251 has this loop:
  for (int i = 1; i <= numSlices; i++) {
    for (int j = 1; j <= repFactor; j++) {
      String nodeName = nodeList.get(((i - 1) + (j - 1)) % nodeList.size());

So for my inputs, numSlices=3 and repFactor=2.
And the logic here will choose the same node for these two slices:
--- slice1, rep2 (i=2,j=1)  == chooses node[1]
--- slice2, rep1 (i=1,j=2)  == chooses node[1]

BTW, I did notice the comment in the code:
  // we need to look at every node and see how many cores it serves
  // add our new cores to existing nodes serving the least number of cores
  // but (for now) require that each core goes on a distinct node.
  
  // TODO: add smarter options that look at the current number of cores per
  // node?
  // for now we just go random

Thanks,
James

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Wednesday, January 09, 2013 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud - shard distribution

I just tried this. I started 6 nodes with collection1 spread across two shards. 
Looked at the admin-cloud-graph view and everything looked right and green.

Next, I copy and pasted your command and refreshed the graph cloud view.

I see a new collection called consumer1 - all of it's nodes are green and the 
collection consists of 3 shards. Each shard has 1 leader and 1 replica, each 
hosted by a different Solr instance.

In other words, it seemed to work for me.

- Mark

On Jan 9, 2013, at 10:58 AM, James Thomas jtho...@camstar.com wrote:

 Hi,
 
 Simple question, I hope.
 
 Using the nightly build of 4.1 from yesterday (Jan 8, 2013), I started 6 Solr 
 nodes.
 I issued the following command to create a collection with 3 shards, and a 
 replication factor=2.  So a total of 6 shards.
 curl 
 'http://localhost:11000/solr/admin/collections?action=CREATE&name=consumer1&numShards=3&replicationFactor=2'
 The end result was the following shard distribution:
shard1 - node #13, #15  (with #13 as leader)
   shard2 - node #15, #16  (with #15 as leader)
   shard3 - node #11, #16  (with #11 as leader)
 
 Since I am using the default value of 1 for 'maxShardsPerNode', I was 
 surprised to see that Solr created two shards on instance #16.  I expected 
 that each Solr node (there are 6) would each be assigned one shard from the 
 collection.  Is this a bug or expected behavior?
 
 Thanks,
 James







Re: defaultOperator in schema.xml

2013-01-09 Thread Rafał Kuć
Hello!

You should set the q.op parameter in your request handler
configuration in solrconfig.xml instead of using the default operator
from schema.xml.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 I'm testing out Solr 4.0.  I got the sample schema working, so now I'm
 converting my existing schema (from a Solr 3.4 instance), but I'm confused
 as to what to do for the defaultOperator setting for the solrQueryParser
 field.

 For my existing Solr, I have:
 solrQueryParser defaultOperator=AND/

 The 4.0 schema says that the defaultOperator field is deprecated, and seems
 to suggest that I just pass it along in my queries.  Is there no way I can
 set it to AND by default somewhere else?  I don't control all the
 applications that use my Solr Indexes, and I want to ensure that they
 operate with AND and not OR.

 Thanks!

 -- Chris



Re: SolrJ DirectXmlRequest

2013-01-09 Thread Otis Gospodnetic
Hi Ryan,

One typically uses a Solr client library to talk to Solr instead of sending
raw XML.  For example, if your application in written in Java then you
would use SolrJ.
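
For illustration, a minimal SolrJ indexing sketch looks roughly like this (the URL
and field names are placeholders, not taken from this thread):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class SolrJAddExample {
    public static void main(String[] args) throws Exception {
      // Hypothetical Solr URL; point this at your own instance.
      SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");          // assumed unique key field
      doc.addField("title", "Hello SolrJ"); // assumed example field

      solr.add(doc);   // send the document as a SolrJ request, no hand-built XML
      solr.commit();   // make it searchable
    }
  }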

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Wed, Jan 9, 2013 at 12:03 PM, Ryan Josal rjo...@rim.com wrote:

 I also don't know what's creating them.  Maybe Solr, but also maybe
 Tomcat, maybe apache commons.  I could change java.io.tmpdir to one with
 more space, but the problem is that many of the temp files end up
 permanent, so eventually it would still run out of space.  I also
 considered setting the tmpdir to /dev/null, but that would defeat the
 purpose of whatever is writing those log files in the first place.  I could
 periodically clean up the tmpdir myself, but that feels the hackiest.

 Is it fairly common to send XML to Solr this way from a remote host?  If
 it is, then that would lead me to believe Solr and any of it's libraries
 aren't causing it, and I should inspect Tomcat.  I'm using Tomcat 7.

 Ryan
 
 From: Otis Gospodnetic [otis.gospodne...@gmail.com]
 Sent: Tuesday, January 08, 2013 7:29 PM
 To: solr-user@lucene.apache.org
 Subject: Re: SolrJ DirectXmlRequest

 Hi Ryan,

 I'm not sure what is creating those upload files something in Solr? Or
 Tomcat?

 Why not specify a different temp dir via system property command line
 parameter?

 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Jan 8, 2013 12:17 PM, Ryan Josal rjo...@rim.com wrote:

  I have encountered an issue where using DirectXmlRequest to index data on
  a remote host results in eventually running out of temp disk space in
 the
  java.io.tmpdir directory.  This occurs when I process a sufficiently
 large
  batch of files.  About 30% of the temporary files end up permanent.  The
  filenames look like: upload__2341cdae_13c02829b77__7ffd_00029003.tmp.
  Has
  anyone else had this happen before?  The relevant code is:
 
  DirectXmlRequest up = new DirectXmlRequest( /update, xml );
  up.process(solr);
 
  where `xml` is a String containing Solr formatted XML, and `solr` is the
  SolrServer.  When disk space is eventually exhausted, this is the error
  message that is repeatedly seen on the master host:
 
  2013-01-07 19:22:16,911 [http-bio-8090-exec-2657] [] ERROR
  org.apache.solr.servlet.SolrDispatchFilter  [] -
  org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
  Processing of multipart/form-data request failed. No space left on device
  at
 
 org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
  at
 
 org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
  at
 
 org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:344)
  at
 
 org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:397)
  at
 
 org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
  at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  ... truncated stack trace
 
  I am running Solr 3.6 on an Ubuntu 12.04 server.  I am considering
 working
  around this by pulling out as much as I can from XMLLoader into my
 client,
  and processing the XML myself into SolrInputDocuments for indexing, but
  this is certainly not ideal.
 
  Ryan
  -
  This transmission (including any attachments) may contain confidential
  information, privileged material (including material protected by the
  solicitor-client or other applicable privileges), or constitute
 non-public
  information. Any use of this information by anyone other than the
 intended
  recipient is prohibited. If you have received this transmission in error,
  please immediately reply to the sender and delete this information from
  your system. Use, dissemination, distribution, or reproduction of this
  transmission by unintended recipients is not authorized and may be
 unlawful.
 

 -
 This transmission (including any attachments) may contain confidential
 information, privileged material (including material protected by the
 solicitor-client or other applicable privileges), or constitute non-public
 information. Any use of this information by anyone other than the intended
 recipient is prohibited. If you have received this transmission in error,
 please immediately reply to the sender and delete this information from
 your system. Use, dissemination, distribution, or reproduction of this
 transmission by unintended recipients is not authorized and may be unlawful.



Re: Convert Complex Lucene Query to SolrQuery

2013-01-09 Thread Otis Gospodnetic
Aha.  I think the problem here is the assumption that .toString() on Lucene
query will give you a string that can then be re-parsed in the proper query
and that is currently not the case.  But if you start with the raw query
like the one you would use with the Lucene QP, you should be fine.

Can you replace:

new  SolrQuery(luceneQuery.toString());

with:

new SolrQuery("Your Raw Query String Here")
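
As a rough sketch (the query text, fields and filter below are invented for
illustration), the idea is to pass the raw user query plus the dismax and filter
parameters separately, rather than a re-serialized Lucene Query:

  import org.apache.solr.client.solrj.SolrQuery;

  // Build the request from raw parameters instead of luceneQuery.toString().
  SolrQuery solrQuery = new SolrQuery("solr lucene");   // the user's raw query text
  solrQuery.set("defType", "dismax");                   // ask for the dismax parser explicitly
  solrQuery.set("qf", "title^2 body");                  // example dismax query fields
  solrQuery.addFilterQuery("type:article");             // the "additional filters" go in as fq params

The response is then fetched with server.query(solrQuery) exactly as before.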


Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Wed, Jan 9, 2013 at 12:33 PM, Jagdish Nomula jagd...@simplyhired.comwrote:

 Thanks Otis and Jack for your responses.

 We are trying to use embeddedsolr server with a solr query as follows:

 EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
 SolrQuery solrQuery = new  SolrQuery(luceneQuery.toString()); //
 Here luceneQuery is a dismax query with additional filters
 QueryResponse rsp = server.query(solrQuery);

 The toString method does not give us good results and
 server.query(solrQuery) fails.

 As otis has suggested, we are going to take a look at LuceneQueryParser
 more closely.

 Thanks,

 Jagdish

 On Tue, Jan 8, 2013 at 9:41 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  How complex? Does it use any of the more advanced Query Types or detailed
  options that are not supported in the Solr query syntax?
 
  What specific problems did you have.
 
  -- Jack Krupansky
 
  -Original Message- From: Jagdish Nomula
  Sent: Tuesday, January 08, 2013 9:13 PM
  To: solr-user@lucene.apache.org
  Subject: Convert Complex Lucene Query to SolrQuery
 
 
  Hello Solr Users,
 
  I am trying to convert a complex lucene query to solrquery to use it
  in a embeddedsolrserver instance.
 
  I have tried the regular toString method without success. Is there any
  suggested method to do this ?.
 
  Greatly appreciate the response.
 
 
  Thanks,
 
 
 
 
  --
  Jagadish Nomula - Senior Manager Search
  Simply Hired, Inc.
  370 San Aleso Ave, Ste. 200
  Sunnyvale, CA 94085
 
  simplyhired.com
 



 --
 *Jagadish Nomula - Senior Manager Search*
 *Simply Hired, Inc.*
 370 San Aleso Ave, Ste. 200
 Sunnyvale, CA 94085

 simplyhired.com



Re: newbie questions about cache stats query perf

2013-01-09 Thread Otis Gospodnetic
Hi,

In your Solr version there is a notion of Searcher being opened and
reopened.  Every time that happens those non-cumulative stats reset.  The
cumulative_ stats just don't refresh, so you have numbers from when the
whole Solr started, not just from the last time Searcher opened.

Your cache is small, which is why you have evictions, which is partially
why you have low hit rate, which is partially why your queries are slow.
 But 15-30  seconds is crazy high, so I am sure there are other issues.

Note that you should *not* give Solr/Tomcat all the RAM you can spare -
leave it to the OS to use for index caching.  If you don't have issues with
full heap (OOMs or crazy GCing) with say -Xmx=2g, then use that.

Plug: http://sematext.com/spm/solr-performance-monitoring/index.html

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Wed, Jan 9, 2013 at 12:56 PM, AJ Weber awe...@comcast.net wrote:

 Sorry, I did search for an answer, but didn't find an applicable one.  I'm
 currently stuck on 1.4.1 (running in Tomcat 6 on 64bit Linux) for the time
 being...

 When I see stats like this:
 name:  documentCache
 class:  org.apache.solr.search.**LRUCache
 version:  1.0
 description:  LRU Cache(maxSize=512, initialSize=512)
 lookups : 0
 hits : 0
 hitratio : 0.00
 inserts : 0
 evictions : 0
 size : 0
 warmupTime : 0
 cumulative_lookups : 8158
 cumulative_hits : 685
 cumulative_hitratio : 0.08
 cumulative_inserts : 7473
 cumulative_evictions : 3023

 I don't understand lookups vs. cumulative_lookups, etc.  I _do_
 understand that a hit-ratio of 0.08 isn't a very good one.

 Something I definitely find strange is that I've allocated 4G of RAM to
 the java heap, but solr consistently remains around 1.7G.  I'm trying to
 give it all the RAM I can spare (I could go higher, but it's not even using
 what I'm giving it) to make it faster.

 The index takes-up roughly 25GB on disk, and indexing is very fast (well,
 nothing we're complaining about anyway).  We're trying to figure out why
 queries against the default, document content are slow (15-30 seconds for
 only a few mm total documents).  Mergefactor=3, if that helps.

 So if anyone could point me to someplace that defines what these stats
 mean, and if anyone has any immediate tips/tricks/recommendations as to
 increasing query performance (and whether this documentCache is a good
 candidate to be increased substantially), I would very much appreciate it.

 -AJ




Re: unittest fail (sometimes) for float field search

2013-01-09 Thread Roman Chyla
Hi,

It is not Eclipse related, neither codec related. There were two issues

I had a wrong configuration of NumericConfig:

new NumericConfig(4, NumberFormat.getNumberInstance(), NumericType.FLOAT))

I changed that to:
new NumericConfig(4, NumberFormat.getNumberInstance(Locale.US),
NumericType.FLOAT))

And the second problem was that I used the default float with
precisionStep=0; however, NumericRangeQuery requires a precision step >= 1.
I tried all steps 1-8, and it worked only if the precision step of the field
and of the NumericConfig are the same (for range queries).
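
A condensed sketch of that alignment, with the field name and the step value chosen
only for illustration:

  import java.text.NumberFormat;
  import java.util.Locale;

  import org.apache.lucene.document.FieldType.NumericType;
  import org.apache.lucene.queryparser.flexible.standard.config.NumericConfig;
  import org.apache.lucene.search.NumericRangeQuery;

  public class PrecisionStepSketch {
    public static void main(String[] args) {
      // One precision step shared by the parser config and the range query,
      // matching whatever the field was indexed with.
      final int precisionStep = 4;

      // Locale pinned so decimal strings parse the same way everywhere.
      NumericConfig config =
          new NumericConfig(precisionStep, NumberFormat.getNumberInstance(Locale.US), NumericType.FLOAT);

      // A range query built directly must use the same precision step as the field.
      NumericRangeQuery<Float> query =
          NumericRangeQuery.newFloatRange("price", precisionStep, 0.0f, 10.0f, true, true);

      System.out.println(config.getPrecisionStep() + " / " + query);
    }
  }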


  roman





On Tue, Jan 8, 2013 at 7:34 PM, Roman Chyla roman.ch...@gmail.com wrote:

 The test checks we are properly getting/indexing data  - we index database
 and fetch parts of the documents separately from mongodb. You can look at
 the file here:
 https://github.com/romanchyla/montysolr/blob/3c18312b325874bdecefceb9df63096b2cf20ca2/contrib/adsabs/src/test/org/apache/solr/update/TestAdsDataImport.java

 But your comment made me to run the tests on command line and I am seeing
 I can't make it fail (it fails only inside Eclipse). Sorry, I should have
 tried that myself, but I am so used to running unittests inside Eclipse it
 didn't occur to me...i'll try to find out what is going on...

 thanks,

   roman




 On Tue, Jan 8, 2013 at 6:53 PM, Chris Hostetter 
 hossman_luc...@fucit.orgwrote:


 : apparently, it fails also with @SuppressCodecs("Lucene3x")

 what exactly is the test failure message?

 When you run tests that use the lucene test framework, any failure should
 include information about the random seed used to run the test -- that
 random seed affects things like the codec used, the directoryfactory used,
 etc...

 Can you confirm wether the test reliably passes/fails consistently when
 you reuse the same seed?

 Can you elaborate more on what exactly your test does? ... we probably
 need to see the entire test to make sense of why you might get
 inconsistent failures.



 -Hoss





Re: Pause and resume indexing on SolR 4 for backups

2013-01-09 Thread Paul Jungwirth
 Are you sure a commit didn't happen between?
 Also, a background merge might have happened.

 As to using a backup, you are right, just stop solr,
 put the snapshot into index/data, and restart.


This was mentioned before but seems not to have gotten any attention: can't
you use the ReplicationHandler by just going to a URL like this?:


http://host:8080/solr/replication?command=backup&location=/home/jboss/backup

The 2nd edition Lucene in Action book describes a way to take hot backups
without stopping your IndexWriter (pp. 374ff), and it appears that
ReplicationHandler uses a similar strategy if I'm reading the code
correctly (Solr 3.6.1; I guess v4 is the same).

It'd be great if someone more knowledgeable could confirm that you can use
the ReplicationHandler to take hot backups. I'm surprised to see such a
long thread about starting/stopping index jobs when there is such an easy
answer. Or am I mistaken and at risk of corrupt backups if I use it?

Thanks,
Paul

-- 
_
Pulchritudo splendor veritatis.


Re: Pause and resume indexing on SolR 4 for backups

2013-01-09 Thread Otis Gospodnetic
Hi Paul,

Hot backup is OK.  There was a thread on this topic yesterday and the day
before.  But you should always try running from backup regardless of what
anyone says here, because if you have to do that one day you want to
know you verified it :)

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Wed, Jan 9, 2013 at 3:12 PM, Paul Jungwirth
p...@illuminatedcomputing.comwrote:

  Are you sure a commit didn't happen between?
  Also, a background merge might have happened.
 
  As to using a backup, you are right, just stop solr,
  put the snapshot into index/data, and restart.


 This was mentioned before but seems not to have gotten any attention: can't
 you use the ReplicationHandler by just going to a URL like this?:



 http://host:8080/solr/replication?command=backup&location=/home/jboss/backup

 The 2nd edition Lucene in Action book describes a way to take hot backups
 without stopping your IndexWriter (pp. 374ff), and it appears that
 ReplicationHandler uses a similar strategy if I'm reading the code
 correctly (Solr 3.6.1; I guess v4 is the same).

 It'd be great if someone more knowledgeable could confirm that you can use
 the ReplicationHandler to take hot backups. I'm surprised to see such a
 long thread about starting/stopping index jobs when there is such an easy
 answer. Or am I mistaken and at risk of corrupt backups if I use it?

 Thanks,
 Paul

 --
 _
 Pulchritudo splendor veritatis.



Re: Clean Up Aged Index Using DeletionPolicy

2013-01-09 Thread Otis Gospodnetic
Just to satisfy my curiosity - are you looking to have TTL for documents or
for indices?

The former: https://issues.apache.org/jira/browse/SOLR-3874
The latter: no issue that I know off, typically managed by the application.

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Wed, Jan 9, 2013 at 10:57 AM, hyrax hao.w...@selerityfinancial.comwrote:

 Hey Shawn,

 Thanks a lot for your detailed explanation on deletionPolicy.
 Although it's frustrating that Solr doesn't support the function I need, I'm
 really glad that you pointed it out so that I can move on.
 What I'm thinking now is adding a new field for the time a document is
 indexed, so a simple range query can delete the aged indexes I want to
 remove to maintain my disk space.

 Thanks again,
 Hao



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Clean-Up-Aged-Index-Using-DeletionPolicy-tp4031704p4031896.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR '0' Status: Communication Error

2013-01-09 Thread ddineshkumar
Thanks Shawn.  I tried increasing following timeouts in php:

max_execution_time
max_input_time
default_socket_timeout

But still I get 'Communication error'. Please let me know if I have to
change any other timeout in php.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-0-Status-Communication-Error-tp4031698p4032012.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Clean Up Aged Index Using DeletionPolicy

2013-01-09 Thread hyrax
Exactly what I want.
For a simple scenario:
Index a batch of documents 20 days ago and they are searchable via Solr.
After say 20 days, you can't search them anymore because they are deleted
automatically by Solr.
Thanks,
Hao



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Clean-Up-Aged-Index-Using-DeletionPolicy-tp4031704p4032019.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Clean Up Aged Index Using DeletionPolicy

2013-01-09 Thread Otis Gospodnetic
Options:
1. Run delete by query every N hours/days to purge old docs
2. Create daily indices and drop them every H hours/days to get rid of all
old docs

The TTL support for 1. would probably be implemented with delete by query.
 The drawback of 1. compared to 2. is that you will pay the price when
Lucene merges segments with lots of deleted docs.  2. is cheaper.

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Wed, Jan 9, 2013 at 3:44 PM, hyrax hao.w...@selerityfinancial.comwrote:

 Exactly what I want.
 For a simple scenario:
 Index a batch of documents 20 days ago and they are searchable via Solr.
 After say 20 days, you can't search them anymore because they are deleted
 automatically by Solr.
 Thanks,
 Hao



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Clean-Up-Aged-Index-Using-DeletionPolicy-tp4031704p4032019.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Clean Up Aged Index Using DeletionPolicy

2013-01-09 Thread Walter Underwood
Solr does not delete anything automatically.

Add a timestamp field when you index.

Use delete by query to delete everything older than 20 days.
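
A minimal SolrJ sketch of that pattern, assuming a date field named indexed_date
(the field name and URL are placeholders):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;

  public class PurgeOldDocs {
    public static void main(String[] args) throws Exception {
      // Hypothetical core URL; run this from cron or any scheduler.
      SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

      // Delete everything whose timestamp is more than 20 days old.
      solr.deleteByQuery("indexed_date:[* TO NOW-20DAYS]");
      solr.commit();
    }
  }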

wunder

On Jan 9, 2013, at 12:44 PM, hyrax wrote:

 Exactly what I want.
 For a simple scenario:
 Index a batch of documents 20 days ago and they are searchable via Solr.
 After say 20 days, you can't search them anymore because they are deleted
 automatically by Solr.
 Thanks,
 Hao
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Clean-Up-Aged-Index-Using-DeletionPolicy-tp4031704p4032019.html
 Sent from the Solr - User mailing list archive at Nabble.com.






Re: Pause and resume indexing on SolR 4 for backups

2013-01-09 Thread Paul Jungwirth
Yes, I agree about making sure the backups actually work, whatever the
approach. Thanks for your reply and all you've contributed to the
Solr/Lucene community. The Lucene in Action book has been a huge help to me.

Paul


On Wed, Jan 9, 2013 at 12:16 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi Paul,

 Hot backup is OK.  There was a thread on this topic yesterday and the day
 before.  But you should always try running from backup regardless of what
 anyone says here, because if you have to do that one day you want to
 know you verified it :)

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Wed, Jan 9, 2013 at 3:12 PM, Paul Jungwirth
 p...@illuminatedcomputing.comwrote:

   Are you sure a commit didn't happen between?
   Also, a background merge might have happened.
  
   As to using a backup, you are right, just stop solr,
   put the snapshot into index/data, and restart.
 
 
  This was mentioned before but seems not to have gotten any attention:
 can't
  you use the ReplicationHandler by just going to a URL like this?:
 
 
 
 
  http://host:8080/solr/replication?command=backup&location=/home/jboss/backup
 
  The 2nd edition Lucene in Action book describes a way to take hot backups
  without stopping your IndexWriter (pp. 374ff), and it appears that
  ReplicationHandler uses a similar strategy if I'm reading the code
  correctly (Solr 3.6.1; I guess v4 is the same).
 
  It'd be great if someone more knowledgeable could confirm that you can
 use
  the ReplicationHandler to take hot backups. I'm surprised to see such a
  long thread about starting/stopping index jobs when there is such an easy
  answer. Or am I mistaken and at risk of corrupt backups if I use it?
 
  Thanks,
  Paul
 
  --
  _
  Pulchritudo splendor veritatis.
 




-- 
_
Pulchritudo splendor veritatis.


performing a boolean query (OR) with a large number of terms

2013-01-09 Thread geeky2
hello,

environment: solr 3.5

i have a requirement to perform a boolean query (like the example below)
with a large number of terms.

the number of terms could be 15 or possibly larger.

after looking over several threads and the smiley book - i think i just have to
include the parens and string all of the terms together with OR's

i just want to make sure that i am not missing anything.

is there a better or more efficient way of doing this?

http://server:port/dir/core1/select?qt=modelItemNoSearch&q=itemModelNoExactMatchStr:%285-100-NGRT7%20OR%205-10-10MS7%20OR%20404%29&rows=30&debugQuery=on&rows=40
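
If the query string is assembled in code, a SolrJ sketch along these lines (the
handler name and field are taken from the URL above; the term list is illustrative)
avoids hand-rolling the escaping and URL encoding:

  import java.util.Arrays;
  import java.util.List;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.util.ClientUtils;

  // Join the model numbers into one fielded OR clause.
  List<String> terms = Arrays.asList("5-100-NGRT7", "5-10-10MS7", "404");
  StringBuilder q = new StringBuilder("itemModelNoExactMatchStr:(");
  for (int i = 0; i < terms.size(); i++) {
    if (i > 0) q.append(" OR ");
    q.append(ClientUtils.escapeQueryChars(terms.get(i)));  // escape any query metacharacters in a term
  }
  q.append(")");

  SolrQuery query = new SolrQuery(q.toString());
  query.set("qt", "modelItemNoSearch");  // same request handler as in the URL above
  query.setRows(40);

For a handful of terms, a single fielded clause OR'ing them together, as in the URL
above, is the usual approach.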


thx
mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/performing-a-boolean-query-OR-with-a-large-number-of-terms-tp4032039.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SolrJ DirectXmlRequest

2013-01-09 Thread Ryan Josal
Thanks Otis,

DirectXmlRequest is part of the SolrJ library, so I guess that means it is not 
commonly used.  My use case is that I'm applying an XSLT to the raw XML on the 
client side, instead of leaving that up to the Solr master (although even if I 
applied the XSLT on the Solr server, I'd still use DirectXmlRequest to get the 
raw XML there).  This does lead me to the idea that parsing the XML without the 
XSLT is probably better than copying some of XMLLoader to parse Solr XML as a 
workaround, and might be a good idea to do anyway.

I've done some research and I'm fairly confident that apache commons-fileupload 
library is responsible for the temp files.  There's an explanation for how 
files are cleaned up at http://commons.apache.org/fileupload/using.html in the 
Resource cleanup section.  I have observed that forcing a garbage collection 
over JMX results in all temporary files being purged.  This implies that many 
of the java.io.File objects are moving to old gen in the heap which survive 
long enough (only a few minutes in my case) to use up all tmp disk space.

I think this can probably be solved by GC tuning, or, failing that, introducing 
a (less desirable) System.gc() somewhere in the updateRequestProcessorChain.

Thanks for your help, and hopefully this will be useful if someone else runs 
into a similar problem.

Ryan

From: Otis Gospodnetic [otis.gospodne...@gmail.com]
Sent: Wednesday, January 09, 2013 11:53 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ DirectXmlRequest

Hi Ryan,

One typically uses a Solr client library to talk to Solr instead of sending
raw XML.  For example, if your application in written in Java then you
would use SolrJ.

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Wed, Jan 9, 2013 at 12:03 PM, Ryan Josal rjo...@rim.com wrote:

 I also don't know what's creating them.  Maybe Solr, but also maybe
 Tomcat, maybe apache commons.  I could change java.io.tmpdir to one with
 more space, but the problem is that many of the temp files end up
 permanent, so eventually it would still run out of space.  I also
 considered setting the tmpdir to /dev/null, but that would defeat the
 purpose of whatever is writing those log files in the first place.  I could
 periodically clean up the tmpdir myself, but that feels the hackiest.

 Is it fairly common to send XML to Solr this way from a remote host?  If
 it is, then that would lead me to believe Solr and any of it's libraries
 aren't causing it, and I should inspect Tomcat.  I'm using Tomcat 7.

 Ryan
 
 From: Otis Gospodnetic [otis.gospodne...@gmail.com]
 Sent: Tuesday, January 08, 2013 7:29 PM
 To: solr-user@lucene.apache.org
 Subject: Re: SolrJ DirectXmlRequest

 Hi Ryan,

 I'm not sure what is creating those upload files something in Solr? Or
 Tomcat?

 Why not specify a different temp dir via system property command line
 parameter?

 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Jan 8, 2013 12:17 PM, Ryan Josal rjo...@rim.com wrote:

  I have encountered an issue where using DirectXmlRequest to index data on
  a remote host results in eventually running out of temp disk space in
 the
  java.io.tmpdir directory.  This occurs when I process a sufficiently
 large
  batch of files.  About 30% of the temporary files end up permanent.  The
  filenames look like: upload__2341cdae_13c02829b77__7ffd_00029003.tmp.
  Has
  anyone else had this happen before?  The relevant code is:
 
  DirectXmlRequest up = new DirectXmlRequest( /update, xml );
  up.process(solr);
 
  where `xml` is a String containing Solr formatted XML, and `solr` is the
  SolrServer.  When disk space is eventually exhausted, this is the error
  message that is repeatedly seen on the master host:
 
  2013-01-07 19:22:16,911 [http-bio-8090-exec-2657] [] ERROR
  org.apache.solr.servlet.SolrDispatchFilter  [] -
  org.apache.commons.fileupload.FileUploadBase$IOFileUploadException:
  Processing of multipart/form-data request failed. No space left on device
  at
 
 org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
  at
 
 org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
  at
 
 org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:344)
  at
 
 org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:397)
  at
 
 org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
  at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  ... truncated stack trace
 
  I am running Solr 3.6 on an Ubuntu 12.04 server.  I am considering
 working
  around this by pulling out as 

Re: Pause and resume indexing on SolR 4 for backups

2013-01-09 Thread Upayavira
The point was as much about how to use a backup, as to how to make one
in the first place. The replication handler can handle spitting out a
backup, but there's no straightforward way to tell Solr to switch to
another set of index files instead. You'd have to do clever stuff with
the CoreAdminHandler, I reckon.

Upayavira

On Wed, Jan 9, 2013, at 09:27 PM, Paul Jungwirth wrote:
 Yes, I agree about making sure the backups actually work, whatever the
 approach. Thanks for your reply and all you've contributed to the
 Solr/Lucene community. The Lucene in Action book has been a huge help to
 me.
 
 Paul
 
 
 On Wed, Jan 9, 2013 at 12:16 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:
 
  Hi Paul,
 
  Hot backup is OK.  There was a thread on this topic yesterday and the day
  before.  But you should always try running from backup regardless of what
  anyone says here, because if you have to do that one day you want to
  know you verified it :)
 
  Otis
  --
  Solr  ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Wed, Jan 9, 2013 at 3:12 PM, Paul Jungwirth
  p...@illuminatedcomputing.comwrote:
 
Are you sure a commit didn't happen between?
Also, a background merge might have happened.
   
As to using a backup, you are right, just stop solr,
put the snapshot into index/data, and restart.
  
  
   This was mentioned before but seems not to have gotten any attention:
  can't
   you use the ReplicationHandler by just going to a URL like this?:
  
  
  
  
   http://host:8080/solr/replication?command=backup&location=/home/jboss/backup
  
   The 2nd edition Lucene in Action book describes a way to take hot backups
   without stopping your IndexWriter (pp. 374ff), and it appears that
   ReplicationHandler uses a similar strategy if I'm reading the code
   correctly (Solr 3.6.1; I guess v4 is the same).
  
   It'd be great if someone more knowledgeable could confirm that you can
  use
   the ReplicationHandler to take hot backups. I'm surprised to see such a
   long thread about starting/stopping index jobs when there is such an easy
   answer. Or am I mistaken and at risk of corrupt backups if I use it?
  
   Thanks,
   Paul
  
   --
   _
   Pulchritudo splendor veritatis.
  
 
 
 
 
 -- 
 _
 Pulchritudo splendor veritatis.


SOLR/Velocity Test Cases

2013-01-09 Thread Marcos Mendez
Hi,

I'm trying to write some tests based on SolrTestCaseJ4 that test using velocity 
in SOLR. I found VelocityResponseWriterTest.java, but this does not test that. 
In fact it has a todo to do what I want to do. 

Anyone have an example out there?

I just need to check if velocity is loaded with my configuration. Any help is 
appreciated.

Re: How to run many MoreLikeThis request efficiently?

2013-01-09 Thread Yandong Yao
Any comments on this? Thanks very much in advance!

2013/1/9 Yandong Yao yydz...@gmail.com

 Hi Solr Guru,

 I have two set of documents in one SolrCore, each set has about 1M
 documents with different document type, say 'type1' and 'type2'.

 Many documents in first set are very similar with 1 or 2 documents in the
 second set, What I want to get is:  for each document in set 2, return the
 most similar document in set 1 using either 'MoreLikeThisHandler' or
 'MoreLikeThisComponent'.

 Currently I use following code to get the result, while it will send far
 too many request to Solr server serially.  Is there any way to enhance this
 besides using multi-threading?  Thanks very much!

 for each document in set 2 whose type is 'type2'
 run MoreLikeThis request against Solr server and get the most similar
 document
 end.

 Regards,
 Yandong



what is difference between 4.1 and 5.x

2013-01-09 Thread solr-user
just curious as to what the difference is between 4.1 and 5.0

i.e. is 4.1 a maintenance branch for what is currently 4.0 or are they very
different designs/architectures



--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-is-difference-between-4-1-and-5-x-tp4032064.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: CoreAdmin STATUS performance

2013-01-09 Thread Yury Kats
On 1/9/2013 10:38 AM, Shahar Davidson wrote:
 Hi All,
 
 I have a client app that uses SolrJ and which requires to collect the names 
 (and just the names) of all loaded cores.
 I have about 380 Solr Cores on a single Solr server (net indices size is 
 about 220GB).
 
 Running the STATUS action takes about 800ms - that seems a bit too long, 
 given my requirements.
 
 So here are my questions:
 1) Is there any way to get _only_ the core Name of all cores?

If you have access to the filesystem, you could just read solr.xml where all 
cores are listed.
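
If that route is taken, a small sketch with the JDK XPath API could pull the names
out (the solr.xml path below is a placeholder, and the <solr><cores><core .../>
layout is assumed):

  import java.io.File;
  import javax.xml.parsers.DocumentBuilderFactory;
  import javax.xml.xpath.XPath;
  import javax.xml.xpath.XPathConstants;
  import javax.xml.xpath.XPathFactory;
  import org.w3c.dom.Document;
  import org.w3c.dom.NodeList;

  public class ListCoreNames {
    public static void main(String[] args) throws Exception {
      // Hypothetical location of solr.xml on the Solr server's filesystem.
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new File("/opt/solr/solr.xml"));

      XPath xpath = XPathFactory.newInstance().newXPath();
      // solr.xml lists each core as <core name="..."/> under <cores>.
      NodeList names = (NodeList) xpath.evaluate("/solr/cores/core/@name", doc, XPathConstants.NODESET);
      for (int i = 0; i < names.getLength(); i++) {
        System.out.println(names.item(i).getNodeValue());
      }
    }
  }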


Re: what is difference between 4.1 and 5.x

2013-01-09 Thread Shawn Heisey

On 1/9/2013 5:11 PM, solr-user wrote:

just curious as to what the difference is between 4.1 and 5.0

i.e. is 4.1 a maintenance branch for what is currently 4.0 or are they very
different designs/architectures


There are several code branches in the SVN repository.  I'll talk about 
three of them here.  The first is lucene_solr_4_0, which is the branch 
that 4.0.0 was released from.  The second is called branch_4x, which is 
the 4.x development branch.  This includes a version number of 4.1 right 
now.  The third branch isn't really a branch - it's the main development 
area, called trunk.  The trunk currently includes a version number of 5.0.


Very soon now, a lucene_solr_4_1 branch will be created from which 
version 4.1 will get released.  When that happens, branch_4x will get 
renumbered to 4.2.  At some point in the future, trunk will be copied to 
another branch called branch_5x, and then trunk will have its internal 
version number changed to 6.0.


New development happens on both branch_4x and trunk.  Right now, both 
development trees are actually very similar - most of the changes that 
have happened in the last few months have been made to both. 
Eventually, someone will come up with a major design overhaul that won't 
be appropriate to include in branch_4x.  That kind of change will only 
get put into trunk.


Thanks,
Shawn



Re: CoreAdmin STATUS performance

2013-01-09 Thread Shawn Heisey

On 1/9/2013 8:38 AM, Shahar Davidson wrote:

I have a client app that uses SolrJ and which requires to collect the names 
(and just the names) of all loaded cores.
I have about 380 Solr Cores on a single Solr server (net indices size is about 
220GB).

Running the STATUS action takes about 800ms - that seems a bit too long, given 
my requirements.

So here are my questions:
1) Is there any way to get _only_ the core Name of all cores?
2) Why does the STATUS request take such a long time and is there a way to 
improve its performance?


I'm curious why 800 milliseconds isn't fast enough.  How often do you 
actually need to gather this information?


If you are incorporating it into something that will get accessed a lot 
(such as a status servlet page), put a minimum interval capability 
into the part of the program that contacts solr.  If it's been less than 
that minimum interval (5-10 seconds could be a recommended starting 
point) since the last time the information was gathered, just use the 
previously stored response rather than make a new request.
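
A rough sketch of that throttle with SolrJ's CoreAdminRequest (the interval is
arbitrary, and getStatus with a null core name is assumed to return the status of
every core):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;
  import org.apache.solr.client.solrj.response.CoreAdminResponse;

  public class CachedCoreStatus {
    private static final long MIN_INTERVAL_MS = 10000; // re-query Solr at most every 10 seconds
    private final SolrServer server;
    private CoreAdminResponse lastResponse;
    private long lastFetched;

    public CachedCoreStatus(SolrServer server) {
      this.server = server;
    }

    public synchronized CoreAdminResponse getStatus() throws Exception {
      long now = System.currentTimeMillis();
      if (lastResponse == null || now - lastFetched > MIN_INTERVAL_MS) {
        lastResponse = CoreAdminRequest.getStatus(null, server); // null = status of all cores
        lastFetched = now;
      }
      return lastResponse;
    }
  }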


I have used this approach in a homegrown status servlet written with 
SolrJ.  I have been trying to come up with a way to generalize the 
paradigm so it can be incorporated directly into a future SolrJ version.


Thanks,
Shawn



Re: SOLR/Velocity Test Cases

2013-01-09 Thread Erik Hatcher
Marcos -

I just happen to be tinkering with VrW over the last few days (to get some big 
improvements across the board with it and the /browse UI into Solr 5.0, and 
maybe eventually 4.x too), so I whipped up such a test case just now.

Here's the short and sweet version:

  public void testVelocityResponseWriterRegistered() {
    QueryResponseWriter writer = h.getCore().getQueryResponseWriter("velocity");
    assertTrue("VrW registered check", writer instanceof VelocityResponseWriter);
  }

This required that I put in the test solrconfig.xml <queryResponseWriter
name="velocity" class="solr.VelocityResponseWriter"/> (which was not there
before, as it wasn't needed for the direct VrW test that already was there).

I added another test too, to check a template from the conf/velocity directory 
being rendered like this:

  public void testSolrResourceLoaderTemplate() throws Exception {
    assertEquals("0", h.query(req("q","*:*", "wt","velocity","v.template","test")));
  }
  }

And I added a conf/velocity/test.vm file with just this in it: 
$response.response.response.numFound

So there ya go... I'll commit these in hopefully the near future along with the 
other related stuff.

I'm curious - what are you using VrW for?

Erik


On Jan 9, 2013, at 17:43 , Marcos Mendez wrote:

 Hi,
 
 I'm trying to write some tests based on SolrTestCaseJ4 that test using 
 velocity in SOLR. I found VelocityResponseWriterTest.java, but this does not 
 test that. In fact it has a todo to do what I want to do. 
 
 Anyone have an example out there?
 
 I just need to check if velocity is loaded with my configuration. Any help is 
 appreciated.



Schema Field Names i18n

2013-01-09 Thread Daryl Robbins
Anyone have experience with internationalizing the field names in the SOLR 
schema, so users in different languages can specify fields in their own 
language? My first thoughts would be to create a custom search component or 
query parser than would convert localized field names back to the English names 
in the schema, but I haven't dived in too deep yet. Any input would be greatly 
appreciated.

Thanks,

Daryl


__
* This message is intended only for the use of the individual or entity to 
which it is addressed, and may contain information that is privileged, 
confidential and exempt from disclosure under applicable law. Unless you are 
the addressee (or authorized to receive for the addressee), you may not use, 
copy or disclose the message or any information contained in the message. If 
you have received this message in error, please advise the sender by reply 
e-mail, and delete the message, or call +1-613-747-4698. *



Re: How to run many MoreLikeThis request efficiently?

2013-01-09 Thread Otis Gospodnetic
Patience, young Yandong :)

Multi-threading *in your application* is the way to go. Alternatively, one
could write a custom SearchComponent that is called once and inside of
which the whole work is done after just one call to it. This component
could then write the output somewhere, like in a new index since making a
blocking call to it may time out.
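
A sketch of the multi-threaded route with SolrJ (the handler path, field names,
filter and thread count are all assumptions made for the example):

  import java.util.Arrays;
  import java.util.List;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class ParallelMlt {
    public static void main(String[] args) throws Exception {
      final SolrServer solr = new HttpSolrServer("http://localhost:8983/solr"); // hypothetical URL
      ExecutorService pool = Executors.newFixedThreadPool(8);                   // tune to the hardware

      // In practice this would be the ids of the ~1M 'type2' documents.
      List<String> type2Ids = Arrays.asList("doc-a", "doc-b");

      for (final String id : type2Ids) {
        pool.submit(new Runnable() {
          public void run() {
            try {
              SolrQuery q = new SolrQuery("id:" + id);
              q.set("qt", "/mlt");            // MoreLikeThisHandler assumed registered at /mlt
              q.set("mlt.fl", "content");     // assumed similarity field
              q.set("fq", "type:type1");      // only consider documents from set 1 as candidates
              q.setRows(1);                   // just the single most similar document
              QueryResponse rsp = solr.query(q);
              if (!rsp.getResults().isEmpty()) {
                System.out.println(id + " -> " + rsp.getResults().get(0).getFieldValue("id"));
              }
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
    }
  }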

Otis
Solr  ElasticSearch Support
http://sematext.com/
On Jan 9, 2013 6:07 PM, Yandong Yao yydz...@gmail.com wrote:

 Any comments on this? Thanks very much in advance!

 2013/1/9 Yandong Yao yydz...@gmail.com

  Hi Solr Guru,
 
  I have two set of documents in one SolrCore, each set has about 1M
  documents with different document type, say 'type1' and 'type2'.
 
  Many documents in first set are very similar with 1 or 2 documents in the
  second set, What I want to get is:  for each document in set 2, return
 the
  most similar document in set 1 using either 'MoreLikeThisHandler' or
  'MoreLikeThisComponent'.
 
  Currently I use following code to get the result, while it will send far
  too many request to Solr server serially.  Is there any way to enhance
 this
  besides using multi-threading?  Thanks very much!
 
  for each document in set 2 whose type is 'type2'
  run MoreLikeThis request against Solr server and get the most similar
  document
  end.
 
  Regards,
  Yandong
 



Re: SOLR/Velocity Test Cases

2013-01-09 Thread Erik Hatcher
And to add a little to this, since it looked ugly below, the 
$response.response.response.numFound thing is something I'm going to improve to 
make it leaner and cleaner to get at the actual result set and other response 
structures.  $response is the actual SolrQueryResponse, and navigating that 
down to numFound through NamedLists and so on is pretty ridiculous looking.

Erik

On Jan 9, 2013, at 19:54 , Erik Hatcher wrote:

 Marcos -
 
 I just happen to be tinkering with VrW over the last few days (to get some 
 big improvements across the board with it and the /browse UI into Solr 5.0, 
 and maybe eventually 4.x too), so I whipped up such a test case just now.
 
 Here's the short and sweet version:
 
  public void testVelocityResponseWriterRegistered() {
QueryResponseWriter writer = 
 h.getCore().getQueryResponseWriter(velocity);
assertTrue(VrW registered check, writer instanceof 
 VelocityResponseWriter);
  }
 
 This required that I put in the test solrconfig.xml queryResponseWriter 
 name=velocity class=solr.VelocityResponseWriter/ (which was not there 
 before, as it wasn't needed for the direct VrW test that already was there).  
 
 I added another test too, to check a template from the conf/velocity 
 directory being rendered like this:
 
  public void testSolrResourceLoaderTemplate() throws Exception {
assertEquals(0, h.query(req(q,*:*, 
 wt,velocity,v.template,test)));
  }
 
 And I added a conf/velocity/test.vm file with just this in it: 
 $response.response.response.numFound
 
 So there ya go... I'll commit these in hopefully the near future along with 
 the other related stuff.
 
 I'm curious - what are you using VrW for?
 
   Erik
 
 
 On Jan 9, 2013, at 17:43 , Marcos Mendez wrote:
 
 Hi,
 
 I'm trying to write some tests based on SolrTestCaseJ4 that test using 
 velocity in SOLR. I found VelocityResponseWriterTest.java, but this does not 
 test that. In fact it has a todo to do what I want to do. 
 
 Anyone have an example out there?
 
 I just need to check if velocity is loaded with my configuration. Any help 
 is appreciated.
 



Re: DIH fails after processing roughly 10million records

2013-01-09 Thread Lance Norskog

At this scale, your indexing job is prone to break in various ways.
If you want this to be reliable, it should be able to restart in the 
middle of an upload, rather than starting over.


On 01/08/2013 10:19 PM, vijeshnair wrote:

Yes Shawn, the batchSize is -1 only and I also have the mergeScheduler
exactly same as you mentioned.  When I had this problem in SOLR 3.4, I did
an extensive googling and gathered much of the tweaks and tuning from
different blogs and forums and configured the 4.0 instance. My next full run
is scheduled for this weekend, I will try with a higher mysql wait_timeout
value and update you the outcome.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-fails-after-processing-roughly-10million-records-tp4031508p4031779.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: SolrCloud - Query performance degrades with multiple servers

2013-01-09 Thread sausarkar
Hi Yonik,

Could you merge this feature into the 4.0 branch? We tried 4.1 and it did
solve the CPU spike, but we ran into other issues. As we are very tight on
schedule, it would be very beneficial if you could merge this feature into
the 4.0 branch.

Let me know.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Query-performance-degrades-with-multiple-servers-tp4024660p4032088.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud graph status is out of date

2013-01-09 Thread Mark Miller
It may be able to do that because it's forwarding requests to other nodes that 
are up?

Would be good to dig into the logs to see if you can narrow in on the reason 
for the recovery_failed.

- Mark

On Jan 9, 2013, at 8:52 PM, Zeng Lames lezhi.z...@gmail.com wrote:

 Hi ,
 
 we meet below strange case in production environment. from the Solr Admin
 Console - Cloud - Graph, we can find that one node is in recovery_failed
 status. But at the same time, we found that the recovery_failed node can
 serve query/update requests normally.
 
 any idea about it? thanks!
 
 -- 
 Best Wishes!
 Lames



Re: SolrCloud - Query performance degrades with multiple servers

2013-01-09 Thread Shawn Heisey

On 1/9/2013 7:01 PM, sausarkar wrote:

Hi Yonik,

Could you merge this feature into the 4.0 branch? We tried 4.1 and it did
solve the CPU spike, but we ran into other issues. As we are very tight on
schedule, it would be very beneficial if you could merge this feature into
the 4.0 branch.


4.1 *is* the next release after 4.0.  At this point, with 4.1 close to 
release, there will not be a 4.0.1.


Thanks,
Shawn



Setting up new SolrCloud - need some guidance

2013-01-09 Thread Shawn Heisey
I have a lot of experience with Solr, starting with 1.4.0 and currently 
running 3.5.0 in production.  I am working on a 4.1 upgrade, but I have 
not touched SolrCloud at all.


I now need to set up a brand new Solr deployment to replace a custom 
Lucene system, and due to the way the client works, SolrCloud is going 
to be the only reasonable way to have redundancy.  I am planning to have 
two Solr servers (each also running standalone zookeeper) plus a third 
low-end machine that will complete the zookeeper ensemble.  I'm planning 
to set it up with numShards=1, replica 2.


It will need to support several different collections.  Although it's 
possible that those collections will all use the same schema and config 
at first, it's likely that they will diverge before too long.


What would be the best practice for setting up zookeeper for this? 
Would I use multiple zk chroots, or put everything into one?  I've been 
trying to figure this out on my own, without much luck.  Can anyone 
share some known good ZK/SolrCloud configs?


What gotchas am I likely to run into?  The existing config that I've 
come up with for this system heavily uses xinclude in solrconfig.xml. 
Is it possible to use xinclude when the config files are in zookeeper, 
or will I have to re-combine it?


Thanks,
Shawn


Re: Setting up new SolrCloud - need some guidance

2013-01-09 Thread Mark Miller
I'd put everything into one. You can upload different named sets of config 
files and point collections either to the same sets or different sets.

You can really think about it the same way you would setting up a single node 
with multiple cores. The main difference is that it's easier to share sets of 
config files across collections if you want to. You don't need to at all though.

I'm not sure if xinclude works with zk, but I don't think it does.

- Mark

On Jan 9, 2013, at 10:31 PM, Shawn Heisey s...@elyograg.org wrote:

 I have a lot of experience with Solr, starting with 1.4.0 and currently 
 running 3.5.0 in production.  I am working on a 4.1 upgrade, but I have not 
 touched SolrCloud at all.
 
 I now need to set up a brand new Solr deployment to replace a custom Lucene 
 system, and due to the way the client works, SolrCloud is going to be the 
 only reasonable way to have redundancy.  I am planning to have two Solr 
 servers (each also running standalone zookeeper) plus a third low-end machine 
 that will complete the zookeeper ensemble.  I'm planning to set it up with 
 numShards=1, replica 2.
 
 It will need to support several different collections.  Although it's 
 possible that those collections will all use the same schema and config at 
 first, it's likely that they will diverge before too long.
 
 What would be the best practice for setting up zookeeper for this? Would I 
 use multiple zk chroots, or put everything into one?  I've been trying to 
 figure this out on my own, without much luck.  Can anyone share some known 
 good ZK/SolrCloud configs?
 
 What gotchas am I likely to run into?  The existing config that I've come up 
 with for this system heavily uses xinclude in solrconfig.xml. Is it possible 
 to use xinclude when the config files are in zookeeper, or will I have to 
 re-combine it?
 
 Thanks,
 Shawn



Re: How to run many MoreLikeThis request efficiently?

2013-01-09 Thread Yandong Yao
Hi Otis,

Really appreciate your help on this!!  Will go with multi-threading first,
and then provide a custom component if performance is not good enough.

Regards,
Yandong

2013/1/10 Otis Gospodnetic otis.gospodne...@gmail.com

 Patience, young Yandong :)

 Multi-threading *in your application* is the way to go. Alternatively, one
 could write a custom SearchComponent that is called once and inside of
 which the whole work is done after just one call to it. This component
 could then write the output somewhere, like in a new index since making a
 blocking call to it may time out.

 Otis
 Solr  ElasticSearch Support
 http://sematext.com/
 On Jan 9, 2013 6:07 PM, Yandong Yao yydz...@gmail.com wrote:

  Any comments on this? Thanks very much in advance!
 
  2013/1/9 Yandong Yao yydz...@gmail.com
 
   Hi Solr Guru,
  
   I have two set of documents in one SolrCore, each set has about 1M
   documents with different document type, say 'type1' and 'type2'.
  
   Many documents in first set are very similar with 1 or 2 documents in
 the
   second set, What I want to get is:  for each document in set 2, return
  the
   most similar document in set 1 using either 'MoreLikeThisHandler' or
   'MoreLikeThisComponent'.
  
   Currently I use following code to get the result, while it will send
 far
   too many request to Solr server serially.  Is there any way to enhance
  this
   besides using multi-threading?  Thanks very much!
  
   for each document in set 2 whose type is 'type2'
   run MoreLikeThis request against Solr server and get the most
 similar
   document
   end.
  
   Regards,
   Yandong
  
 



Re: SolrCloud graph status is out of date

2013-01-09 Thread Zeng Lames
thanks Mark. will further dig into the logs. there is another problem
related.

we have collections with 3 shards (2 nodes in one shard), the collection
have about 1000 records in it. but unfortunately that after the leader is
down, replica node failed to become the leader.the detail is : after the
leader node is down, replica node try to become the new leader, but it said

===
ShardLeaderElectionContext.runLeaderProcess(131) - Running the leader
process.
ShardLeaderElectionContext.shouldIBeLeader(331) - Checking if I should try
and be the leader.
ShardLeaderElectionContext.shouldIBeLeader(339) - My last published State
was Active, it's okay to be the leader.
ShardLeaderElectionContext.runLeaderProcess(164) - I may be the new leader
- try and sync
SyncStrategy.sync(89) - Sync replicas to
http://localhost:8486/solr/exception/
PeerSync.sync(182) - PeerSync: core=exception
url=http://localhost:8486/solr START
replicas=[http://localhost:8483/solr/exception/] nUpdates=100
PeerSync.sync(250) - PeerSync: core=exception
url=http://localhost:8486/solr DONE.
 We have no versions.  sync failed.
SyncStrategy.log(114) - Sync Failed
ShardLeaderElectionContext.rejoinLeaderElection(311) - There is a better
leader candidate than us - going back into recovery
DefaultSolrCoreState.doRecovery(214) - Running recovery - first canceling
any ongoing recovery


after that, it tries to recover from the leader node, which is already down,
then loops recovery + failed + recovery.

is it related to SOLR-3939 and SOLR-3940? but the index data isn't empty.


On Thu, Jan 10, 2013 at 10:09 AM, Mark Miller markrmil...@gmail.com wrote:

 It may be able to do that because it's forwarding requests to other nodes
 that are up?

 Would be good to dig into the logs to see if you can narrow in on the
 reason for the recovery_failed.

 - Mark

 On Jan 9, 2013, at 8:52 PM, Zeng Lames lezhi.z...@gmail.com wrote:

  Hi ,
 
  we meet below strange case in production environment. from the Solr Admin
  Console - Cloud - Graph, we can find that one node is in
 recovery_failed
  status. But at the same time, we found that the recovery_failed node can
  serve query/update requests normally.
 
  any idea about it? thanks!
 
  --
  Best Wishes!
  Lames




-- 
Best Wishes!
Lames