Re: Replication Problem from solr-3.6 to solr-4.0

2014-07-23 Thread Sree..
I optimized the master, and the slave started replicating the index!
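For anyone hitting the same thing, that sequence can be scripted: optimize on the master to create a fresh index generation, then an explicit fetchindex on the slave's ReplicationHandler. A rough sketch only; the host names and core name below are placeholders, not from this thread:

```python
from urllib.parse import urlencode

def solr_command_url(base_url, core, handler, **params):
    """Build a Solr handler URL. Hosts and core names here are examples."""
    return f"{base_url}/solr/{core}/{handler}?{urlencode(params)}"

# 1. An optimize on the master creates a new commit point / index generation.
optimize = solr_command_url("http://master:8983", "collection1",
                            "update", optimize="true")

# 2. fetchindex asks the slave's ReplicationHandler to pull right away
#    instead of waiting for its configured poll interval.
fetch = solr_command_url("http://slave:8983", "collection1",
                         "replication", command="fetchindex")
```

Fetch each URL (with curl or an HTTP client) against your actual hosts.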



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-Problem-from-solr-3-6-to-solr-4-0-tp4025028p4148953.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication Problem from solr-3.6 to solr-4.0

2014-07-22 Thread askumar1444
Same here, in a multi-core master/slave setup.

11:17:30.476 [snapPuller-8-thread-1] INFO  o.a.s.h.SnapPuller - Master's
generation: 87
11:17:30.476 [snapPuller-8-thread-1] INFO  o.a.s.h.SnapPuller - Slave's
generation: 3
11:17:30.476 [snapPuller-8-thread-1] INFO  o.a.s.h.SnapPuller - Starting
replication process
11:17:30.713 [snapPuller-8-thread-1] ERROR o.a.s.h.SnapPuller - No files to
download for index generation: 87

Is there any solution or fix for this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-Problem-from-solr-3-6-to-solr-4-0-tp4025028p4148703.html


Re: Upgrade of solr 4.0 to 4.8.1 query

2014-06-02 Thread Erick Erickson
Upgrade steps are carried along in the CHANGES.txt file; there's a section
for every release (e.g. 4.1 -> 4.2, 4.5 -> 4.7, and so on). There's no single
4.0 -> 4.8 section, though, so I'd start there.
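Walking CHANGES.txt release by release can be scripted. A rough illustration only: the heading wording below is an assumption and varies across releases, so adjust the pattern to the actual file:

```python
import re

def upgrade_sections(changes_text):
    """Collect headings that look like per-release upgrade notes.
    The exact heading text is an assumption; tweak the pattern as needed."""
    pattern = re.compile(r"^Upgrading from Solr .*$", re.MULTILINE)
    return pattern.findall(changes_text)

# Tiny fabricated sample standing in for the real CHANGES.txt contents.
sample = """\
==================  4.8.0 ==================
Upgrading from Solr 4.7
-----------------------
* Legacy stuff removed.
==================  4.7.0 ==================
Upgrading from Solr 4.6
-----------------------
* Something else changed.
"""
sections = upgrade_sections(sample)
```

Reading each hop's section in order approximates the missing 4.0 -> 4.8 guide.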

Best,
Erick




Re: Upgrade of solr 4.0 to 4.8.1 query

2014-06-02 Thread Alexandre Rafalovitch
You can do lots of new stuff, but I believe the old config will run OK
without changes. One thing to be aware of is the logging-jar unbundling
and the manual correction needed for that when running under Tomcat.
That's on the wiki somewhere and should have been covered in the 4.2-to-4.7
changes.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency




Upgrade of solr 4.0 to 4.8.1 query

2014-06-02 Thread Steve Howe
Hi all,

First time posting, so the usual apology if this is a popular question.

Anyhoo - I'm running Solr 4.0 on a test rig with multicore, and I would like
to upgrade to 4.8.1. I can't find any clear tutorials on this on the web,
and I can only see a thread on 4.2 -> 4.7 on the mailing list.

Can anyone confirm whether I need to change the config wildly on an in-situ
upgrade of 4.0 -> 4.8.1, using Tomcat and war files, please?

Cheers

Steve


Re: Replication Problem from solr-3.6 to solr-4.0

2014-03-07 Thread yuegary
Hi,

I am running into the exact same problem:

27534 [qtp989080272-12] INFO  org.apache.solr.core.SolrCore  – [collection1]
webapp=/solr path=/replication
params={command=details&_=1394164320017&wt=json} status=0 QTime=12 
28906 [qtp989080272-12] INFO  org.apache.solr.core.SolrCore  – [collection1]
webapp=/solr path=/replication
params={command=fetchindex&_=1394164321407&wt=json} status=0 QTime=0 
28910 [explicit-fetchindex-cmd] INFO  org.apache.solr.handler.SnapPuller  –
Master's generation: 17100
28911 [explicit-fetchindex-cmd] INFO  org.apache.solr.handler.SnapPuller  –
Slave's generation: 1
28911 [explicit-fetchindex-cmd] INFO  org.apache.solr.handler.SnapPuller  –
Starting replication process
28915 [explicit-fetchindex-cmd] ERROR org.apache.solr.handler.SnapPuller  –
No files to download for index generation: 17100

I am using Solr 4.7 on the slave; the master is running Solr 3.5.
Would you be able to shed some light?

Thank you in advance!
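One way to narrow this down is to compare what the two nodes report via the ReplicationHandler's `command=details` response. The JSON shape assumed in the sample dicts below is illustrative only; check it against your actual `wt=json` output:

```python
def replication_gap(master_details, slave_details):
    """Compare index generations from /replication?command=details&wt=json
    responses. The dict shape used here is an assumption."""
    m_gen = master_details["details"]["generation"]
    s_gen = slave_details["details"]["generation"]
    return m_gen - s_gen

# Fabricated sample responses mirroring the generations in the log above.
master = {"details": {"generation": 17100, "indexVersion": 1394164000000}}
slave = {"details": {"generation": 1, "indexVersion": 0}}
gap = replication_gap(master, slave)  # a huge gap means the slave never caught up
```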



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replication-Problem-from-solr-3-6-to-solr-4-0-tp4025028p4121896.html


Re: Solr 4.0 is stripping XML format from RSS content field

2013-10-01 Thread eShard
If anyone is interested, I managed to resolve this a long time ago.
I used a Data Import Handler (DIH) instead, and it worked beautifully.
The DIH is very forgiving: it takes whatever XML data is there and injects
it into the Solr index.
It's a lot faster than crawling, too.
You use XPath to map the fields to your schema.
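As a sketch of that approach, a minimal DIH data-config for an RSS feed might look like the following. The feed URL and the field names are placeholders, and the target columns must map to fields that exist in your schema:

```xml
<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <!-- XPathEntityProcessor streams the feed and maps nodes to fields -->
    <entity name="rss"
            processor="XPathEntityProcessor"
            url="http://example.com/feed.rss"
            forEach="/rss/channel/item">
      <field column="title"       xpath="/rss/channel/item/title"/>
      <field column="link"        xpath="/rss/channel/item/link"/>
      <field column="description" xpath="/rss/channel/item/description"/>
      <field column="pubDate"     xpath="/rss/channel/item/pubDate"/>
    </entity>
  </document>
</dataConfig>
```

Register it via a /dataimport requestHandler in solrconfig.xml pointing at this file, then trigger it with command=full-import.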



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-is-stripping-XML-format-from-RSS-content-field-tp4039809p4092961.html


Re: Store 2 dimensional array( of int values) in solr 4.0

2013-09-06 Thread Jack Krupansky

You still haven't supplied any queries.

If all you really need is the JSON as a blob, simply store it as a string 
and parse the JSON in your application layer.
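A minimal sketch of that blob approach; the field name `dataX_json` and the use of a plain stored string field are assumptions:

```python
import json

# At index time: serialize the 2D array into one stored string field.
data = [[20130614, 2], [20130615, 11], [20130616, 1]]
doc = {"id": "doc1", "dataX_json": json.dumps(data)}

# At query time: the application layer parses the blob back into ints.
returned = json.loads(doc["dataX_json"])
first_day, first_count = returned[0]
```

The trade-off is that the blob's internals are not searchable; Solr only stores and returns it.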


-- Jack Krupansky







RE: Store 2 dimensional array( of int values) in solr 4.0

2013-09-06 Thread A Geek
Hi, thanks for the quick reply. Sure, please find the details below, as per
your query.
Essentially, I want to retrieve the doc through JSON [using JSON as the SOLR
result output format] and want the dataX field to come back as a two-dimensional
array of ints. When I store the data as shown below, it shows up as a JSON
array of strings, where each inner array is shown as a string (because that's
how the field is configured and how I'm storing it; I'm not finding any other
option). The following is the current JSON output that I'm able to fetch:
"dataX":["[20130614, 2]","[20130615, 11]","[20130616, 1]","[20130617, 
1]","[20130619, 8]","[20130620, 5]","[20130623, 5]"]
whereas I want to fetch dataX as something like:
"dataX":[[20130614, 2],[20130615, 11],[20130616, 1],[20130617, 1],[20130619, 
8],[20130620, 5],[20130623, 5]]
As can be seen, dataX is essentially a 2D array where each inner array holds
two ints, one being a date and the other a count.
Please point me in the right direction. I appreciate your time.
Thanks.


Re: Store 2 dimensional array( of int values) in solr 4.0

2013-09-06 Thread Jack Krupansky
First you need to tell us how you wish to use and query the data. That will 
largely determine how the data must be stored. Give us a few example queries 
of how you would like your application to be able to access the data.


Note that Lucene has only simple multivalued fields - no structure or 
nesting within a single field other than a list of scalar values.

But you can always store a complex structure as a BSON blob or JSON string 
if all you want is to store and retrieve it in its entirety without querying 
its internal structure. And note that Lucene queries are field-level: does 
a field contain or match a scalar value?


-- Jack Krupansky




Store 2 dimensional array( of int values) in solr 4.0

2013-09-06 Thread A Geek
Hi all, I'm trying to store a 2-dimensional array in Solr [version 4.0].
Basically I have the following data:
[[20121108, 1],[20121110, 7],[2012, 2],[20121112, 2]] ...

The inner array is used to keep some count, say X, for that particular day.
Currently, I'm using the following field to store this data:

and I'm using the Python library pySolr to store the data. Currently the data
that gets stored looks like this (it's an array of strings):
[20121108, 1][20121110, 
7][2012, 2][20121112, 2][20121113, 
2][20121116, 1]
Is there a way I can store the 2-dimensional array with the inner arrays
containing int values, like the example at the beginning, such that the
final/stored data in Solr looks something like:
20121108  7  
 20121110 12 
 20121110 12 

Just a guess: I think for this case we need to add one more field [the index,
for instance] for each inner array, which will again be multivalued (and will
store int values only)? How do I add the actual 2-dimensional array - how do I
pass the inner arrays, and how do I store the full doc that contains this
2-dimensional array? Please help me sort this issue.
Please share your views and point me in the right direction. Any help would be
highly appreciated.
I found similar things on the web, but not the one I'm looking for:
http://lucene.472066.n3.nabble.com/Two-dimensional-array-in-Solr-schema-td4003309.html
Thanks
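For what it's worth, the "one more field" guess above can be sketched as two parallel multivalued int fields that the client zips back together. The field names are invented, and this relies on multivalue order being preserved, which stored fields generally do:

```python
# Split the 2D array into two aligned multivalued int fields at index time.
data = [[20121108, 1], [20121110, 7], [20121112, 2]]

doc = {
    "id": "doc1",
    "dataX_dates":  [d for d, _ in data],   # hypothetical multivalued int field
    "dataX_counts": [c for _, c in data],   # hypothetical multivalued int field
}

# Client side: reconstruct the 2D array by position.
restored = [list(pair) for pair in zip(doc["dataX_dates"], doc["dataX_counts"])]
```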

Re: Solr 4.0 Functions in FL: performance?

2013-08-30 Thread Cristian Cascetta
Whoa that's cool!

It can simplify many front-end calculations - though obviously I don't want to
use it to make a simple sum :)

Thanks!

c.




Re: Solr 4.0 Functions in FL: performance?

2013-08-30 Thread Andrea Gazzarini

Hi,
not actually sure I got the point, but:

> Are values calculated over the whole set of docs? Only over the resulting
> set of docs? Or, better, over the docs actually serialized in results.

The third: a function is like a "virtual" field computed in real time,
associated with each (returned) doc.

> i.e. I have 1000 docs in the index, 100 docs matching my query, 10 docs in
> my result response because i put &rows=10 in my query.
> I put fl=sum(fieldA,fieldB) in my query.
> How many times is the sum of fieldA+fieldB executed?

10

Best,
Andrea
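A toy model of that behavior (plain Python, not Solr internals): the function field is evaluated only for the docs actually serialized into the response page, not for every match:

```python
def run_query(docs, matches, fl_fn, rows=10):
    """Toy model of a Solr query: `fl_fn` plays the role of a function
    field like sum(fieldA,fieldB) and is evaluated only for the page of
    docs actually returned, not for every matching doc."""
    evaluations = 0
    hits = [d for d in docs if matches(d)]   # matching set
    page = hits[:rows]                       # docs serialized in the response
    results = []
    for d in page:
        evaluations += 1
        results.append({**d, "sum": fl_fn(d)})
    return results, evaluations

docs = [{"fieldA": i, "fieldB": i % 7} for i in range(1000)]  # 1000 in the index
matches = lambda d: d["fieldA"] < 100                         # 100 match
results, n = run_query(docs, matches, lambda d: d["fieldA"] + d["fieldB"])
```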






Solr 4.0 Functions in FL: performance?

2013-08-30 Thread Cristian Cascetta
Hello,

when I put a function in the field list (fl), when are the field values
calculated, and on which docs?

Are values calculated over the whole set of docs? Only over the resulting
set of docs? Or, better, only over the docs actually serialized in the results?

i.e. I have 1000 docs in the index, 100 docs matching my query, and 10 docs in
my result response because I put &rows=10 in my query.

I put fl=sum(fieldA,fieldB) in my query.

How many times is the sum of fieldA+fieldB executed?

thx,
c.


Re: Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Walter Underwood
Mixing fuzzy with phonetic can give bizarre matches. I worked on a search 
engine that did that.

You really don't want to mix stemming, phonetic, and fuzzy. They are distinct 
transformations of the surface word that do different things.

Stemming: conflate different inflections of the same word, like car and cars.
Phonetic: conflate words that sound similar, like moody and mudie.
Fuzzy: conflate words with different spellings or misspellings, like smith, 
smyth, and smit.

If you want all of these, make three fields with separate transformations.
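A rough schema.xml sketch of that three-field approach; every field and type name here is invented for illustration, and each type would carry the corresponding analysis chain (a stemmer, a phonetic encoder, or plain tokenization for fuzzy matching):

```xml
<!-- One source field copied into three differently analyzed fields -->
<field name="name"          type="string"        indexed="false" stored="true"/>
<field name="name_stemmed"  type="text_stemmed"  indexed="true"  stored="false"/>
<field name="name_phonetic" type="text_phonetic" indexed="true"  stored="false"/>
<field name="name_fuzzy"    type="text_plain"    indexed="true"  stored="false"/>

<copyField source="name" dest="name_stemmed"/>
<copyField source="name" dest="name_phonetic"/>
<copyField source="name" dest="name_fuzzy"/>
```

Query each field with the matching technique (plain terms against the stemmed field, sound-alikes against the phonetic field, the ~ operator against the lightly analyzed field) and combine them with per-field boosts.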

wunder


--
Walter Underwood
wun...@wunderwood.org





Re: Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Erick Erickson
No, ComplexPhraseQuery has been around for quite a while but was
never incorporated into the code base; it's pretty much what you
need to do both fuzzy and phrase at once.

But doesn't phonetic really incorporate at least a flavor of fuzzy?
Is it close enough for your needs to just do phonetic matches?

Best
Erick




Re: Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Prasi S
Sorry, I copied it wrong. Below is the correct analysis.

Index time

ST
trinity
services
SF
trinity
services
LCF
trinity
services
SF
trinity
services
SF
trinity
services
WDF
trinity
services
SF
triniti
servic
PF
TRNTtriniti
SRFKservic
HWF
TRNTtriniti
SRFKservic
PSF
TRNTtriniti
SRFKservic



*Query time*
ST
trinity
services
SF
trinity
services
LCF
trinity
services
WDF
trinity
services
SF
triniti
servic
PSF
triniti
servic
PF
TRNTtriniti
SRFKservic

Apart from this, fuzzy would be for individual words and proximity would be
for phrases. Is this correct?
Also, can we have fuzzy on phrases?




Re: Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Prasi S
Hi Erick,
Yes, that is correct. These results are because of stemming + phonetic
matching. Below is the analysis:

Index time

ST
trinity
services
SF
trinity
services
LCF
trinity
services
SF
trinity
services
SF
trinity
services
WDF
trinity
services
Query time

SF
triniti
servic
PF
TRNTtriniti
SRFKservic
HWF
TRNTtriniti
SRFKservic
PSF
TRNTtriniti
SRFKservic
Apart from this, fuzzy would be for individual words and proximity would be
for phrases. Is this correct?
Also, can we have fuzzy on phrases?





Re: Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Erick Erickson
The first thing I'd recommend is to look at the admin/analysis
page. I suspect you aren't seeing fuzzy query results
at all, what you're seeing is the result of stemming.

Stemming is algorithmic, so sometimes produces very
surprising results, e.g. Trinidad and Trinitee may stem
to something like triniti.

But you didn't provide the field definition so it's just a guess.

Best
Erick


On Wed, Aug 28, 2013 at 7:43 AM, Prasi S  wrote:

> Hi,
> with solr 4.0 the fuzzy query syntax is like  ~1 (or 2)
> Proximity search is like "value"~20.
>
> How does this differentiate between the two searches? My thought was
> proximity would be on phrases and fuzzy on individual words. Is that
> correct?
>
> I wanted to do a proximity search for a text field and gave the below
> query,
> :/collection1/select?q="trinity%20service"~50&debugQuery=yes,
>
> it gives me results as
>
> 
> 
> *Trinidad *Services
> 
> 
> Trinity Services
> 
> 
> Trinity Services
> 
> 
> *Trinitee *Service
>
> How to differentiate between fuzzy and proximity.
>
>
> Thanks,
> Prasi
>

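Erick's point is that the unexpected matches can come from stemming and phonetic filtering rather than from fuzziness at all. A toy sketch (deliberately far cruder than Solr's real PorterStemFilter or phonetic filters, and not Prasi's actual analysis chain) of how distinct surface forms can collapse to one indexed token:

```python
# Toy illustration only: much cruder than Solr's real stemming and
# phonetic filters, but it shows the mechanism Erick describes.
def toy_stem(word):
    w = word.lower()
    if w.endswith("y"):
        w = w[:-1] + "i"   # trinity -> triniti, like Porter's y->i rule
    return w

def toy_phonetic(word):
    # Keep consonants only and collapse adjacent repeats: a rough stand-in
    # for a phonetic key such as the "TRNT" seen in Prasi's analysis dump.
    out = []
    for ch in word.lower():
        if ch not in "aeiouy" and (not out or out[-1] != ch):
            out.append(ch)
    return "".join(out).upper()

for w in ("trinity", "trinitee"):
    print(w, toy_stem(w), toy_phonetic(w))
# both words share the phonetic key "TRNT", so either can match the other
```

The admin/analysis page Erick mentions shows the real per-filter token output, which is the reliable way to see which filter produced a given match.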

Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Prasi S
Hi,
with solr 4.0 the fuzzy query syntax is like  ~1 (or 2)
Proximity search is like "value"~20.

How does this differentiate between the two searches? My thought was
proximity would be on phrases and fuzzy on individual words. Is that
correct?

I wanted to do a proximity search for a text field and gave the below
query,
:/collection1/select?q="trinity%20service"~50&debugQuery=yes,

it gives me results as



*Trinidad *Services


Trinity Services


Trinity Services


*Trinitee *Service

How to differentiate between fuzzy and proximity.


Thanks,
Prasi

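To illustrate the distinction Prasi asks about: in the classic Lucene/Solr query syntax, a tilde after a bare term means fuzzy (edit distance), while a tilde after a quoted phrase means proximity (slop). A minimal sketch; the field name, host, and collection below are placeholders, not Prasi's actual setup:

```python
from urllib.parse import urlencode

# Hedged sketch: field "name", host and collection are placeholders.
# The tilde is overloaded in the classic query parser:
#   term~N   -> fuzzy query:     N = max edit distance for that single term
#   "a b"~N  -> proximity query: N = slop (word distance) for the phrase
fuzzy_q = "name:trinity~1"               # terms within 1 edit of "trinity"
prox_q = 'name:"trinity service"~50'     # "trinity" within 50 positions of "service"

base = "http://localhost:8983/solr/collection1/select"
fuzzy_url = base + "?" + urlencode({"q": fuzzy_q, "debugQuery": "true"})
prox_url = base + "?" + urlencode({"q": prox_q, "debugQuery": "true"})
print(fuzzy_url)
print(prox_url)
```

As to fuzzy on phrases: as far as I know, the classic parser applies fuzziness to single terms only; fuzzy terms inside a phrase need something like the complexphrase query parser, which only arrived in later 4.x releases.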

RE: SOLR 4.0 frequent admin problem

2013-07-04 Thread David Quarterman
Cheers, Roman! It was a default Jetty set-up, so I've now added a 'work' directory
and that's in use now.

-Original Message-
From: Roman Chyla [mailto:roman.ch...@gmail.com] 
Sent: 04 July 2013 15:00
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4.0 frequent admin problem

Yes :-)  see SOLR-118, seems an old issue...
On 4 Jul 2013 06:43, "David Quarterman"  wrote:

> Hi,
>
> About once a week the admin system comes up with SolrCore 
> Initialization Failures. There's nothing in the logs and SOLR 
> continues to work in the application it's supporting and in the 'direct 
> access' mode (i.e.
> http://123.465.789.100:8080/solr/collection1/select?q=bingo:*).
>
> The cure is to restart Jetty (8.1.7) and then we can use the admin 
> system again via pc's. However, a colleague can get into admin on an 
> iPad with no trouble when no browser on a pc can!
>
> Anyone any ideas? It's really frustrating!
>
> Best regards,
>
> DQ
>
>


Re: SOLR 4.0 frequent admin problem

2013-07-04 Thread Roman Chyla
Yes :-)  see SOLR-118, seems an old issue...
On 4 Jul 2013 06:43, "David Quarterman"  wrote:

> Hi,
>
> About once a week the admin system comes up with SolrCore Initialization
> Failures. There's nothing in the logs and SOLR continues to work in the
> application it's supporting and in the 'direct access' mode (i.e.
> http://123.465.789.100:8080/solr/collection1/select?q=bingo:*).
>
> The cure is to restart Jetty (8.1.7) and then we can use the admin system
> again via pc's. However, a colleague can get into admin on an iPad with no
> trouble when no browser on a pc can!
>
> Anyone any ideas? It's really frustrating!
>
> Best regards,
>
> DQ
>
>


SOLR 4.0 frequent admin problem

2013-07-04 Thread David Quarterman
Hi,

About once a week the admin system comes up with SolrCore Initialization 
Failures. There's nothing in the logs and SOLR continues to work in the 
application it's supporting and in the 'direct access' mode (i.e. 
http://123.465.789.100:8080/solr/collection1/select?q=bingo:*).

The cure is to restart Jetty (8.1.7) and then we can use the admin system again 
via pc's. However, a colleague can get into admin on an iPad with no trouble 
when no browser on a pc can!

Anyone any ideas? It's really frustrating!

Best regards,

DQ



Re: Why solr 4.0 use FSIndexOutput to write file, otherwise MMap/NIO

2013-06-28 Thread Michael McCandless
Output is quite a bit simpler than input because all we do is write a
single stream of bytes with no seeking ("append only"), and it's done
with only one thread, so I don't think there'd be much to gain by
using the newer IO APIs for writing...

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jun 28, 2013 at 2:23 AM, Jeffery Wang
 wrote:
>
> I have checked FSDirectory; it will create an "MMapDirectory" or
> "NIOFSDirectory" as the Directory.
> These two directories only supply IndexInput extensions for reading files
> (MMapIndexInput extends ByteBufferIndexInput);
> why is there no MMap/NIO IndexOutput extension for writing files? Only
> FSIndexOutput is used for writing (FSIndexOutput extends BufferedIndexOutput).
>
> Is FSIndexOutput much slower at writing files than MMap/NIO would be? How
> can I improve the write IO performance?
>
> Thanks,
> __
> Jeffery Wang
> Application Service - Backend
> Morningstar (Shenzhen) Ltd.
> Morningstar. Illuminating investing worldwide.
> +86 755 3311 0220 Office
> +86 130 7782 2813 Mobile
> jeffery.w...@morningstar.com
> This e-mail contains privileged and confidential information and is intended 
> only for the use of the person(s) named above. Any dissemination, 
> distribution or duplication of this communication without prior written 
> consent from Morningstar is strictly prohibited. If you received this message 
> in error please contact the sender immediately and delete the materials from 
> any computer.
>


Why solr 4.0 use FSIndexOutput to write file, otherwise MMap/NIO

2013-06-27 Thread Jeffery Wang

I have checked FSDirectory; it will create an "MMapDirectory" or
"NIOFSDirectory" as the Directory.
These two directories only supply IndexInput extensions for reading files
(MMapIndexInput extends ByteBufferIndexInput);
why is there no MMap/NIO IndexOutput extension for writing files? Only
FSIndexOutput is used for writing (FSIndexOutput extends BufferedIndexOutput).

Is FSIndexOutput much slower at writing files than MMap/NIO would be? How
can I improve the write IO performance?

Thanks,
__
Jeffery Wang
Application Service - Backend
Morningstar (Shenzhen) Ltd.
Morningstar. Illuminating investing worldwide.
+86 755 3311 0220 Office
+86 130 7782 2813 Mobile
jeffery.w...@morningstar.com


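A small sketch of the point Mike makes above: index output is a single append-only byte stream written by one thread, so a plain buffered sequential writer already batches tiny writes into a few large OS-level ones; there is no seeking on the write path for MMap/NIO to speed up. This is an illustration in Python, not Lucene's actual implementation:

```python
import os
import tempfile

# Illustration (Python, not Lucene): an append-only "segment" file written
# through a buffer.  Many tiny write() calls become few large OS writes,
# which is why a BufferedIndexOutput-style writer is already efficient.
path = os.path.join(tempfile.mkdtemp(), "segment.dat")
with open(path, "wb", buffering=16384) as out:   # 16 KB write buffer
    for i in range(1000):
        out.write(i.to_bytes(4, "big"))          # append-only, never seeks
print(os.path.getsize(path))                     # 4000 bytes
```

Reads are the opposite case: queries seek all over the index, which is where the MMap/NIO IndexInput variants pay off.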

RE: Solr 4.0 Optimize query very slow before the optimize end of a few minutes

2013-06-14 Thread Jeffery Wang
Yes, I used the same query url for each curl-call, it is very simple 
"http://...q=OS01W:sina*&fl=SecId,OS01W&rows=1&wt=xml&indent=true";.


-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: 2013年6月14日 16:20
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.0 Optimize query very slow before the optimize end of a few 
minutes

On Fri, 2013-06-14 at 06:59 +0200, Jeffery Wang wrote:
> Time     queryTime(ms)  CPU %  r/s  w/s  rMB/s  wMB/s  IO %
> ...
> 7:30:52  16594          26     36   0    0.14   0      99.3
> 7:30:53  31             80     368  0    42.43  0      94.3
> 7:31:23  28575          41     35   21   0.37   2.36   95.9
> 7:32:22  53399          31     81   39   0.74   2.63   99.5  !!!
> 7:32:23  11             54     155  0    16.46  0      99.6
> 7:33:28  60199          28     30   2    0.12   0.01   99.8  !!

Having a single query that is slow is expected behaviour as the reader will 
have opened the merged segment and caches needs to be filled. But I do not know 
why you have more than one query that is slow. Do you use the same query for 
each curl-call?

- Toke Eskildsen, State and University Library, Denmark



RE: Solr 4.0 Optimize query very slow before the optimize end of a few minutes

2013-06-14 Thread Toke Eskildsen
On Fri, 2013-06-14 at 06:59 +0200, Jeffery Wang wrote:
> Time     queryTime(ms)  CPU %  r/s  w/s  rMB/s  wMB/s  IO %
> ...
> 7:30:52  16594          26     36   0    0.14   0      99.3
> 7:30:53  31             80     368  0    42.43  0      94.3
> 7:31:23  28575          41     35   21   0.37   2.36   95.9
> 7:32:22  53399          31     81   39   0.74   2.63   99.5  !!!
> 7:32:23  11             54     155  0    16.46  0      99.6
> 7:33:28  60199          28     30   2    0.12   0.01   99.8  !!

Having a single query that is slow is expected behaviour as the reader
will have opened the merged segment and caches needs to be filled. But I
do not know why you have more than one query that is slow. Do you use
the same query for each curl-call?

- Toke Eskildsen, State and University Library, Denmark



RE: Solr 4.0 Optimize query very slow before the optimize end of a few minutes

2013-06-13 Thread Jeffery Wang
Hi Otis,

Sorry, it was not formatted properly.

Time     queryTime(ms)  CPU %   r/s     w/s     rMB/s   wMB/s   IO %
7:30:24  12             89      156.44  0       16.4    0       94.06
7:30:25  18             91      157     0       15.35   0       98.1
7:30:26  9              91      194     0       19.62   0       96.1
7:30:27  14             38      352     0       38.17   0       100.1
7:30:28  30             77      205.94  16.83   20.17   4.02    98.51
7:30:30  101            88      396     0       45.99   0       90.7
7:30:31  11             90      120     0       11.34   0       97.5
7:30:32  38             89      262.38  0       28.03   0       96.24
7:30:33  11             78      68      17      4.89    4.93    99.9
7:30:34  9              29      201     0       20.16   0       100.3
7:30:35  9              87      181     0       17.27   0       94.3
7:30:52  16594          26      36      0       0.14    0       99.3
7:30:53  31             80      368     0       42.43   0       94.3
7:31:23  28575          41      35      21      0.37    2.36    95.9
7:31:27  2676           60      127     0       13.76   0       83.5
7:31:28  8              59      279     0       30.99   0       99.4
7:32:22  53399          31      81      39      0.74    2.63    99.5  !!!
7:32:23  11             54      155     0       16.46   0       99.6
7:32:24  9              47      63.37   4.95    4.18    0.02    98.42
7:32:25  9              25      34      0       0.13    0       98.8
7:32:26  8              27      30      0       0.12    0       99.9
7:33:28  60199          28      30      2       0.12    0.01    99.8  !!


But why is the query always slow in the last few minutes? I have tested it many
times: the optimize lasts about 2 hours and, almost every time, the query is
quick enough (about 30 ms) during those 2 hours, and only slow in the last few
minutes (up to about 60 s, as the table shows).

Thanks,
Jeffery
-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: 2013年6月14日 12:20
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.0 Optimize query very slow before the optimize end of a few 
minutes

Hi,

What you pasted from the console didn't come across well. Yes, optimizing a static
index is OK, and yes, if your index is "very unoptimized" then it will be
slower than when it is optimized. Not sure if that addresses your concerns...

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/





On Fri, Jun 14, 2013 at 12:04 AM, Jeffery Wang  
wrote:
> Does anyone know why the query is very slow during the last few minutes
> before the optimize ends?
>
> While Solr is optimizing, I run a query in a loop (curl "query url", then
> sleep one second) to check the query speed. Normally the query time is
> acceptable, but it is always very slow in the last few minutes before the
> optimize finishes.
> The Solr index size is about 22G after it is optimized.
>
> The following shows the query time cost, plus CPU and IO usage. IO is high
> during the whole optimize process, which is understandable.
> time     query time(ms)  CPU %   r/s     w/s     rMB/s   wMB/s   IO %
> 7:30:24  12              89      156.44  0       16.4    0       94.06
> 7:30:25  18              91      157     0       15.35   0       98.1
> 7:30:26  9               91      194     0       19.62   0       96.1
> 7:30:27  14              38      352     0       38.17   0       100.1
> 7:30:28  30              77      205.94  16.83   20.17   4.02    98.51
> 7:30:30  101             88      396     0       45.99   0       90.7
> 7:30:31  11              90      120     0       11.34   0       97.5
> 7:30:32  38              89      262.38  0       28.03   0       96.24
> 7:30:33  11              78      68      17      4.89    4.93    99.9
> 7:30:34  9               29      201     0       20.16   0       100.3
> 7:30:35  9               87      181     0       17.27   0       94.3
> 7:30:52  16594           26      36      0       0.14    0       99.3
> 7:30:53  31              80      368     0       42.43   0       94.3
> 7:31:23  28575           41      35      21      0.37    2.36    95.9
> 7:31:27  2676            60      127     0       13.76   0       83.5
> 7:31:28  8               59      279     0       30.99   0       99.4
> 7:32:22  53399           31      81      39      0.74    2.63    99.5
> 7:32:23  11              54      155     0       16.46   0       99.6
> 7:32:24  9               47      63.37   4.95    4.18    0.02    98.42
> 7:32:25  9               25      34      0       0.13    0       98.8
> 7:32:26  8               27      30      0       0.12    0       99.9
> 7:33:28  60199           28      30      2       0.12    0.01    99.8
>
>
> Thanks,
> __
> 
> Jeffery Wang


Re: Solr 4.0 Optimize query very slow before the optimize end of a few minutes

2013-06-13 Thread Otis Gospodnetic
Hi,

What you pasted from the console didn't come across well. Yes, optimizing
a static index is OK, and yes, if your index is "very unoptimized" then
it will be slower than when it is optimized. Not sure if that addresses
your concerns...

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/





On Fri, Jun 14, 2013 at 12:04 AM, Jeffery Wang
 wrote:
> Does anyone know why the query is very slow during the last few minutes
> before the optimize ends?
>
> While Solr is optimizing, I run a query in a loop (curl "query url", then
> sleep one second) to check the query speed. Normally the query time is
> acceptable, but it is always very slow in the last few minutes before the
> optimize finishes.
> The Solr index size is about 22G after it is optimized.
>
> The following shows the query time cost, plus CPU and IO usage. IO is high
> during the whole optimize process, which is understandable.
> time     query time(ms)  CPU %   r/s     w/s     rMB/s   wMB/s   IO %
> 7:30:24  12              89      156.44  0       16.4    0       94.06
> 7:30:25  18              91      157     0       15.35   0       98.1
> 7:30:26  9               91      194     0       19.62   0       96.1
> 7:30:27  14              38      352     0       38.17   0       100.1
> 7:30:28  30              77      205.94  16.83   20.17   4.02    98.51
> 7:30:30  101             88      396     0       45.99   0       90.7
> 7:30:31  11              90      120     0       11.34   0       97.5
> 7:30:32  38              89      262.38  0       28.03   0       96.24
> 7:30:33  11              78      68      17      4.89    4.93    99.9
> 7:30:34  9               29      201     0       20.16   0       100.3
> 7:30:35  9               87      181     0       17.27   0       94.3
> 7:30:52  16594           26      36      0       0.14    0       99.3
> 7:30:53  31              80      368     0       42.43   0       94.3
> 7:31:23  28575           41      35      21      0.37    2.36    95.9
> 7:31:27  2676            60      127     0       13.76   0       83.5
> 7:31:28  8               59      279     0       30.99   0       99.4
> 7:32:22  53399           31      81      39      0.74    2.63    99.5
> 7:32:23  11              54      155     0       16.46   0       99.6
> 7:32:24  9               47      63.37   4.95    4.18    0.02    98.42
> 7:32:25  9               25      34      0       0.13    0       98.8
> 7:32:26  8               27      30      0       0.12    0       99.9
> 7:33:28  60199           28      30      2       0.12    0.01    99.8
>
>
> Thanks,
> __
> Jeffery Wang


Solr 4.0 Optimize query very slow before the optimize end of a few minutes

2013-06-13 Thread Jeffery Wang
Does anyone know why the query is very slow during the last few minutes
before the optimize ends?

While Solr is optimizing, I run a query in a loop (curl "query url", then
sleep one second) to check the query speed. Normally the query time is
acceptable, but it is always very slow in the last few minutes before the
optimize finishes.
The Solr index size is about 22G after it is optimized.

The following shows the query time cost, plus CPU and IO usage. IO is high
during the whole optimize process, which is understandable.
time     query time(ms)  CPU %   r/s     w/s     rMB/s   wMB/s   IO %
7:30:24  12              89      156.44  0       16.4    0       94.06
7:30:25  18              91      157     0       15.35   0       98.1
7:30:26  9               91      194     0       19.62   0       96.1
7:30:27  14              38      352     0       38.17   0       100.1
7:30:28  30              77      205.94  16.83   20.17   4.02    98.51
7:30:30  101             88      396     0       45.99   0       90.7
7:30:31  11              90      120     0       11.34   0       97.5
7:30:32  38              89      262.38  0       28.03   0       96.24
7:30:33  11              78      68      17      4.89    4.93    99.9
7:30:34  9               29      201     0       20.16   0       100.3
7:30:35  9               87      181     0       17.27   0       94.3
7:30:52  16594           26      36      0       0.14    0       99.3
7:30:53  31              80      368     0       42.43   0       94.3
7:31:23  28575           41      35      21      0.37    2.36    95.9
7:31:27  2676            60      127     0       13.76   0       83.5
7:31:28  8               59      279     0       30.99   0       99.4
7:32:22  53399           31      81      39      0.74    2.63    99.5
7:32:23  11              54      155     0       16.46   0       99.6
7:32:24  9               47      63.37   4.95    4.18    0.02    98.42
7:32:25  9               25      34      0       0.13    0       98.8
7:32:26  8               27      30      0       0.12    0       99.9
7:33:28  60199           28      30      2       0.12    0.01    99.8


Thanks,
__
Jeffery Wang

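Jeffery's measurement loop can be sketched roughly as below. fetch() is a stand-in for the real curl call against the query URL, and the loop counts and thresholds are illustrative, not his measurements:

```python
import time

# Rough sketch of the probe loop described above; fetch() is a placeholder
# for the real HTTP round trip (curl against the Solr select URL).
def fetch():
    time.sleep(0.001)  # stand-in for the real request

samples = []
for _ in range(3):                 # the real loop ran for the whole optimize
    start = time.monotonic()
    fetch()
    samples.append((time.monotonic() - start) * 1000.0)  # elapsed ms
    # time.sleep(1)                # one-second pause between probes

slow = [ms for ms in samples if ms > 1000]  # spikes like 16594/53399/60199 ms
print(len(samples), len(slow))
```

Toke's explanation fits the pattern in the table: the spikes cluster where the newly merged segment is first opened and caches are still cold, which is exactly the end of the optimize.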

Re: Advice on Solr 4.0 index backups

2013-06-11 Thread Otis Gospodnetic
This sounds pretty complete to me.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jun 11, 2013 4:21 AM, "Cosimo Streppone"  wrote:

> Hi,
>
> I'd like your advice on this backup plan.
> It's my first Solr deployment (4.0).
>
> Production consists of 1 master and n frontend slaves
> placed in different datacenters, replicating through HTTP.
> Only master is backed up. Frontend slaves can die anytime
> or go stale for a while and that's ok.
>
> Backup is performed daily. Steps are:
>
> 1) Ping the /replication handler with command=backup and numberToKeep=3
>and verify that we get a status=0
>
> 2) Check the replication handler with command=details and verify that
>we get a "snapshotCompletedAt". If not, spin and wait for it.
>
> 3) Snapshot is completed. Rsync --delete everything to a different
>volume on the same host. This is to keep a complete archived *local*
>copy should the index SSD drive fail.
>
> 4) Once the rsync is finished, a stand by machine downloads the archived
>copy from the master, and rebuilds everything under a "restore" core.
>
> 5) New "restore" core is started up with /admin/cores handler
>(command=CREATE IIRC)
>
> 6) Nagios checks that we can query the restore core correctly
>and get back at least a document from it.
>
> In this way, I get:
> - 3 (n) quick snapshots done by Solr itself. Older ones are discarded
>   automatically
> - 1 full index copy on a secondary volume
> - 1 "offsite" copy on another machine
> - a daily automated restore that verifies that our backup is valid
>
> It's been running reliably for a week or so now,
> but surely someone out there must have done this before
>
> Did I miss something?
>
> --
> Cosimo
>


Advice on Solr 4.0 index backups

2013-06-11 Thread Cosimo Streppone
Hi,

I'd like your advice on this backup plan.
It's my first Solr deployment (4.0).

Production consists of 1 master and n frontend slaves
placed in different datacenters, replicating through HTTP.
Only master is backed up. Frontend slaves can die anytime
or go stale for a while and that's ok.

Backup is performed daily. Steps are:

1) Ping the /replication handler with command=backup and numberToKeep=3
   and verify that we get a status=0

2) Check the replication handler with command=details and verify that
   we get a "snapshotCompletedAt". If not, spin and wait for it.

3) Snapshot is completed. Rsync --delete everything to a different
   volume on the same host. This is to keep a complete archived *local*
   copy should the index SSD drive fail.

4) Once the rsync is finished, a stand by machine downloads the archived
   copy from the master, and rebuilds everything under a "restore" core.

5) New "restore" core is started up with /admin/cores handler
   (command=CREATE IIRC)

6) Nagios checks that we can query the restore core correctly
   and get back at least a document from it.

In this way, I get:
- 3 (n) quick snapshots done by Solr itself. Older ones are discarded
  automatically
- 1 full index copy on a secondary volume
- 1 "offsite" copy on another machine
- a daily automated restore that verifies that our backup is valid

It's been running reliably for a week or so now,
but surely someone out there must have done this before

Did I miss something?

-- 
Cosimo

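Steps 1 and 2 of the plan above amount to a trigger-then-poll loop against the /replication handler. A hedged sketch: fetch_json() is a stand-in for the real HTTP call, and the canned responses (including the timestamp and host name) are invented for illustration:

```python
# fetch_json() is a stand-in for the real HTTP call; the canned responses,
# including the timestamp, are invented for this sketch.
def fetch_json(url, _state={"polls": 0}):
    if "command=backup" in url:
        return {"responseHeader": {"status": 0}}
    _state["polls"] += 1               # command=details: "ready" on 2nd poll
    if _state["polls"] < 2:
        return {"details": {"backup": {}}}
    return {"details": {"backup": {
        "snapshotCompletedAt": "Tue Jun 11 04:20:00 UTC 2013"}}}

base = "http://master:8983/solr/replication"

# Step 1: trigger the snapshot and verify status=0
resp = fetch_json(base + "?command=backup&numberToKeep=3&wt=json")
assert resp["responseHeader"]["status"] == 0

# Step 2: spin on command=details until snapshotCompletedAt appears
completed = None
while completed is None:
    details = fetch_json(base + "?command=details&wt=json")
    completed = details["details"]["backup"].get("snapshotCompletedAt")
print(completed)
```

The remaining steps (rsync, restore core, Nagios check) sit outside Solr, so this is the only part that talks to the replication handler.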

Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-22 Thread Sandeep Mestry
Thanks Erick for your suggestion.

Turns out I won't be going that route after all, as the highlighter
component is quite complicated - to follow and to override - and with not
much time left in hand I did it the manual (dirty) way.

Best Regards,
Sandeep


On 22 May 2013 12:21, Erick Erickson  wrote:

> Sandeep:
>
> You need to be a little careful here, I second Shawn's comment that
> you are mixing versions. You say you are using solr 4.0. But the jar
> that ships with that is apache-solr-core-4.0.0.jar. Then you talk
> about using solr-core, which is called solr-core-4.1.jar.
>
> Maven is not officially supported, so grabbing some solr-core.jar
> (with no apache) and doing _anything_ with it from a 4.0 code base is
> not a good idea.
>
> You can check out the 4.0 code branch and just compile the whole
> thing. Or you can get a new 4.0 distro and use the jars there. But I'd
> be _really_ cautious about using a 4.1 or later jar with 4.0.
>
> FWIW,
> Erick
>
> On Tue, May 21, 2013 at 12:05 PM, Sandeep Mestry 
> wrote:
> > Thanks Steve,
> >
> > I could find solr-core.jar in the repo but could not find
> > apache-solr-core.jar.
> > I think my issue got misunderstood - which is totally my fault.
> >
> > Anyway, I took into account Shawn's comment and will use solr-core.jar
> only
> > for compiling the project - not for deploying.
> >
> > Thanks,
> > Sandeep
> >
> >
> > On 21 May 2013 16:46, Steve Rowe  wrote:
> >
> >> The 4.0 solr-core jar is available in Maven Central: <
> >>
> http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar
> >> >
> >>
> >> Steve
> >>
> >> On May 21, 2013, at 11:26 AM, Sandeep Mestry 
> wrote:
> >>
> >> > Hi Steve,
> >> >
> >> > Solr 4.0 - mentioned in the subject.. :-)
> >> >
> >> > Thanks,
> >> > Sandeep
> >> >
> >> >
> >> > On 21 May 2013 14:58, Steve Rowe  wrote:
> >> >
> >> >> Sandeep,
> >> >>
> >> >> What version of Solr are you using?
> >> >>
> >> >> Steve
> >> >>
> >> >> On May 21, 2013, at 6:55 AM, Sandeep Mestry 
> >> wrote:
> >> >>
> >> >>> Hi Shawn,
> >> >>>
> >> >>> Thanks for your reply.
> >> >>>
> >> >>> I'm not mixing versions.
> >> >>> The problem I faced is I want to override Highlighter from solr-core
> >> jar
> >> >>> and if I add that as a dependency in my project then there was a
> clash
> >> >>> between solr-core.jar and the apache-solr-core.jar that comes
> bundled
> >> >>> within the solr distribution. It was complaining about
> >> >> MorfologikFilterFactory
> >> >>> classcastexception.
> >> >>> I can't use apache-solr-core.jar as a dependency as no such jar
> exists
> >> in
> >> >>> any maven repo.
> >> >>>
> >> >>> The only thing I could do is to remove apache-solr-core.jar from
> >> solr.war
> >> >>> and then use solr-core.jar as a dependency - however I do not think
> >> this
> >> >> is
> >> >>> the ideal solution.
> >> >>>
> >> >>> Thanks,
> >> >>> Sandeep
> >> >>>
> >> >>>
> >> >>> On 20 May 2013 15:18, Shawn Heisey  wrote:
> >> >>>
> >> >>>> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
> >> >>>>> And I do remember the discussion on the forum about dropping the
> name
> >> >>>>> *apache* from solr jars. If that's what caused this issue, then
> can
> >> you
> >> >>>>> tell me if the mirrors need updating with solr-core.jar instead of
> >> >>>>> apache-solr-core.jar?
> >> >>>>
> >> >>>> If it's named apache-solr-core, then it's from 4.0 or earlier.  If
> >> it's
> >> >>>> named solr-core, then it's from 4.1 or later.  That might mean that
> >> you
> >> >>>> are mixing versions - don't do that.  Make sure that you have jars
> >> from
> >> >>>> the exact same version as your server.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Shawn
> >> >>>>
> >> >>>>
> >> >>
> >> >>
> >>
> >>
>

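The arrangement Sandeep settled on, compiling against solr-core but not deploying it, is what Maven's provided scope expresses. A sketch of the dependency, using the coordinates from Steve's Maven Central link:

```xml
<!-- Compile against solr-core, but keep it off the deployed classpath:
     the solr.war already bundles its own copy at runtime. -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>4.0.0</version>
  <scope>provided</scope>
</dependency>
```

With provided scope the jar is on the compile and test classpaths only, so the ClassCastException from two copies of the same classes cannot occur in the deployed webapp.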

Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-22 Thread Erick Erickson
Sandeep:

You need to be a little careful here, I second Shawn's comment that
you are mixing versions. You say you are using solr 4.0. But the jar
that ships with that is apache-solr-core-4.0.0.jar. Then you talk
about using solr-core, which is called solr-core-4.1.jar.

Maven is not officially supported, so grabbing some solr-core.jar
(with no apache) and doing _anything_ with it from a 4.0 code base is
not a good idea.

You can check out the 4.0 code branch and just compile the whole
thing. Or you can get a new 4.0 distro and use the jars there. But I'd
be _really_ cautious about using a 4.1 or later jar with 4.0.

FWIW,
Erick

On Tue, May 21, 2013 at 12:05 PM, Sandeep Mestry  wrote:
> Thanks Steve,
>
> I could find solr-core.jar in the repo but could not find
> apache-solr-core.jar.
> I think my issue got misunderstood - which is totally my fault.
>
> Anyway, I took into account Shawn's comment and will use solr-core.jar only
> for compiling the project - not for deploying.
>
> Thanks,
> Sandeep
>
>
> On 21 May 2013 16:46, Steve Rowe  wrote:
>
>> The 4.0 solr-core jar is available in Maven Central: <
>> http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar
>> >
>>
>> Steve
>>
>> On May 21, 2013, at 11:26 AM, Sandeep Mestry  wrote:
>>
>> > Hi Steve,
>> >
>> > Solr 4.0 - mentioned in the subject.. :-)
>> >
>> > Thanks,
>> > Sandeep
>> >
>> >
>> > On 21 May 2013 14:58, Steve Rowe  wrote:
>> >
>> >> Sandeep,
>> >>
>> >> What version of Solr are you using?
>> >>
>> >> Steve
>> >>
>> >> On May 21, 2013, at 6:55 AM, Sandeep Mestry 
>> wrote:
>> >>
>> >>> Hi Shawn,
>> >>>
>> >>> Thanks for your reply.
>> >>>
>> >>> I'm not mixing versions.
>> >>> The problem I faced is I want to override Highlighter from solr-core
>> jar
>> >>> and if I add that as a dependency in my project then there was a clash
>> >>> between solr-core.jar and the apache-solr-core.jar that comes bundled
>> >>> within the solr distribution. It was complaining about
>> >> MorfologikFilterFactory
>> >>> classcastexception.
>> >>> I can't use apache-solr-core.jar as a dependency as no such jar exists
>> in
>> >>> any maven repo.
>> >>>
>> >>> The only thing I could do is to remove apache-solr-core.jar from
>> solr.war
>> >>> and then use solr-core.jar as a dependency - however I do not think
>> this
>> >> is
>> >>> the ideal solution.
>> >>>
>> >>> Thanks,
>> >>> Sandeep
>> >>>
>> >>>
>> >>> On 20 May 2013 15:18, Shawn Heisey  wrote:
>> >>>
>> >>>> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
>> >>>>> And I do remember the discussion on the forum about dropping the name
>> >>>>> *apache* from solr jars. If that's what caused this issue, then can
>> you
>> >>>>> tell me if the mirrors need updating with solr-core.jar instead of
>> >>>>> apache-solr-core.jar?
>> >>>>
>> >>>> If it's named apache-solr-core, then it's from 4.0 or earlier.  If
>> it's
>> >>>> named solr-core, then it's from 4.1 or later.  That might mean that
>> you
>> >>>> are mixing versions - don't do that.  Make sure that you have jars
>> from
>> >>>> the exact same version as your server.
>> >>>>
>> >>>> Thanks,
>> >>>> Shawn
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>


Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Sandeep Mestry
Thanks Steve,

I could find solr-core.jar in the repo but could not find
apache-solr-core.jar.
I think my issue got misunderstood - which is totally my fault.

Anyway, I took into account Shawn's comment and will use solr-core.jar only
for compiling the project - not for deploying.

Thanks,
Sandeep


On 21 May 2013 16:46, Steve Rowe  wrote:

> The 4.0 solr-core jar is available in Maven Central: <
> http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar
> >
>
> Steve
>
> On May 21, 2013, at 11:26 AM, Sandeep Mestry  wrote:
>
> > Hi Steve,
> >
> > Solr 4.0 - mentioned in the subject.. :-)
> >
> > Thanks,
> > Sandeep
> >
> >
> > On 21 May 2013 14:58, Steve Rowe  wrote:
> >
> >> Sandeep,
> >>
> >> What version of Solr are you using?
> >>
> >> Steve
> >>
> >> On May 21, 2013, at 6:55 AM, Sandeep Mestry 
> wrote:
> >>
> >>> Hi Shawn,
> >>>
> >>> Thanks for your reply.
> >>>
> >>> I'm not mixing versions.
> >>> The problem I faced is I want to override Highlighter from solr-core
> jar
> >>> and if I add that as a dependency in my project then there was a clash
> >>> between solr-core.jar and the apache-solr-core.jar that comes bundled
> >>> within the solr distribution. It was complaining about
> >> MorfologikFilterFactory
> >>> classcastexception.
> >>> I can't use apache-solr-core.jar as a dependency as no such jar exists
> in
> >>> any maven repo.
> >>>
> >>> The only thing I could do is to remove apache-solr-core.jar from
> solr.war
> >>> and then use solr-core.jar as a dependency - however I do not think
> this
> >> is
> >>> the ideal solution.
> >>>
> >>> Thanks,
> >>> Sandeep
> >>>
> >>>
> >>> On 20 May 2013 15:18, Shawn Heisey  wrote:
> >>>
> >>>> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
> >>>>> And I do remember the discussion on the forum about dropping the name
> >>>>> *apache* from solr jars. If that's what caused this issue, then can
> you
> >>>>> tell me if the mirrors need updating with solr-core.jar instead of
> >>>>> apache-solr-core.jar?
> >>>>
> >>>> If it's named apache-solr-core, then it's from 4.0 or earlier.  If
> it's
> >>>> named solr-core, then it's from 4.1 or later.  That might mean that
> you
> >>>> are mixing versions - don't do that.  Make sure that you have jars
> from
> >>>> the exact same version as your server.
> >>>>
> >>>> Thanks,
> >>>> Shawn
> >>>>
> >>>>
> >>
> >>
>
>


Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Steve Rowe
The 4.0 solr-core jar is available in Maven Central: 
<http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar>

Steve

On May 21, 2013, at 11:26 AM, Sandeep Mestry  wrote:

> Hi Steve,
> 
> Solr 4.0 - mentioned in the subject.. :-)
> 
> Thanks,
> Sandeep
> 
> 
> On 21 May 2013 14:58, Steve Rowe  wrote:
> 
>> Sandeep,
>> 
>> What version of Solr are you using?
>> 
>> Steve
>> 
>> On May 21, 2013, at 6:55 AM, Sandeep Mestry  wrote:
>> 
>>> Hi Shawn,
>>> 
>>> Thanks for your reply.
>>> 
>>> I'm not mixing versions.
>>> The problem I faced is I want to override Highlighter from solr-core jar
>>> and if I add that as a dependency in my project then there was a clash
>>> between solr-core.jar and the apache-solr-core.jar that comes bundled
>>> within the solr distribution. It was complaining about
>> MorfologikFilterFactory
>>> classcastexception.
>>> I can't use apache-solr-core.jar as a dependency as no such jar exists in
>>> any maven repo.
>>> 
>>> The only thing I could do is to remove apache-solr-core.jar from solr.war
>>> and then use solr-core.jar as a dependency - however I do not think this
>> is
>>> the ideal solution.
>>> 
>>> Thanks,
>>> Sandeep
>>> 
>>> 
>>> On 20 May 2013 15:18, Shawn Heisey  wrote:
>>> 
>>>> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
>>>>> And I do remember the discussion on the forum about dropping the name
>>>>> *apache* from solr jars. If that's what caused this issue, then can you
>>>>> tell me if the mirrors need updating with solr-core.jar instead of
>>>>> apache-solr-core.jar?
>>>> 
>>>> If it's named apache-solr-core, then it's from 4.0 or earlier.  If it's
>>>> named solr-core, then it's from 4.1 or later.  That might mean that you
>>>> are mixing versions - don't do that.  Make sure that you have jars from
>>>> the exact same version as your server.
>>>> 
>>>> Thanks,
>>>> Shawn
>>>> 
>>>> 
>> 
>> 



Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Sandeep Mestry
Hi Steve,

Solr 4.0 - mentioned in the subject.. :-)

Thanks,
Sandeep


On 21 May 2013 14:58, Steve Rowe  wrote:

> Sandeep,
>
> What version of Solr are you using?
>
> Steve
>
> On May 21, 2013, at 6:55 AM, Sandeep Mestry  wrote:
>
> > Hi Shawn,
> >
> > Thanks for your reply.
> >
> > I'm not mixing versions.
> > The problem I faced is I want to override Highlighter from solr-core jar
> > and if I add that as a dependency in my project then there was a clash
> > between solr-core.jar and the apache-solr-core.jar that comes bundled
> > within the solr distribution. It was complaining about
> MorfologikFilterFactory
> > classcastexception.
> > I can't use apache-solr-core.jar as a dependency as no such jar exists in
> > any maven repo.
> >
> > The only thing I could do is to remove apache-solr-core.jar from solr.war
> > and then use solr-core.jar as a dependency - however I do not think this
> is
> > the ideal solution.
> >
> > Thanks,
> > Sandeep
> >
> >
> > On 20 May 2013 15:18, Shawn Heisey  wrote:
> >
> >> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
> >>> And I do remember the discussion on the forum about dropping the name
> >>> *apache* from solr jars. If that's what caused this issue, then can you
> >>> tell me if the mirrors need updating with solr-core.jar instead of
> >>> apache-solr-core.jar?
> >>
> >> If it's named apache-solr-core, then it's from 4.0 or earlier.  If it's
> >> named solr-core, then it's from 4.1 or later.  That might mean that you
> >> are mixing versions - don't do that.  Make sure that you have jars from
> >> the exact same version as your server.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
>


Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Steve Rowe
Sandeep,

What version of Solr are you using?

Steve

On May 21, 2013, at 6:55 AM, Sandeep Mestry  wrote:

> Hi Shawn,
> 
> Thanks for your reply.
> 
> I'm not mixing versions.
> The problem I faced is I want to override Highlighter from solr-core jar
> and if I add that as a dependency in my project then there was a clash
> between solr-core.jar and the apache-solr-core.jar that comes bundled
> within the solr distribution. It was complaining about MorfologikFilterFactory
> classcastexception.
> I can't use apache-solr-core.jar as a dependency as no such jar exists in
> any maven repo.
> 
> The only thing I could do is to remove apache-solr-core.jar from solr.war
> and then use solr-core.jar as a dependency - however I do not think this is
> the ideal solution.
> 
> Thanks,
> Sandeep
> 
> 
> On 20 May 2013 15:18, Shawn Heisey  wrote:
> 
>> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
>>> And I do remember the discussion on the forum about dropping the name
>>> *apache* from solr jars. If that's what caused this issue, then can you
>>> tell me if the mirrors need updating with solr-core.jar instead of
>>> apache-solr-core.jar?
>> 
>> If it's named apache-solr-core, then it's from 4.0 or earlier.  If it's
>> named solr-core, then it's from 4.1 or later.  That might mean that you
>> are mixing versions - don't do that.  Make sure that you have jars from
>> the exact same version as your server.
>> 
>> Thanks,
>> Shawn
>> 
>> 



Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Shawn Heisey
On 5/21/2013 4:55 AM, Sandeep Mestry wrote:
> I'm not mixing versions.
> The problem I faced is I want to override Highlighter from solr-core jar
> and if I add that as a dependency in my project then there was a clash
> between solr-core.jar and the apache-solr-core.jar that comes bundled
> within the solr distribution. It was complaining about MorfologikFilterFactory
> classcastexception.
> I can't use apache-solr-core.jar as a dependency as no such jar exists in
> any maven repo.
> 
> The only thing I could do is to remove apache-solr-core.jar from solr.war
> and then use solr-core.jar as a dependency - however I do not think this is
> the ideal solution.

You'll need to have the solr core jar available for *compiling* your
code, but when you actually go to use your code with Solr, you don't
need to include the core jar, because it's already in Solr.
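With Maven, that usually means declaring the jar with `provided` scope, so it is on the compile classpath but left out of the deployed artifact. A sketch, using the 4.0.0 coordinates from Maven Central linked earlier in this thread; match the version to your server:

```xml
<!-- solr-core is needed to compile against, but Solr supplies it at runtime -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>4.0.0</version>
  <scope>provided</scope>
</dependency>
```

With `provided` scope the jar never ends up inside your own war, so it cannot clash with the copy already bundled in solr.war.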

Thanks,
Shawn



Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-21 Thread Sandeep Mestry
Hi Shawn,

Thanks for your reply.

I'm not mixing versions.
The problem I faced is I want to override Highlighter from solr-core jar
and if I add that as a dependency in my project then there was a clash
between solr-core.jar and the apache-solr-core.jar that comes bundled
within the solr distribution. It was complaining about MorfologikFilterFactory
classcastexception.
I can't use apache-solr-core.jar as a dependency as no such jar exists in
any maven repo.

The only thing I could do is to remove apache-solr-core.jar from solr.war
and then use solr-core.jar as a dependency - however I do not think this is
the ideal solution.

Thanks,
Sandeep


On 20 May 2013 15:18, Shawn Heisey  wrote:

> On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
> > And I do remember the discussion on the forum about dropping the name
> > *apache* from solr jars. If that's what caused this issue, then can you
> > tell me if the mirrors need updating with solr-core.jar instead of
> > apache-solr-core.jar?
>
> If it's named apache-solr-core, then it's from 4.0 or earlier.  If it's
> named solr-core, then it's from 4.1 or later.  That might mean that you
> are mixing versions - don't do that.  Make sure that you have jars from
> the exact same version as your server.
>
> Thanks,
> Shawn
>
>


Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-20 Thread Shawn Heisey
On 5/20/2013 8:01 AM, Sandeep Mestry wrote:
> And I do remember the discussion on the forum about dropping the name
> *apache* from solr jars. If that's what caused this issue, then can you
> tell me if the mirrors need updating with solr-core.jar instead of
> apache-solr-core.jar?

If it's named apache-solr-core, then it's from 4.0 or earlier.  If it's
named solr-core, then it's from 4.1 or later.  That might mean that you
are mixing versions - don't do that.  Make sure that you have jars from
the exact same version as your server.

Thanks,
Shawn



Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core

2013-05-20 Thread Sandeep Mestry
Hi All,

I want to override a component from solr-core and for that I need solr-core
jar.

I am using the solr.war that comes from Apache mirror and if I open the
war, I see the solr-core jar is actually named as apache-solr-core.jar.
This is also true about solrj jar.

If I now provide a dependency in my module for apache-solr-core.jar, it's
not being found in the mirror. And if I use solr-core.jar, I get strange
class cast exception during Solr startup for MorfologikFilterFactory.

(I'm not using this factory at all in my project.)

at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.lang.ClassCastException: class org.apache.lucene.analysis.morfologik.MorfologikFilterFactory
at java.lang.Class.asSubclass(Unknown Source)
at org.apache.lucene.util.SPIClassIterator.next(SPIClassIterator.java:126)
at org.apache.lucene.analysis.util.AnalysisSPILoader.reload(AnalysisSPILoader.java:73)
at org.apache.lucene.analysis.util.AnalysisSPILoader.<init>(AnalysisSPILoader.java:55)

I tried manually removing the apache-solr-core.jar from the solr
distribution war and then providing the dependency and everything worked
fine.

And I do remember the discussion on the forum about dropping the name
*apache* from solr jars. If that's what caused this issue, then can you
tell me if the mirrors need updating with solr-core.jar instead of
apache-solr-core.jar?

Many Thanks,
Sandeep


Re: Question about Edismax - Solr 4.0

2013-05-17 Thread Sandeep Mestry
Hello Jack,

Thanks for pointing the issues out and for your valuable suggestion. My
preliminary tests were okay on search but I will be doing more testing to
see if this has impacted any other searches.

Thanks once again and have a nice sunny weekend,
Sandeep


On 17 May 2013 05:35, Jack Krupansky  wrote:

> Ah... I think your issue is the preserveOriginal=1 on the query analyzer
> as well as the fact that you have all of these catenatexx="1" options on
> the query analyzer - I indicated that you should remove them all.
>
> The problem is that the whitespace analyzer leaves the leading comma in
> place, and the preserveOriginal="1" also generates an extra token for the
> term, with the comma in place. But, with the space, the comma and "10" are
> separate terms and get analyzed independently.
>
> The query results probably indicate that you don't have that exact
> combination of the term and leading punctuation - or that there is no
> standalone comma in your input data.
>
> Try the following replacement for the query-time WDF:
>
>
> <filter class="solr.WordDelimiterFilterFactory"
>  stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="0" catenateNumbers="0" catenateAll="0"
> splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="0" />
>
>
> -- Jack Krupansky
>
> -Original Message- From: Sandeep Mestry
> Sent: Thursday, May 16, 2013 5:50 PM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Question about Edismax - Solr 4.0
>
> Hi Jack,
>
> Thanks for your response again and for helping me out to get through this.
>
> The URL is definitely encoded for spaces and it looks like below. As I
> mentioned in my previous mail, I can't add it to query parameter as that
> searches on multiple fields.
>
> The title field is defined as below:
>  multiValued="true"/>
>
> q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,10%29%29&fq=collection:assets
>
> 
> 
> edismax
> explicit
> 0.01
> title^10 description^5 annotations^3 notes^2 categories
> title
> 0
> *:*
> *,score
> 100%
> AND
> score desc
> true
> -1
> 1
> uniq_subtype_id
> component_type
> genre_type
> 
> 
> collection:assets
> 
> 
>
> The term 'countryside' needs to be searched against multiple fields
> including titles, descriptions, annotations, categories, notes but the UI
> also has a feature to limit results by providing a title field.
>
>
> I can see that the filter queries are always parsed by LuceneQueryParser
> however I'd expect it to generate the parsed_filter_queries debug output in
> every situation.
>
> I have tried it as the main query with both edismax and lucene defType and
> it gives me correct output and correct results.
> But, there is some problem when this is used as a filter query as the
> parser is not able to parse a comma with a space.
>
> Thanks again Jack, please let me know in case you need more inputs from my
> side.
>
> Best Regards,
> Sandeep
>
> On 16 May 2013 18:03, Jack Krupansky  wrote:
>
>  Could you show us the full query URL - spaces must be encoded in URL query
>> parameters.
>>
>> Also show the actual field XML - you omitted that.
>>
>> Try the same query as a main query, using both defType=edismax and
>> defType=lucene.
>>
>> Note that the filter query is parsed using the Lucene query parser, not
>> edismax, independent of the defType parameter. But you don't have any
>> edismax features in your fq anyway.
>>
>> But you can stick {!edismax} in front of the query to force edismax to be
>> used for the fq, although it really shouldn't change anything:
>>
>> Also, catenate is fine for indexing, but will mess up your queries at
>> query time, so set them to "0" in the query analyzer
>>
>> Also, make sure you have autoGeneratePhraseQueries="true" on the field
>> type, but that's not the issue here.
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Sandeep Mestry
>> Sent: Thursday, May 16, 2013 12:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Question about Edismax - Solr 4.0
>>
>>
>> Thanks Jack for your reply..
>>
>> The problem is, I'm finding results for fq=title:(,10) but not for
>> fq=title:(, 10) - apologies if that was not clear from my first mail.
>> I have already mentioned the debug analysis in my previous mail.
>>
>> Additionally,

Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Jack Krupansky
Ah... I think your issue is the preserveOriginal=1 on the query analyzer as 
well as the fact that you have all of these catenatexx="1" options on the 
query analyzer - I indicated that you should remove them all.


The problem is that the whitespace analyzer leaves the leading comma in 
place, and the preserveOriginal="1" also generates an extra token for the 
term, with the comma in place . But, with the space, the comma and "10" are 
separate terms and get analyzed independently.


The query results probably indicate that you don't have that exact 
combination of the term and leading punctuation - or that there is no 
standalone comma in your input data.


Try the following replacement for the query-time WDF:

<filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0"
generateWordParts="1" generateNumberParts="1"
catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
splitOnNumerics="0" preserveOriginal="0" />


-- Jack Krupansky

-Original Message- 
From: Sandeep Mestry

Sent: Thursday, May 16, 2013 5:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Question about Edismax - Solr 4.0

Hi Jack,

Thanks for your response again and for helping me out to get through this.

The URL is definitely encoded for spaces and it looks like below. As I
mentioned in my previous mail, I can't add it to query parameter as that
searches on multiple fields.

The title field is defined as below:


q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,10%29%29&fq=collection:assets



edismax
explicit
0.01
title^10 description^5 annotations^3 notes^2 categories
title
0
*:*
*,score
100%
AND
score desc
true
-1
1
uniq_subtype_id
component_type
genre_type


collection:assets



The term 'countryside' needs to be searched against multiple fields
including titles, descriptions, annotations, categories, notes but the UI
also has a feature to limit results by providing a title field.


I can see that the filter queries are always parsed by LuceneQueryParser
however I'd expect it to generate the parsed_filter_queries debug output in
every situation.

I have tried it as the main query with both edismax and lucene defType and
it gives me correct output and correct results.
But, there is some problem when this is used as a filter query as the
parser is not able to parse a comma with a space.

Thanks again Jack, please let me know in case you need more inputs from my
side.

Best Regards,
Sandeep

On 16 May 2013 18:03, Jack Krupansky  wrote:


Could you show us the full query URL - spaces must be encoded in URL query
parameters.

Also show the actual field XML - you omitted that.

Try the same query as a main query, using both defType=edismax and
defType=lucene.

Note that the filter query is parsed using the Lucene query parser, not
edismax, independent of the defType parameter. But you don't have any
edismax features in your fq anyway.

But you can stick {!edismax} in front of the query to force edismax to be
used for the fq, although it really shouldn't change anything:

Also, catenate is fine for indexing, but will mess up your queries at
query time, so set them to "0" in the query analyzer

Also, make sure you have autoGeneratePhraseQueries="true" on the field
type, but that's not the issue here.


-- Jack Krupansky

-Original Message- From: Sandeep Mestry
Sent: Thursday, May 16, 2013 12:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Question about Edismax - Solr 4.0


Thanks Jack for your reply..

The problem is, I'm finding results for fq=title:(,10) but not for
fq=title:(, 10) - apologies if that was not clear from my first mail.
I have already mentioned the debug analysis in my previous mail.

Additionally, the title field is defined as below:
positionIncrementGap="100"




 

   
   <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0"
   generateWordParts="1" generateNumberParts="1" catenateWords="1"
   catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
   splitOnNumerics="0" preserveOriginal="1" />
   
   
   
   
   <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0"
   generateWordParts="1" generateNumberParts="1" catenateWords="1"
   catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
   splitOnNumerics="0" preserveOriginal="1" />
   
   
   

I have set the catenate options to 1 for all types.
I can understand ',' getting ignored when it is on its own (title:(,
10)) but
- Why solr is not searching for 10 in that case just like it did when the
query was (title:(,10))?
- And why other filter queries did not show up (collection:assets) in 
debug

section?


Thanks,
Sandeep


On 16 May 2013 13:57, Jack Krupansky  wrote:

 You haven't indicated any problem here! What is the symptom that you

actually think is a problem.

There is no comma operator in any of the Solr query parsers. Comma is just
another character that may or may not be included or discarded depending on
the specific field type and analyzer.

Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry
Hi Jack,

Thanks for your response again and for helping me out to get through this.

The URL is definitely encoded for spaces and it looks like below. As I
mentioned in my previous mail, I can't add it to query parameter as that
searches on multiple fields.

The title field is defined as below:


q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,10%29%29&fq=collection:assets



edismax
explicit
0.01
title^10 description^5 annotations^3 notes^2 categories
title
0
*:*
*,score
100%
AND
score desc
true
-1
1
uniq_subtype_id
component_type
genre_type


collection:assets



The term 'countryside' needs to be searched against multiple fields
including titles, descriptions, annotations, categories, notes but the UI
also has a feature to limit results by providing a title field.


I can see that the filter queries are always parsed by LuceneQueryParser
however I'd expect it to generate the parsed_filter_queries debug output in
every situation.

I have tried it as the main query with both edismax and lucene defType and
it gives me correct output and correct results.
But, there is some problem when this is used as a filter query as the
parser is not able to parse a comma with a space.

Thanks again Jack, please let me know in case you need more inputs from my
side.

Best Regards,
Sandeep

On 16 May 2013 18:03, Jack Krupansky  wrote:

> Could you show us the full query URL - spaces must be encoded in URL query
> parameters.
>
> Also show the actual field XML - you omitted that.
>
> Try the same query as a main query, using both defType=edismax and
> defType=lucene.
>
> Note that the filter query is parsed using the Lucene query parser, not
> edismax, independent of the defType parameter. But you don't have any
> edismax features in your fq anyway.
>
> But you can stick {!edismax} in front of the query to force edismax to be
> used for the fq, although it really shouldn't change anything:
>
> Also, catenate is fine for indexing, but will mess up your queries at
> query time, so set them to "0" in the query analyzer
>
> Also, make sure you have autoGeneratePhraseQueries="true" on the field
> type, but that's not the issue here.
>
>
> -- Jack Krupansky
>
> -Original Message----- From: Sandeep Mestry
> Sent: Thursday, May 16, 2013 12:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Question about Edismax - Solr 4.0
>
>
> Thanks Jack for your reply..
>
> The problem is, I'm finding results for fq=title:(,10) but not for
> fq=title:(, 10) - apologies if that was not clear from my first mail.
> I have already mentioned the debug analysis in my previous mail.
>
> Additionally, the title field is defined as below:
> 
>>
>>  
>
> <filter class="solr.WordDelimiterFilterFactory"
> stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
>
>
>
>
> <filter class="solr.WordDelimiterFilterFactory"
> stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> splitOnNumerics="0" preserveOriginal="1" />
>
>
>
>
> I have set the catenate options to 1 for all types.
> I can understand ',' getting ignored when it is on its own (title:(,
> 10)) but
> - Why solr is not searching for 10 in that case just like it did when the
> query was (title:(,10))?
> - And why other filter queries did not show up (collection:assets) in debug
> section?
>
>
> Thanks,
> Sandeep
>
>
> On 16 May 2013 13:57, Jack Krupansky  wrote:
>
>  You haven't indicated any problem here! What is the symptom that you
>> actually think is a problem.
>>
>> There is no comma operator in any of the Solr query parsers. Comma is just
>> another character that may or may not be included or discarded depending
>> on
>> the specific field type and analyzer. For example, a white space analyzer
>> will keep commas, but the standard analyzer or the word delimiter filter
>> will discard them. If "title" were a "string" type, all punctuation would
>> be preserved, including commas and spaces (but spaces would need to be
>> escaped or the term text enclosed in parentheses.)
>>
>> Let us know what your symptom is though, first.
>>
>> I mean, the filter query looks perfectly reasonable from an abstract
>> perspective.

Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Jack Krupansky
Could you show us the full query URL - spaces must be encoded in URL query 
parameters.
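As an aside, the encoding itself can be sketched with Python's standard library; the parameter values below are taken from this thread, but the host/handler details are omitted and the result is only illustrative:

```python
from urllib.parse import urlencode

# Repeated "fq" parameters are passed as a list of tuples; urlencode
# percent-encodes the colon, parentheses, comma, and space safely.
params = [
    ("q", "countryside"),
    ("fq", "title:(, 10)"),
    ("fq", "collection:assets"),
]
qs = urlencode(params)
print(qs)  # q=countryside&fq=title%3A%28%2C+10%29&fq=collection%3Aassets
```

Note the space becomes `+` (it could equally be `%20`); either way the server sees the filter query with the space intact.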


Also show the actual field XML - you omitted that.

Try the same query as a main query, using both defType=edismax and 
defType=lucene.


Note that the filter query is parsed using the Lucene query parser, not 
edismax, independent of the defType parameter. But you don't have any 
edismax features in your fq anyway.


But you can stick {!edismax} in front of the query to force edismax to be 
used for the fq, although it really shouldn't change anything:
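An illustrative example of that local-params prefix (not from the original message) would be:

```
fq={!edismax}title:(, 10)
```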


Also, catenate is fine for indexing, but will mess up your queries at query 
time, so set them to "0" in the query analyzer


Also, make sure you have autoGeneratePhraseQueries="true" on the field type, 
but that's not the issue here.


-- Jack Krupansky

-Original Message- 
From: Sandeep Mestry

Sent: Thursday, May 16, 2013 12:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Question about Edismax - Solr 4.0

Thanks Jack for your reply..

The problem is, I'm finding results for fq=title:(,10) but not for
fq=title:(, 10) - apologies if that was not clear from my first mail.
I have already mentioned the debug analysis in my previous mail.

Additionally, the title field is defined as below:




   
   
   
   
   
   
   
   
   
   

I have set the catenate options to 1 for all types.
I can understand ',' getting ignored when it is on its own (title:(,
10)) but
- Why solr is not searching for 10 in that case just like it did when the
query was (title:(,10))?
- And why other filter queries did not show up (collection:assets) in debug
section?


Thanks,
Sandeep


On 16 May 2013 13:57, Jack Krupansky  wrote:


You haven't indicated any problem here! What is the symptom that you
actually think is a problem.

There is no comma operator in any of the Solr query parsers. Comma is just
another character that may or may not be included or discarded depending 
on

the specific field type and analyzer. For example, a white space analyzer
will keep commas, but the standard analyzer or the word delimiter filter
will discard them. If "title" were a "string" type, all punctuation would
be preserved, including commas and spaces (but spaces would need to be
escaped or the term text enclosed in parentheses.)

Let us know what your symptom is though, first.

I mean, the filter query looks perfectly reasonable from an abstract
perspective.

-- Jack Krupansky

-Original Message- From: Sandeep Mestry
Sent: Thursday, May 16, 2013 6:51 AM
To: solr-user@lucene.apache.org
Subject: Question about Edismax - Solr 4.0

-- *Edismax and Filter Queries with Commas and spaces* --


Dear Experts,

This appears to be a bug, please suggest if I'm wrong.

If I search with the following filter query,

1) fq=title:(, 10)

- I get no results.
- The debug output does NOT show the section containing
parsed_filter_queries

if I carry a search with the filter query,

2) fq=title:(,10) - (No space between , and 10)

- I get results and the debug output shows the parsed filter queries
section as,

(titles:(,10))
(collection:assets)

As you can see above, I'm also passing in other filter queries
(collection:assets) which appear correctly but they do not appear in case 
1

above.

I can't make this as part of the query parameter as that needs to be
searched against multiple fields.

Can someone suggest a fix in this case please. I'm using Solr 4.0.

Many Thanks,
Sandeep





Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry
Thanks Jack for your reply..

The problem is, I'm finding results for fq=title:(,10) but not for
fq=title:(, 10) - apologies if that was not clear from my first mail.
I have already mentioned the debug analysis in my previous mail.

Additionally, the title field is defined as below:

 











I have set the catenate options to 1 for all types.
I can understand ',' getting ignored when it is on its own (title:(,
10)) but
- Why solr is not searching for 10 in that case just like it did when the
query was (title:(,10))?
- And why other filter queries did not show up (collection:assets) in debug
section?


Thanks,
Sandeep


On 16 May 2013 13:57, Jack Krupansky  wrote:

> You haven't indicated any problem here! What is the symptom that you
> actually think is a problem.
>
> There is no comma operator in any of the Solr query parsers. Comma is just
> another character that may or may not be included or discarded depending on
> the specific field type and analyzer. For example, a white space analyzer
> will keep commas, but the standard analyzer or the word delimiter filter
> will discard them. If "title" were a "string" type, all punctuation would
> be preserved, including commas and spaces (but spaces would need to be
> escaped or the term text enclosed in parentheses.)
>
> Let us know what your symptom is though, first.
>
> I mean, the filter query looks perfectly reasonable from an abstract
> perspective.
>
> -- Jack Krupansky
>
> -Original Message- From: Sandeep Mestry
> Sent: Thursday, May 16, 2013 6:51 AM
> To: solr-user@lucene.apache.org
> Subject: Question about Edismax - Solr 4.0
>
> -- *Edismax and Filter Queries with Commas and spaces* --
>
>
> Dear Experts,
>
> This appears to be a bug, please suggest if I'm wrong.
>
> If I search with the following filter query,
>
> 1) fq=title:(, 10)
>
> - I get no results.
> - The debug output does NOT show the section containing
> parsed_filter_queries
>
> if I carry a search with the filter query,
>
> 2) fq=title:(,10) - (No space between , and 10)
>
> - I get results and the debug output shows the parsed filter queries
> section as,
> 
> (titles:(,10))
> (collection:assets)
>
> As you can see above, I'm also passing in other filter queries
> (collection:assets) which appear correctly but they do not appear in case 1
> above.
>
> I can't make this as part of the query parameter as that needs to be
> searched against multiple fields.
>
> Can someone suggest a fix in this case please. I'm using Solr 4.0.
>
> Many Thanks,
> Sandeep
>


Re: Question about Edismax - Solr 4.0

2013-05-16 Thread Jack Krupansky
You haven't indicated any problem here! What is the symptom that you 
actually think is a problem.


There is no comma operator in any of the Solr query parsers. Comma is just 
another character that may or may not be included or discarded depending on 
the specific field type and analyzer. For example, a white space analyzer 
will keep commas, but the standard analyzer or the word delimiter filter 
will discard them. If "title" were a "string" type, all punctuation would be 
preserved, including commas and spaces (but spaces would need to be escaped 
or the term text enclosed in parentheses.)
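As a rough Python sketch of that difference (an approximation only, not actual Lucene analyzer code): whitespace splitting keeps a standalone comma as a token, while an alphanumeric-only split drops it entirely, which is why adding a space changes the outcome.

```python
import re

text = ", 10"

# Whitespace-style tokenization: split on spaces only, so the comma survives
# as its own token.
ws_tokens = text.split()
print(ws_tokens)  # [',', '10']

# Word-delimiter-style behaviour: keep only alphanumeric runs, so a
# standalone comma produces no token at all.
wd_tokens = re.findall(r"[A-Za-z0-9]+", text)
print(wd_tokens)  # ['10']
```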


Let us know what your symptom is though, first.

I mean, the filter query looks perfectly reasonable from an abstract 
perspective.


-- Jack Krupansky

-Original Message- 
From: Sandeep Mestry

Sent: Thursday, May 16, 2013 6:51 AM
To: solr-user@lucene.apache.org
Subject: Question about Edismax - Solr 4.0

-- *Edismax and Filter Queries with Commas and spaces* --

Dear Experts,

This appears to be a bug, please suggest if I'm wrong.

If I search with the following filter query,

1) fq=title:(, 10)

- I get no results.
- The debug output does NOT show the section containing
parsed_filter_queries

if I carry a search with the filter query,

2) fq=title:(,10) - (No space between , and 10)

- I get results and the debug output shows the parsed filter queries
section as,

(titles:(,10))
(collection:assets)

As you can see above, I'm also passing in other filter queries
(collection:assets) which appear correctly but they do not appear in case 1
above.

I can't make this as part of the query parameter as that needs to be
searched against multiple fields.

Can someone suggest a fix in this case please. I'm using Solr 4.0.

Many Thanks,
Sandeep 



Question about Edismax - Solr 4.0

2013-05-16 Thread Sandeep Mestry
-- *Edismax and Filter Queries with Commas and spaces* --

Dear Experts,

This appears to be a bug, please suggest if I'm wrong.

If I search with the following filter query,

1) fq=title:(, 10)

- I get no results.
- The debug output does NOT show the section containing
parsed_filter_queries

if I carry a search with the filter query,

2) fq=title:(,10) - (No space between , and 10)

- I get results and the debug output shows the parsed filter queries
section as,

(titles:(,10))
(collection:assets)

As you can see above, I'm also passing in other filter queries
(collection:assets) which appear correctly but they do not appear in case 1
above.

I can't make this as part of the query parameter as that needs to be
searched against multiple fields.

Can someone suggest a fix in this case please. I'm using Solr 4.0.

Many Thanks,
Sandeep


Re: How to aggregate data in solr 4.0?

2013-05-15 Thread Jack Krupansky
The Solr "stats" search component computes some basic aggregates: min, max, 
count, sum, mean, sum of squares, standard deviation:


http://wiki.apache.org/solr/StatsComponent
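A minimal stats request might look like the following sketch; `price` stands in for any numeric field in your schema, and host/core are placeholders:

```
http://localhost:8983/solr/select?q=*:*&rows=0&stats=true&stats.field=price
```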

-- Jack Krupansky

-Original Message- 
From: eShard

Sent: Wednesday, May 15, 2013 2:45 PM
To: solr-user@lucene.apache.org
Subject: How to aggregate data in solr 4.0?

Good afternoon,
Does anyone know of a good tutorial on how to perform SQL-like aggregation
in solr queries?

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-aggregate-data-in-solr-4-0-tp4063584.html
Sent from the Solr - User mailing list archive at Nabble.com. 



How to aggregate data in solr 4.0?

2013-05-15 Thread eShard
Good afternoon,
Does anyone know of a good tutorial on how to perform SQL-like aggregation
in solr queries?

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-aggregate-data-in-solr-4-0-tp4063584.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 4.0 - DIH delta-import scheduler

2013-03-19 Thread Stefan Matheis
Jegan

By DIH Scheduler you mean 
http://wiki.apache.org/solr/DataImportHandler#Scheduling ? If so, then it's not 
yet in. More details in the Ticket (which is as well linked from the 
Wiki-Page): https://issues.apache.org/jira/browse/SOLR-2305
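Until that lands, a common workaround is to trigger the delta-import from the operating system's scheduler instead, for example a crontab entry calling the DIH handler over HTTP (host, port, and core name below are placeholders):

```
# Hypothetical crontab entry: run a delta-import every 15 minutes
*/15 * * * * curl -s "http://localhost:8983/solr/mycore/dataimport?command=delta-import" > /dev/null
```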

Regarding your Question on the UI: the "Auto-Refresh" Checkbox only 
automatically updates the status you're seeing on the screen. So for example 
while you have running a import, you can activate this checkbox and relax in 
your chair while the screen updates itself every two seconds - but that does 
neither trigger a import itself nor does it (initiate) anything else beside 
fetching status information

HTH,
Stefan


On Tuesday, March 19, 2013 at 7:32 PM, Jegannathan Mehalingam wrote:

> Is the DIH Scheduler available in SOLR 4.1 or in 4.2? I would like to 
> know if we can schedule delta-import in SOLR 4.1 or 4.2.
> 
> In SOLR 4.1 DIH console, I see a "Refresh Status" button and 
> "Auto-Refresh Status" check box. Is this related to the delta-import 
> scheduling? I couldn't find any documentation about it. But it seems to 
> run something every 2 secs. If someone knows any documentation, can you 
> please put the link out.
> 
> Thanks,
> Jegan





SOLR 4.0 - DIH delta-import scheduler

2013-03-19 Thread Jegannathan Mehalingam
Is the DIH Scheduler available in SOLR 4.1 or in 4.2? I would like to 
know if we can schedule delta-import in SOLR 4.1 or 4.2.


In SOLR 4.1 DIH console, I see a "Refresh Status" button and 
"Auto-Refresh Status" check box. Is this related to the delta-import 
scheduling? I couldn't find any documentation about it. But it seems to 
run something every 2 secs. If someone knows any documentation, can you 
please put the link out.


Thanks,
Jegan


Re: Handling a closed IndexWriter in SOLR 4.0

2013-03-18 Thread Mark Miller
I'll fix it - I put up a patch last night.

- Mark

On Mar 18, 2013, at 1:12 AM, mark12345  wrote:

> This looks similar to the issue I also have:
> 
> *
> http://lucene.472066.n3.nabble.com/Solr-4-1-4-2-SolrException-Error-opening-new-searcher-td4046543.html
> 
>   
> *https://issues.apache.org/jira/browse/SOLR-4605
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Handling-a-closed-IndexWriter-in-SOLR-4-0-tp4047392p4048421.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Handling a closed IndexWriter in SOLR 4.0

2013-03-17 Thread mark12345
This looks similar to the issue I also have:

*
http://lucene.472066.n3.nabble.com/Solr-4-1-4-2-SolrException-Error-opening-new-searcher-td4046543.html

  
*https://issues.apache.org/jira/browse/SOLR-4605




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Handling-a-closed-IndexWriter-in-SOLR-4-0-tp4047392p4048421.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Handling a closed IndexWriter in SOLR 4.0

2013-03-14 Thread Otis Gospodnetic
Hi Scott,

Not sure why IW would be closed, but:
* consider not (hard) committing after each doc, but just periodically,
every N minutes
* soft committing instead
* using 4.2
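
The first suggestion can be sketched as a simple commit policy with doc-count and elapsed-time thresholds. This is a generic illustration only; in Solr itself this is what the autoCommit maxDocs/maxTime settings and commitWithin do for you, not code you would normally write, and the threshold values here are made up:

```java
// Sketch of a "commit periodically, not per document" policy.
// Thresholds are illustrative; Solr's autoCommit/commitWithin does this for you.
public class CommitPolicy {
    private final int maxDocs;
    private final long maxTimeMs;
    private int pendingDocs = 0;
    private long lastCommitMs;

    public CommitPolicy(int maxDocs, long maxTimeMs, long nowMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
        this.lastCommitMs = nowMs;
    }

    // Called after each add; returns true when a (hard) commit is due.
    public boolean onAdd(long nowMs) {
        pendingDocs++;
        if (pendingDocs >= maxDocs || nowMs - lastCommitMs >= maxTimeMs) {
            pendingDocs = 0;
            lastCommitMs = nowMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CommitPolicy p = new CommitPolicy(3, 60000, 0);
        System.out.println(p.onAdd(1000));   // 1 doc, 1s elapsed  -> false
        System.out.println(p.onAdd(2000));   // 2 docs             -> false
        System.out.println(p.onAdd(3000));   // 3 docs             -> true (doc threshold)
        System.out.println(p.onAdd(70000));  // 1 doc, 67s elapsed -> true (time threshold)
    }
}
```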

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Mar 14, 2013 at 11:55 AM, Danzig, Scott wrote:

> Hey all,
>
> We're using a Solr 4 core to handle our article data.  When someone in our
> CMS publishes an article, we have a listener that indexes it straight to
> solr.  We use the previously instantiated HttpSolrServer, build the solr
> document, add it with server.add(doc) .. then do a server.commit() right
> away.  For some reason, sometimes this exception is thrown, which I suspect
> is related to a simultaneous data import done from another client which
> sometimes errors:
>
> Feb 26, 2013 5:07:51 PM org.apache.solr.common.SolrException log
> SEVERE: null:org.apache.solr.common.SolrException: Error opening new
> searcher
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1310)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1422)
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1200)
> at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:560)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
> at
> org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
> at
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
> at
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
> at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:999)
> at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:565)
> at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: org.apache.lucene.store.AlreadyClosedException: this
> IndexWriter is closed
> at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:550)
> at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:563)
> at org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:4196)
> at
> org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:266)
> at
> org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:245)
> at
> org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
> at
> org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1256)
> ... 28 more
>
> I'm not sure if the error is causing the IndexWriter to close, and why an
> IndexWriter would be shared across clients, but usually, I can get around
> this by basically creating a new HttpSolrServer and trying again.  But it
> doesn't always work, perhaps due to frequency… I don't like the idea of an
> "infinite loop of creating connections until it works".  I'd rather
> understand what's going on.  What's the proper way to fix this?  I see I
> can add a doc with a commitWithinMs of "0" and maybe this couples the add
> tightly with the commit and would prevent interference.  But am I totally
> off the mark here as to the problem?  Suggestions?
>
> Posted this on java-u

Handling a closed IndexWriter in SOLR 4.0

2013-03-14 Thread Danzig, Scott
Hey all,

We're using a Solr 4 core to handle our article data.  When someone in our CMS 
publishes an article, we have a listener that indexes it straight to solr.  We 
use the previously instantiated HttpSolrServer, build the solr document, add it 
with server.add(doc) .. then do a server.commit() right away.  For some reason, 
sometimes this exception is thrown, which I suspect is related to a 
simultaneous data import done from another client which sometimes errors:

Feb 26, 2013 5:07:51 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1310)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1422)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1200)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:560)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1007)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:999)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:565)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is 
closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:550)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:563)
at org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:4196)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:266)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:245)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
at 
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1256)
... 28 more

I'm not sure if the error is causing the IndexWriter to close, and why an 
IndexWriter would be shared across clients, but usually, I can get around this 
by basically creating a new HttpSolrServer and trying again.  But it doesn't 
always work, perhaps due to frequency… I don't like the idea of an "infinite 
loop of creating connections until it works".  I'd rather understand what's 
going on.  What's the proper way to fix this?  I see I can add a doc with a 
commitWithinMs of "0" and maybe this couples the add tightly with the commit and 
would prevent interference.  But am I totally off the mark here as to the 
problem?  Suggestions?
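
One alternative to an unbounded "create connections until it works" loop is a bounded retry with backoff. The sketch below is generic, not SolrJ code: the failing operation is a stand-in for the add+commit call, and the attempt count and delays are arbitrary:

```java
import java.util.concurrent.Callable;

public class BoundedRetry {
    // Retries op up to maxAttempts times, doubling the delay each time.
    // Returns the result of the first successful call, or rethrows the last failure.
    public static <T> T withRetry(Callable<T> op, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;  // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fails twice, then succeeds -- simulating a transiently closed IndexWriter.
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("this IndexWriter is closed");
            return "committed";
        }, 5, 1);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Bounding the attempts means a persistent failure surfaces as an exception instead of an infinite loop.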

Posted this on java-user before, but then realized solr-user existed, so please 
forgive the redundancy…

Thanks for reading!

- Scott

Re: Solr 4.0 to Solr 4.1 upgrade

2013-03-12 Thread richardg
This ended up being an SPM issue.  I noticed the same issue w/ 4.2 and decided
to upgrade to monitor version 1.9.0, and it is now showing correct data.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-to-Solr-4-1-upgrade-tp4044990p4046631.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0 to Solr 4.1 upgrade

2013-03-06 Thread richardg
Otis,

I noticed this in my logs repeatedly during that time period:

Mar 5, 2013 1:28:00 PM org.apache.solr.core.CachingDirectoryFactory close
INFO: Releasing
directory:/usr/local/solr_aggregate/solr_aggregate/data/index

It wasn't in my logs any other time.

I found this:

https://issues.apache.org/jira/browse/SOLR-4200

Not sure what would cause this or if that is causing my issues; other
metrics during that time seem normal.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-to-Solr-4-1-upgrade-tp4044990p4045222.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0 to Solr 4.1 upgrade

2013-03-05 Thread Otis Gospodnetic
Hello Richard,

Did you see anything in the logs?
What did other metrics look like?  I'd look at system metrics like disk IO
and network IO, CPU, and also JVM/GC metrics first.  Any sudden changes in
those metrics could point you in the right direction.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Mar 5, 2013 at 3:00 PM, richardg  wrote:

> I upgraded one of my slaves by replacing the solr.war, all other slaves and
> master were still 4.0.  When I started to monitor it w/ SPM I noticed that
> the request rate was way up while the request count was way down.  I've
> since put back the solr.war for 4.0 and the slave has returned to normal.
>
> I'm concerned if I upgraded my whole setup to 4.1 performance will suffer.
>
> 
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-0-to-Solr-4-1-upgrade-tp4044990.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr 4.0 to Solr 4.1 upgrade

2013-03-05 Thread richardg
I upgraded one of my slaves by replacing the solr.war, all other slaves and
master were still 4.0.  When I started to monitor it w/ SPM I noticed that
the request rate was way up while the request count was way down.  I've
since put back the solr.war for 4.0 and the slave has returned to normal.  

I'm concerned if I upgraded my whole setup to 4.1 performance will suffer.

 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-to-Solr-4-1-upgrade-tp4044990.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Returning to Solr 4.0 from 4.1

2013-03-03 Thread Dotan Cohen
On Sat, Mar 2, 2013 at 9:32 PM, Upayavira  wrote:
> What I'm questioning is whether the issue you see in 4.1 has been
> resolved in Subversion. While I would not expect 4.0 to read a 4.1
> index, the SVN branch/4.2 should be able to do so effortlessly.
>
> Upayavira
>

I see, thanks. Actually, running a clean 4.1 with no previous index
does not have the issues.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Returning to Solr 4.0 from 4.1

2013-03-02 Thread Upayavira
What I'm questioning is whether the issue you see in 4.1 has been
resolved in Subversion. While I would not expect 4.0 to read a 4.1
index, the SVN branch/4.2 should be able to do so effortlessly.

Upayavira

On Sat, Mar 2, 2013, at 06:17 PM, Dotan Cohen wrote:
> On Fri, Mar 1, 2013 at 1:37 PM, Upayavira  wrote:
> > Can you use a checkout from SVN? Does that resolve your issues? That is
> > what will become 4.2 when it is released soon:
> >
> > https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/
> >
> > Upayavira
> >
> 
> Thank you. Which feature of 4.2 are you suggesting for this issue? Can
> Solr 4.2 natively import from a Solr index?
> 
> 
> -- 
> Dotan Cohen
> 
> http://gibberish.co.il
> http://what-is-what.com


Re: Returning to Solr 4.0 from 4.1

2013-03-02 Thread Dotan Cohen
On Fri, Mar 1, 2013 at 1:37 PM, Upayavira  wrote:
> Can you use a checkout from SVN? Does that resolve your issues? That is
> what will become 4.2 when it is released soon:
>
> https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/
>
> Upayavira
>

Thank you. Which feature of 4.2 are you suggesting for this issue? Can
Solr 4.2 natively import from a Solr index?


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Lance Norskog

Yes, the SolrEntityProcessor can be used for this, provided you stored 
the original document bodies in the Solr index!
You can also download the documents in JSON or CSV format and re-upload 
those to the old Solr. I don't know if CSV will work for your docs.  If CSV 
works, you can directly upload what you download. If you download JSON, 
you have to "unwrap" the outermost structure and upload the data as an 
array.


There are problems with the SolrEntityProcessor: 1) it is 
single-threaded; 2) if you 'copyField' to a field, and store that field, 
you have to be sure not to reload the contents of the field, because you 
will add a new copy from the 'source' field.
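
For reference, a minimal data-config for this approach might look like the following sketch. The host, core name, rows value, and the field names in fl are all placeholders; per the caveat above, fl should list source fields only and omit any copyField targets:

```xml
<dataConfig>
  <document>
    <!-- Pulls documents out of an existing Solr core and re-indexes them here.
         url/query/rows/fl values are hypothetical placeholders. -->
    <entity name="reindex"
            processor="SolrEntityProcessor"
            url="http://old-master:8983/solr/collection1"
            query="*:*"
            rows="500"
            fl="id,title,body"/>
  </document>
</dataConfig>
```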


On 03/01/2013 04:48 AM, Alexandre Rafalovitch wrote:

What about SolrEntityProcessor in DIH?
https://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

Regards,
 Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Mar 1, 2013 at 5:16 AM, Dotan Cohen  wrote:


On Fri, Mar 1, 2013 at 11:59 AM, Rafał Kuć  wrote:

Hello!

I assumed that re-indexing can be painful in your case, if it wouldn't
you probably would re-index by now :) I guess (didn't test it myself),
that you can create another collection inside your cluster, use the
old codec for Lucene 4.0 (setting the version in solrconfig.xml should
be enough) and re-indexing, but still re-indexing will have to be
done. Or maybe someone knows a better way ?


Will I have to reindex via an external script bridging, such as a
Python script which requests N documents at a time, indexes them into
Solr 4.1, then requests another N documents to index? Or is there
internal Solr / Lucene facility for this? I've actually looked for
such a facility, but as I am unable to find such a thing I ask.


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com





Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Alexandre Rafalovitch
What about SolrEntityProcessor in DIH?
https://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Mar 1, 2013 at 5:16 AM, Dotan Cohen  wrote:

> On Fri, Mar 1, 2013 at 11:59 AM, Rafał Kuć  wrote:
> > Hello!
> >
> > I assumed that re-indexing can be painful in your case, if it wouldn't
> > you probably would re-index by now :) I guess (didn't test it myself),
> > that you can create another collection inside your cluster, use the
> > old codec for Lucene 4.0 (setting the version in solrconfig.xml should
> > be enough) and re-indexing, but still re-indexing will have to be
> > done. Or maybe someone knows a better way ?
> >
>
> Will I have to reindex via an external script bridging, such as a
> Python script which requests N documents at a time, indexes them into
> Solr 4.1, then requests another N documents to index? Or is there
> internal Solr / Lucene facility for this? I've actually looked for
> such a facility, but as I am unable to find such a thing I ask.
>
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>


Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Upayavira
Can you use a checkout from SVN? Does that resolve your issues? That is
what will become 4.2 when it is released soon:

https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/

Upayavira

On Fri, Mar 1, 2013, at 10:51 AM, Dotan Cohen wrote:
> On Fri, Mar 1, 2013 at 12:22 PM, Rafał Kuć  wrote:
> > Hello!
> >
> > As far as I know you have to re-index using an external tool.
> >
> 
> Thank you Rafał. That is what I figured.
> 
> 
> 
> -- 
> Dotan Cohen
> 
> http://gibberish.co.il
> http://what-is-what.com


Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Dotan Cohen
On Fri, Mar 1, 2013 at 12:22 PM, Rafał Kuć  wrote:
> Hello!
>
> As far as I know you have to re-index using an external tool.
>

Thank you Rafał. That is what I figured.



-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Rafał Kuć
Hello!

As far as I know you have to re-index using an external tool.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> On Fri, Mar 1, 2013 at 11:59 AM, Rafał Kuć  wrote:
>> Hello!
>>
>> I assumed that re-indexing can be painful in your case, if it wouldn't
>> you probably would re-index by now :) I guess (didn't test it myself),
>> that you can create another collection inside your cluster, use the
>> old codec for Lucene 4.0 (setting the version in solrconfig.xml should
>> be enough) and re-indexing, but still re-indexing will have to be
>> done. Or maybe someone knows a better way ?
>>

> Will I have to reindex via an external script bridging, such as a
> Python script which requests N documents at a time, indexes them into
> Solr 4.1, then requests another N documents to index? Or is there
> internal Solr / Lucene facility for this? I've actually looked for
> such a facility, but as I am unable to find such a thing I ask.




Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Dotan Cohen
On Fri, Mar 1, 2013 at 11:59 AM, Rafał Kuć  wrote:
> Hello!
>
> I assumed that re-indexing can be painful in your case, if it wouldn't
> you probably would re-index by now :) I guess (didn't test it myself),
> that you can create another collection inside your cluster, use the
> old codec for Lucene 4.0 (setting the version in solrconfig.xml should
> be enough) and re-indexing, but still re-indexing will have to be
> done. Or maybe someone knows a better way ?
>

Will I have to reindex via an external bridging script, such as a
Python script which requests N documents at a time, indexes them into
Solr 4.1, then requests another N documents to index? Or is there an
internal Solr / Lucene facility for this? I've actually looked for
such a facility, but as I was unable to find one, I ask.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Rafał Kuć
Hello!

I assumed that re-indexing can be painful in your case; if it weren't,
you probably would have re-indexed by now :) I guess (I didn't test it myself)
that you can create another collection inside your cluster, use the
old codec for Lucene 4.0 (setting the version in solrconfig.xml should
be enough) and re-index there, but re-indexing will still have to be
done. Or maybe someone knows a better way?



-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> On Fri, Mar 1, 2013 at 11:28 AM, Rafał Kuć  wrote:
>> Hello!
>>
>> I suppose the only way to make this work will be reindexing the data.
>> Solr 4.1 uses Lucene 4.1 as you know, which introduced new default
>> codec with stored fields compression and this is one of the reasons
>> you can't read that index with 4.0.
>>

> Thank you. My first inclination is to "reindex" the documents, but the
> only store of these documents is the Solr index itself. I am trying to
> find solutions to create a new core and to index the data in the old
> core into the new core. I'm not finding any good ways of going about
> this.

> Note that we are talking about ~18,000,000 (yes, 18 million) small
> documents similar to 'tweets' (mostly under 1 KiB each, very very few
> over 5 KiB).




Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Dotan Cohen
On Fri, Mar 1, 2013 at 11:28 AM, Rafał Kuć  wrote:
> Hello!
>
> I suppose the only way to make this work will be reindexing the data.
> Solr 4.1 uses Lucene 4.1 as you know, which introduced new default
> codec with stored fields compression and this is one of the reasons
> you can't read that index with 4.0.
>

Thank you. My first inclination is to "reindex" the documents, but the
only store of these documents is the Solr index itself. I am trying to
find solutions to create a new core and to index the data in the old
core into the new core. I'm not finding any good ways of going about
this.

Note that we are talking about ~18,000,000 (yes, 18 million) small
documents similar to 'tweets' (mostly under 1 KiB each, very very few
over 5 KiB).


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Returning to Solr 4.0 from 4.1

2013-03-01 Thread Rafał Kuć
Hello!

I suppose the only way to make this work will be reindexing the data.
Solr 4.1 uses Lucene 4.1, as you know, which introduced a new default
codec with stored-fields compression, and this is one of the reasons
you can't read that index with 4.0.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Solr 4.1 has been giving us much trouble rejecting indexed documents.
> While I try to work my way through this, I would like to move our
> application back to Solr 4.0. However, now when I try to start Solr
> with the same index that was created with Solr 4.0 but has been running on
> 4.1 for a few days, I get this error chain:

> org.apache.solr.common.SolrException: Error opening new searcher
> Caused by: org.apache.solr.common.SolrException: Error opening new searcher
> Caused by: java.lang.IllegalArgumentException: A SPI class of type
> org.apache.lucene.codecs.Codec with name 'Lucene41' does not exist.
> You need to add the corresponding JAR file supporting this SPI to your
> classpath.The current classpath supports the following names:
> [Lucene40, Lucene3x]

> Obviously I'll not be installing Lucene41 in Solr 4.0, but is there
> any way to work around this? Note that neither solrconfig.xml nor
> schema.xml have changed. Thanks.




Returning to Solr 4.0 from 4.1

2013-03-01 Thread Dotan Cohen
Solr 4.1 has been giving us much trouble rejecting indexed documents.
While I try to work my way through this, I would like to move our
application back to Solr 4.0. However, now when I try to start Solr
with the same index that was created with Solr 4.0 but has been running on
4.1 for a few days, I get this error chain:

org.apache.solr.common.SolrException: Error opening new searcher
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
Caused by: java.lang.IllegalArgumentException: A SPI class of type
org.apache.lucene.codecs.Codec with name 'Lucene41' does not exist.
You need to add the corresponding JAR file supporting this SPI to your
classpath.The current classpath supports the following names:
[Lucene40, Lucene3x]

Obviously I'll not be installing Lucene41 in Solr 4.0, but is there
any way to work around this? Note that neither solrconfig.xml nor
schema.xml have changed. Thanks.


-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Solr 4.0 committing on an index built by another instance

2013-02-24 Thread Prakhar Birla
Hi Mark,

Thanks for the reply. The issue happened because the server kill/restart
process failed, which meant the old instance was still running and still had
the data for the old files loaded into memory. When a commit was
issued, it tried to locate those old files and failed. Once this was
fixed, it ran like a charm!

On 23 February 2013 20:18, Mark Miller  wrote:

> How are you doing the backup? You have to coordinate with Solr - files may
> be changing when you try to copy them, leading to an inconsistent index. If
> you want to do a live backup, you have to use the backup feature of the
> replication handler.
>
> - Mark
>
> On Feb 23, 2013, at 3:54 AM, Prakhar Birla  wrote:
>
> > Hi,
> >
> > We use Solr 4.0 for our main searcher so it is a very vital part. We have
> > set up a process called Index reassurance which assures that all
> documents
> > are available in Solr by comparing to our database. In short this is
> > achieved as: The production server (read/write) is a slave while another
> > server (write only) is the master where the indexes are built and
> > replicated to the slave.
> >
> > We are adding a backup/restore feature to this process which means that
> the
> > backup can originate from either server (while it is running) and will be
> > applied to the master after which the indexes are built and replicated.
> >
> > The backup is a tar.gz copy of the core.
> >
> > The problem we are facing is when a commit is done on data loaded from
> the
> > backup. Following is the stack trace:
> >
> > SEVERE: null:java.io.FileNotFoundException:
> /var/www/locationapp7078/solr/
> >> collection1/data/index.20130222154157971/_18.fnm (No such file or
> >> directory)
> >>at java.io.RandomAccessFile.open(Native Method)
> >>at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
> >>at org.apache.lucene.store.MMapDirectory.openInput(
> >> MMapDirectory.java:222)
> >>at org.apache.lucene.store.NRTCachingDirectory.openInput(
> >> NRTCachingDirectory.java:232)
> >>at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(
> >> Lucene40FieldInfosReader.java:52)
> >>at org.apache.lucene.index.SegmentCoreReaders.<init>(
> >> SegmentCoreReaders.java:101)
> >>at
> org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:57)
> >>at org.apache.lucene.index.ReadersAndLiveDocs.getReader(
> >> ReadersAndLiveDocs.java:120)
> >>at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(
> >> BufferedDeletesStream.java:267)
> >>at org.apache.lucene.index.IndexWriter.applyAllDeletes(
> >> IndexWriter.java:3010)
> >>at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(
> >> IndexWriter.java:3001)
> >>at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2974)
> >>at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2943)
> >>at org.apache.lucene.index.IndexWriter.forceMerge(
> >> IndexWriter.java:1606)
> >>at org.apache.lucene.index.IndexWriter.forceMerge(
> >> IndexWriter.java:1582)
> >>at org.apache.solr.update.DirectUpdateHandler2.commit(
> >> DirectUpdateHandler2.java:515)
> >>at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(
> >> RunUpdateProcessorFactory.java:87)
> >>at org.apache.solr.update.processor.UpdateRequestProcessor.
> >> processCommit(UpdateRequestProcessor.java:64)
> >>at org.apache.solr.update.processor.DistributedUpdateProcessor.
> >> processCommit(DistributedUpdateProcessor.java:1007)
> >>at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(
> >> LogUpdateProcessorFactory.java:157)
> >>at org.apache.solr.handler.loader.XMLLoader.
> >> processUpdate(XMLLoader.java:250)
> >>at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
> >>at org.apache.solr.handler.UpdateRequestHandler$1.load(
> >> UpdateRequestHandler.java:92)
> >>at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
> >> ContentStreamHandlerBase.java:74)
> >>at org.apache.solr.handler.RequestHandlerBase.handleRequest(
> >> RequestHandlerBase.java:129)
> >>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
> >>at org.apache.solr.servlet.SolrDispatchFilter.execute(
> >> SolrDispatchFilter.java:455)
> >>at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> >> 

Re: Solr 4.0 committing on an index built by another instance

2013-02-23 Thread Mark Miller
How are you doing the backup? You have to coordinate with Solr - files may be 
changing when you try to copy them, leading to an inconsistent index. If you 
want to do a live backup, you have to use the backup feature of the replication 
handler.

- Mark

On Feb 23, 2013, at 3:54 AM, Prakhar Birla  wrote:

> Hi,
> 
> We use Solr 4.0 for our main searcher so it is a very vital part. We have
> set up a process called Index reassurance which assures that all documents
> are available in Solr by comparing to our database. In short this is
> achieved as: The production server (read/write) is a slave while another
> server (write only) is the master where the indexes are built and
> replicated to the slave.
> 
> We are adding a backup/restore feature to this process which means that the
> backup can originate from either server (while it is running) and will be
> applied to the master after which the indexes are built and replicated.
> 
> The backup is a tar.gz copy of the core.
> 
> The problem we are facing is when a commit is done on data loaded from the
> backup. Following is the stack trace:
> 
> SEVERE: null:java.io.FileNotFoundException: /var/www/locationapp7078/solr/
>> collection1/data/index.20130222154157971/_18.fnm (No such file or
>> directory)
>>at java.io.RandomAccessFile.open(Native Method)
>>at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>>at org.apache.lucene.store.MMapDirectory.openInput(
>> MMapDirectory.java:222)
>>at org.apache.lucene.store.NRTCachingDirectory.openInput(
>> NRTCachingDirectory.java:232)
>>at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(
>> Lucene40FieldInfosReader.java:52)
>>at org.apache.lucene.index.SegmentCoreReaders.<init>(
>> SegmentCoreReaders.java:101)
>>at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:57)
>>at org.apache.lucene.index.ReadersAndLiveDocs.getReader(
>> ReadersAndLiveDocs.java:120)
>>at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(
>> BufferedDeletesStream.java:267)
>>at org.apache.lucene.index.IndexWriter.applyAllDeletes(
>> IndexWriter.java:3010)
>>at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(
>> IndexWriter.java:3001)
>>at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2974)
>>at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2943)
>>at org.apache.lucene.index.IndexWriter.forceMerge(
>> IndexWriter.java:1606)
>>at org.apache.lucene.index.IndexWriter.forceMerge(
>> IndexWriter.java:1582)
>>at org.apache.solr.update.DirectUpdateHandler2.commit(
>> DirectUpdateHandler2.java:515)
>>at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(
>> RunUpdateProcessorFactory.java:87)
>>at org.apache.solr.update.processor.UpdateRequestProcessor.
>> processCommit(UpdateRequestProcessor.java:64)
>>at org.apache.solr.update.processor.DistributedUpdateProcessor.
>> processCommit(DistributedUpdateProcessor.java:1007)
>>at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(
>> LogUpdateProcessorFactory.java:157)
>>at org.apache.solr.handler.loader.XMLLoader.
>> processUpdate(XMLLoader.java:250)
>>at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
>>at org.apache.solr.handler.UpdateRequestHandler$1.load(
>> UpdateRequestHandler.java:92)
>>at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
>> ContentStreamHandlerBase.java:74)
>>at org.apache.solr.handler.RequestHandlerBase.handleRequest(
>> RequestHandlerBase.java:129)
>>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
>>at org.apache.solr.servlet.SolrDispatchFilter.execute(
>> SolrDispatchFilter.java:455)
>>at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> SolrDispatchFilter.java:276)
>>at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
>> doFilter(ServletHandler.java:1337)
>>at org.eclipse.jetty.servlet.ServletHandler.doHandle(
>> ServletHandler.java:484)
>>at org.eclipse.jetty.server.handler.ScopedHandler.handle(
>> ScopedHandler.java:119)
>>at org.eclipse.jetty.security.SecurityHandler.handle(
>> SecurityHandler.java:524)
>>at org.eclipse.jetty.server.session.SessionHandler.
>> doHandle(SessionHandler.java:233)
>>at org.eclipse.jetty.server.handler.ContextHandler.
>> doHandle(ContextHandler.java:1065)
>>at org.eclipse.jetty.servlet.ServletHandler.doScope(
>> ServletHandler.java:413)
>>at org.eclipse.jetty.server.session.Sess

Re: Do I have to reindex when upgrading from solr 4.0 to 4.1?

2013-02-12 Thread Joel Bernstein
Michael is correct, that was what was said at the bootcamp (by me). I
believe this may not be correct though.

Further code review shows that Solr 4.0 was already distributing documents
using the hash range technique used in 4.1. The big change in 4.1 was that
a composite hash key could be used to distribute docs around the hash
range. But docs that don't use the composite key would be distributed
similarly to 4.0.

So, you may not need to re-index to take advantage of shard splitting. This
will become more clear as shard splitting documentation becomes available.


On Mon, Feb 11, 2013 at 12:45 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Arkadi,
>
> That's the answer I received at Solr Bootcamp, yes.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Mon, Feb 11, 2013 at 2:23 AM, Arkadi Colson  wrote:
> > Does it mean that when you redo indexing after the upgrade to 4.1 shard
> > splitting will work in 4.2?
> >
> > Met vriendelijke groeten
> >
> > Arkadi Colson
> >
> > Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
> > T +32 11 64 08 80 • F +32 11 64 08 81
> >
> > On 02/10/2013 05:21 PM, Michael Della Bitta wrote:
> >
> > No. You can just update Solr in place. But...
> >
> > If you're using Solr Cloud, your documents won't be hashed in a way
> > that lets you do shard splitting in 4.2. That seemed to be the
> > consensus during Solr Boot Camp.
> >
> > Michael Della Bitta
> >
> > 
> > Appinions
> > 18 East 41st Street, 2nd Floor
> > New York, NY 10017-6271
> >
> > www.appinions.com
> >
> > Where Influence Isn’t a Game
> >
> >
> > On Sun, Feb 10, 2013 at 10:46 AM, adfel70  wrote:
> >
> > Do I have to recreate the collections/cores?
> > Do I have to reindex?
> >
> > thanks.
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Do-I-have-to-reindex-when-upgrading-from-solr-4-0-to-4-1-tp4039560.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
>



-- 
Joel Bernstein
Professional Services LucidWorks
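
Joel's hash-range point above can be sketched outside Solr. A caveat up front: real SolrCloud routes on a MurmurHash3 of the unique key; the sketch below substitutes Java's String.hashCode() purely to stay self-contained, and the shard counts are invented for illustration.

```java
// Sketch of hash-range document routing, as used conceptually by SolrCloud:
// the full 32-bit hash space is divided into contiguous ranges, one per shard,
// and a document goes to the shard whose range contains hash(id).
// NOTE: real Solr uses MurmurHash3_x86_32; String.hashCode() here is only a
// stand-in so the sketch stays self-contained.
public class HashRouting {

    // Split the signed 32-bit space [Integer.MIN_VALUE, Integer.MAX_VALUE]
    // into numShards equal ranges and return the index of the range that
    // contains hash(id).
    public static int shardFor(String id, int numShards) {
        long min = Integer.MIN_VALUE;
        long span = (1L << 32) / numShards;      // width of each shard's range
        long h = id.hashCode();                  // stand-in for murmur3
        int shard = (int) ((h - min) / span);
        return Math.min(shard, numShards - 1);   // guard against rounding at the top
    }

    public static void main(String[] args) {
        // Routing is deterministic: the same id always lands on the same shard.
        assert shardFor("doc-42", 4) == shardFor("doc-42", 4);
        // Ranges nest when shard count doubles: a doc routed to shard s out of
        // N lands in {2s, 2s+1} out of 2N -- the split halves of its old range.
        for (String id : new String[]{"a", "b", "c", "doc-42"}) {
            int s = shardFor(id, 4);
            int s2 = shardFor(id, 8);
            assert s2 == 2 * s || s2 == 2 * s + 1;
        }
        System.out.println("routing sketch ok");
    }
}
```

Because a shard's range splits into nested halves, documents already placed by hash stay inside the split halves of their old shard, which is why hash-routed 4.0 indexes can still benefit from shard splitting without a reindex.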


Solr 4.0 is stripping XML format from RSS content field

2013-02-11 Thread eShard
Hi,
I'm running solr 4.0 final with manifoldcf 1.1 and I verified via fiddler
that Manifold is indeed sending the content field from an RSS feed that
contains xml data
However, when I query the index the content field is there with just the
data; the XML structure is gone.
Does anyone know how to stop Solr from doing this?
I'm using tika but I don't see it in the update/extract handler.
Can anyone point me in the right direction?

Thanks,




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-is-stripping-XML-format-from-RSS-content-field-tp4039809.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Do I have to reindex when upgrading from solr 4.0 to 4.1?

2013-02-11 Thread Michael Della Bitta
Arkadi,

That's the answer I received at Solr Bootcamp, yes.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Feb 11, 2013 at 2:23 AM, Arkadi Colson  wrote:
> Does it mean that when you redo indexing after the upgrade to 4.1 shard
> splitting will work in 4.2?
>
> Met vriendelijke groeten
>
> Arkadi Colson
>
> Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
> T +32 11 64 08 80 • F +32 11 64 08 81
>
> On 02/10/2013 05:21 PM, Michael Della Bitta wrote:
>
> No. You can just update Solr in place. But...
>
> If you're using Solr Cloud, your documents won't be hashed in a way
> that lets you do shard splitting in 4.2. That seemed to be the
> consensus during Solr Boot Camp.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Sun, Feb 10, 2013 at 10:46 AM, adfel70  wrote:
>
> Do I have to recreate the collections/cores?
> Do I have to reindex?
>
> thanks.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Do-I-have-to-reindex-when-upgrading-from-solr-4-0-to-4-1-tp4039560.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Do I have to reindex when upgrading from solr 4.0 to 4.1?

2013-02-10 Thread Arkadi Colson
Does it mean that when you redo indexing after the upgrade to 4.1 shard 
splitting will work in 4.2?


Met vriendelijke groeten

Arkadi Colson

Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
T +32 11 64 08 80 • F +32 11 64 08 81

On 02/10/2013 05:21 PM, Michael Della Bitta wrote:

No. You can just update Solr in place. But...

If you're using Solr Cloud, your documents won't be hashed in a way
that lets you do shard splitting in 4.2. That seemed to be the
consensus during Solr Boot Camp.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Sun, Feb 10, 2013 at 10:46 AM, adfel70  wrote:

Do I have to recreate the collections/cores?
Do I have to reindex?

thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Do-I-have-to-reindex-when-upgrading-from-solr-4-0-to-4-1-tp4039560.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: Do I have to reindex when upgrading from solr 4.0 to 4.1?

2013-02-10 Thread Michael Della Bitta
No. You can just update Solr in place. But...

If you're using Solr Cloud, your documents won't be hashed in a way
that lets you do shard splitting in 4.2. That seemed to be the
consensus during Solr Boot Camp.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Sun, Feb 10, 2013 at 10:46 AM, adfel70  wrote:
> Do I have to recreate the collections/cores?
> Do I have to reindex?
>
> thanks.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Do-I-have-to-reindex-when-upgrading-from-solr-4-0-to-4-1-tp4039560.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Do I have to reindex when upgrading from solr 4.0 to 4.1?

2013-02-10 Thread adfel70
Do I have to recreate the collections/cores?
Do I have to reindex?

thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Do-I-have-to-reindex-when-upgrading-from-solr-4-0-to-4-1-tp4039560.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with migration from solr 3.5 with SOLR-2155 usage to solr 4.0

2013-01-29 Thread Smiley, David W.
The wiki is open to everyone.  If you do edit it, please try to keep it
organized.  

On 1/24/13 9:41 AM, "Viacheslav Davidovich"
 wrote:

>Hi David,
>
>thank you for your answer.
>
>After updating to this field type and changing the Solr query I get the
>required behavior.
>
>Also could you update the WIKI page after the words "it needs to be in
>WEB-INF/lib in Solr's war file, basically" also add the maven artifact
>code like this?
>
>
><dependency>
>  <groupId>com.vividsolutions</groupId>
>  <artifactId>jts</artifactId>
>  <version>1.13</version>
></dependency>
>
>I think this may help users who use Maven.
>
>WBR Viacheslav.
>
>On 23.01.2013, at 19:24, Smiley, David W. wrote:
>
>> Viacheslav,
>> 
>> 
>> SOLR-2155 is only compatible with Solr 3.  However the technology it is
>> based on lives on in Lucene/Solr 4 in the
>> "SpatialRecursivePrefixTreeFieldType" field type.  In the example schema
>> it's registered under the name "location_rpt".  For more information on
>> how to use this field type, see: SpatialRecursivePrefixTreeFieldType
>> 
>> ~ David Smiley
>> 
>> On 1/23/13 11:11 AM, "Viacheslav Davidovich"
>>  wrote:
>> 
>>> Hi, 
>>> 
>>> With Solr 3.5 I use SOLR-2155 plugin to filter the documents by
>>>distance
>>> as described in
>>> http://wiki.apache.org/solr/SpatialSearch#Advanced_Spatial_Search and
>>> this solution perfectly filter the multiValued data defined in
>>>schema.xml
>>> like
>>> 
>>> >> length="12" />
>>> 
>>> >> multiValued="true"/>
>>> 
>>> the query looks like this with Solr 3.5:  q=*:*&fq={!geofilt}&sfield=
>>> location_data&pt=45.15,-93.85&d=50&sort=geodist() asc
>>> 
>>> As the SOLR-2155 plugin is not compatible with solr 4.0, I tried to change the
>>> field definition to next:
>>> 
>>> >> subFieldSuffix="_coordinate" />
>>> 
>>> >>stored="true"
>>> multiValued="true"/>
>>> 
>>> >> stored="false" />
>>> 
>>> But in this case after geofilt by location_data execution the correct
>>> values return only if the field has 1 value; if more than 1 value is
>>> stored in the index, documents return only when all the location
>>> points are matched.
>>> 
>>> Does anybody have experience or any ideas on how to get the same behavior
>>> in solr 4.0 as in solr 3.5 with SOLR-2155 plugin usage?
>>> 
>>> Is this possible at all or I need to refactor the document structure
>>>and
>>> field definition to store only 1 location value per document?
>>> 
>>> WBR Viacheslav.
>>> 
>> 
>> 
>
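
For readers hitting the same multiValued migration: the behavior being asked for -- a document matches when at least one of its points is within d km of the query point -- can be sanity-checked outside Solr with a plain haversine distance. This is only an illustration of the intended filter semantics; the 6371 km earth radius, the point layout, and the reuse of the thread's example coordinates are assumptions, and it is not how location_rpt evaluates internally.

```java
public class GeoFilterSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean earth radius (approximation)

    // Great-circle distance between two lat/lon points, in km (haversine formula).
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // The semantics wanted for multiValued points: the document matches
    // if AT LEAST ONE of its points is within dKm of the query point.
    public static boolean anyWithin(double[][] points, double lat, double lon, double dKm) {
        for (double[] p : points) {
            if (haversineKm(p[0], p[1], lat, lon) <= dKm) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        double[][] docPoints = { {45.15, -93.85}, {10.0, 10.0} }; // one near, one far
        assert anyWithin(docPoints, 45.15, -93.85, 50.0);  // any-match: near point wins
        assert !anyWithin(new double[][]{{10.0, 10.0}}, 45.15, -93.85, 50.0);
        System.out.println("geofilt sketch ok");
    }
}
```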



Re: [SOLR 4.0] Number of fields vs searching speed

2013-01-28 Thread Mikhail Khludnev
Roman,

My bet is that the number of indexed fields doesn't impact the search time,
while the number of queried fields does linearly increase the search time.


On Mon, Jan 28, 2013 at 11:22 AM, Roman Slavik  wrote:

> Hi guys,
> what is the relation between number of indexed fields and searching speed?
>
> For example I have same number of records, same searching SOLR query but
> 100
> indexed fields for each record in case 1 and 1000 fields in case 2. It's
> obvious that searching time in case 2 will be greater, but how much? 10
> times? Or is there another relation between number of indexed fields and
> search time?
>
> Thanks a lot!
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-4-0-Number-of-fields-vs-searching-speed-tp4036665.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
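
Mikhail's distinction can be made concrete: the cost driver is how many fields a query actually touches, because a term searched across N fields expands into N per-field clauses, each consulting its own posting list, while fields that are merely indexed but never queried contribute no clauses. The sketch below only builds that expansion as strings -- it is an illustration of the idea, not Solr's query parser -- and the field names are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of why queried (not merely indexed) fields drive search cost:
// searching a term across N fields expands into N per-field clauses. Fields
// that exist in the schema but are not queried add no clauses at all.
public class QueryExpansionSketch {

    // Expand "term" over the given fields into "field:term" clauses,
    // roughly what a dismax/edismax qf list does.
    public static List<String> expand(String term, List<String> fields) {
        List<String> clauses = new ArrayList<>();
        for (String f : fields) {
            clauses.add(f + ":" + term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("title", "body", "tags");
        List<String> clauses = expand("solr", fields);
        assert clauses.size() == fields.size(); // work scales with queried fields
        System.out.println(String.join(" OR ", clauses));
    }
}
```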


 


[SOLR 4.0] Number of fields vs searching speed

2013-01-27 Thread Roman Slavik
Hi guys,
what is the relation between number of indexed fields and searching speed? 

For example I have same number of records, same searching SOLR query but 100
indexed fields for each record in case 1 and 1000 fields in case 2. It's
obvious that searching time in case 2 will be greater, but how much? 10
times? Or is there another relation between number of indexed fields and
search time?

Thanks a lot!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-0-Number-of-fields-vs-searching-speed-tp4036665.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with migration from solr 3.5 with SOLR-2155 usage to solr 4.0

2013-01-24 Thread Viacheslav Davidovich
Hi David,

thank you for your answer.

After updating to this field type and changing the Solr query I get the required
behavior.

Also could you update the WIKI page after the words "it needs to be in 
WEB-INF/lib in Solr's war file, basically" also add the maven artifact code 
like this?


<dependency>
  <groupId>com.vividsolutions</groupId>
  <artifactId>jts</artifactId>
  <version>1.13</version>
</dependency>

I think this may help users who use Maven.

WBR Viacheslav.

On 23.01.2013, at 19:24, Smiley, David W. wrote:

> Viacheslav,
> 
> 
> SOLR-2155 is only compatible with Solr 3.  However the technology it is
> based on lives on in Lucene/Solr 4 in the
> "SpatialRecursivePrefixTreeFieldType" field type.  In the example schema
> it's registered under the name "location_rpt".  For more information on
> how to use this field type, see: SpatialRecursivePrefixTreeFieldType
> 
> ~ David Smiley
> 
> On 1/23/13 11:11 AM, "Viacheslav Davidovich"
>  wrote:
> 
>> Hi, 
>> 
>> With Solr 3.5 I use SOLR-2155 plugin to filter the documents by distance
>> as described in 
>> http://wiki.apache.org/solr/SpatialSearch#Advanced_Spatial_Search and
>> this solution perfectly filter the multiValued data defined in schema.xml
>> like
>> 
>> > length="12" />
>> 
>> > multiValued="true"/>
>> 
>> the query looks like this with Solr 3.5:  q=*:*&fq={!geofilt}&sfield=
>> location_data&pt=45.15,-93.85&d=50&sort=geodist() asc
>> 
>> As the SOLR-2155 plugin is not compatible with solr 4.0, I tried to change the
>> field definition to next:
>> 
>> > subFieldSuffix="_coordinate" />
>> 
>> > multiValued="true"/>
>> 
>> > stored="false" />
>> 
>> But in this case after geofilt by location_data execution the correct
>> values return only if the field has 1 value; if more than 1 value is
>> stored in the index, documents return only when all the location
>> points are matched.
>> 
>> Does anybody have experience or any ideas on how to get the same behavior in
>> solr 4.0 as in solr 3.5 with SOLR-2155 plugin usage?
>> 
>> Is this possible at all or I need to refactor the document structure and
>> field definition to store only 1 location value per document?
>> 
>> WBR Viacheslav.
>> 
> 
> 



Re: Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
I look at the console logs for my server (3.5 and 4.0) and I
>run the indexer against each, I see a subtle difference in this print out
>when it connects to the solr core.
>The 3.5 version prints this out:
>webapp=/solr path=/update
>params={waitSearcher=true&wt=javabin&commit=true&softCommit=false&version=
>2
>} {commit=} 0 2722
>
>
>The 4.0 version prints this out
> webapp=/solr path=/update/javabin
>params={wt=javabin&commit=true&waitFlush=true&waitSearcher=true&version=2}
>status=0 QTime=1404
>
>
>
>The params for the update handle seem ever so slightly different. The 3.5
>version (the one that runs fast) has a setting softCommit=false.
>The 4.0 version does not print that setting, but instead prints this
>setting waitFlush=true.
>
>These could be irrelevant, but thought I should add the information.
>
>-Kevin
>
>On 1/23/13 11:42 AM, "Kevin Stone"  wrote:
>
>>Do you mean commenting out the <updateLog>...</updateLog> tag? Because
>>that I already commented out. Or do I also need to remove the entire
>><updateHandler> tag? Sorry, I am not too familiar with everything in the
>>solrconfig file. I have a tag that essentially looks like this:
>>
>>
>>
>>
>>Everything inside is commented out.
>>
>>-Kevin
>>
>>On 1/23/13 11:21 AM, "Mark Miller"  wrote:
>>
>>>It's hard to guess, but I might start by looking at what the new
>>>UpdateLog is costing you. Take its definition out of solrconfig.xml and
>>>try your test again. Then let's take it from there.
>>>
>>>- Mark
>>>
>>>On Jan 23, 2013, at 11:00 AM, Kevin Stone  wrote:
>>>
>>>> I am having some difficulty migrating our solr indexing scripts from
>>>>using 3.5 to solr 4.0. Notably, I am trying to track down why our
>>>>performance in solr 4.0 is about 5-10 times slower when indexing
>>>>documents. Querying is still quite fast.
>>>>
>>>> The code adds  documents in groups of 1000, and adds each group to the
>>>>solr in a thread. The documents are somewhat large, including maybe
>>>>30-40 different field types, mostly multivalued. Here are some snippets
>>>>of the code we used in 3.5.
>>>>
>>>>
>>>> MultiThreadedHttpConnectionManager mgr = new
>>>>MultiThreadedHttpConnectionManager();
>>>>
>>>> HttpClient client = new HttpClient(mgr);
>>>>
>>>> CommonsHttpSolrServer server = new CommonsHttpSolrServer( "some url
>>>>for
>>>>our index",client );
>>>>
>>>> server.setRequestWriter(new BinaryRequestWriter());
>>>>
>>>>
>>>> Then, we delete the index, and proceed to generate documents and load
>>>>the groups in a thread that looks kind of like this. I've omitted some
>>>>overhead for handling exceptions, and retry attempts.
>>>>
>>>>
>>>> class DocWriterThread implements Runnable
>>>>
>>>> {
>>>>
>>>>CommonsHttpSolrServer server;
>>>>
>>>>Collection docs;
>>>>
>>>>private int commitWithin = 5; // commitWithin is in milliseconds; 50 seconds would be 50000
>>>>
>>>>public DocWriterThread(CommonsHttpSolrServer
>>>>server,Collection docs)
>>>>
>>>>{
>>>>
>>>>this.server=server;
>>>>
>>>>this.docs=docs;
>>>>
>>>>}
>>>>
>>>> public void run()
>>>>
>>>> {
>>>>
>>>>    // set the commitWithin feature
>>>>
>>>>server.add(docs,commitWithin);
>>>>
>>>> }
>>>>
>>>> }
>>>>
>>>>
>>>> Now, I've had to change some things to get this compile with the Solr
>>>>4.0 libraries. Here is what I tried to convert the above code to. I
>>>>don't know if these are the correct equivalents, as I am not familiar
>>>>with apache httpcomponents.
>>>>
>>>>
>>>>
>>>> ThreadSafeClientConnManager mgr = new ThreadSafeClientConnManager();
>>>>
>>>> DefaultHttpClient client = new DefaultHttpClient(mgr);
>>>>
>>>> HttpSolrServer server = new HttpSolrServer( "some url for our solr
>>>>index",client );
>>>>
>>>> server.setRequestWriter(new BinaryRequestWriter());
>>>>
>>>>
>>>>

Re: Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
I'm still poking around trying to find the differences. I found a couple
things that may or may not be relevant.
First, when I start up my 3.5 solr, I get all sorts of warnings that my
solrconfig is old and will run using 2.4 emulation.
Of course I had to upgrade the solrconfig for the 4.0 instance (which I
already described). I am curious if there could be some feature I was
taking advantage of in 2.4 that doesn't exist now in 4.0. I don't know.

Second, when I look at the console logs for my server (3.5 and 4.0) and I
run the indexer against each, I see a subtle difference in this print out
when it connects to the solr core.
The 3.5 version prints this out:
webapp=/solr path=/update
params={waitSearcher=true&wt=javabin&commit=true&softCommit=false&version=2
} {commit=} 0 2722


The 4.0 version prints this out
 webapp=/solr path=/update/javabin
params={wt=javabin&commit=true&waitFlush=true&waitSearcher=true&version=2}
status=0 QTime=1404



The params for the update handle seem ever so slightly different. The 3.5
version (the one that runs fast) has a setting softCommit=false.
The 4.0 version does not print that setting, but instead prints this
setting waitFlush=true.

These could be irrelevant, but thought I should add the information.

-Kevin

On 1/23/13 11:42 AM, "Kevin Stone"  wrote:

>Do you mean commenting out the <updateLog>...</updateLog> tag? Because
>that I already commented out. Or do I also need to remove the entire
><updateHandler> tag? Sorry, I am not too familiar with everything in the
>solrconfig file. I have a tag that essentially looks like this:
>
>
>
>
>Everything inside is commented out.
>
>-Kevin
>
>On 1/23/13 11:21 AM, "Mark Miller"  wrote:
>
>>It's hard to guess, but I might start by looking at what the new
>>UpdateLog is costing you. Take its definition out of solrconfig.xml and
>>try your test again. Then let's take it from there.
>>
>>- Mark
>>
>>On Jan 23, 2013, at 11:00 AM, Kevin Stone  wrote:
>>
>>> I am having some difficulty migrating our solr indexing scripts from
>>>using 3.5 to solr 4.0. Notably, I am trying to track down why our
>>>performance in solr 4.0 is about 5-10 times slower when indexing
>>>documents. Querying is still quite fast.
>>>
>>> The code adds  documents in groups of 1000, and adds each group to the
>>>solr in a thread. The documents are somewhat large, including maybe
>>>30-40 different field types, mostly multivalued. Here are some snippets
>>>of the code we used in 3.5.
>>>
>>>
>>> MultiThreadedHttpConnectionManager mgr = new
>>>MultiThreadedHttpConnectionManager();
>>>
>>> HttpClient client = new HttpClient(mgr);
>>>
>>> CommonsHttpSolrServer server = new CommonsHttpSolrServer( "some url for
>>>our index",client );
>>>
>>> server.setRequestWriter(new BinaryRequestWriter());
>>>
>>>
>>> Then, we delete the index, and proceed to generate documents and load
>>>the groups in a thread that looks kind of like this. I've omitted some
>>>overhead for handling exceptions, and retry attempts.
>>>
>>>
>>> class DocWriterThread implements Runnable
>>>
>>> {
>>>
>>>CommonsHttpSolrServer server;
>>>
>>>Collection docs;
>>>
>>>private int commitWithin = 5; // commitWithin is in milliseconds; 50 seconds would be 50000
>>>
>>>public DocWriterThread(CommonsHttpSolrServer
>>>server,Collection docs)
>>>
>>>{
>>>
>>>this.server=server;
>>>
>>>this.docs=docs;
>>>
>>>}
>>>
>>> public void run()
>>>
>>> {
>>>
>>>// set the commitWithin feature
>>>
>>>server.add(docs,commitWithin);
>>>
>>> }
>>>
>>> }
>>>
>>>
>>> Now, I've had to change some things to get this compile with the Solr
>>>4.0 libraries. Here is what I tried to convert the above code to. I
>>>don't know if these are the correct equivalents, as I am not familiar
>>>with apache httpcomponents.
>>>
>>>
>>>
>>> ThreadSafeClientConnManager mgr = new ThreadSafeClientConnManager();
>>>
>>> DefaultHttpClient client = new DefaultHttpClient(mgr);
>>>
>>> HttpSolrServer server = new HttpSolrServer( "some url for our solr
>>>index",client );
>>>
>>> server.setRequestWriter(new BinaryRequestWriter());
>>>
>>>
>>>
>>>
>>> The thread method is the 

Re: Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
Do you mean commenting out the <updateLog>...</updateLog> tag? Because
that I already commented out. Or do I also need to remove the entire
<updateHandler> tag? Sorry, I am not too familiar with everything in the
solrconfig file. I have a tag that essentially looks like this:

<updateHandler class="solr.DirectUpdateHandler2">
  ...
</updateHandler>

Everything inside is commented out.

-Kevin

On 1/23/13 11:21 AM, "Mark Miller"  wrote:

>It's hard to guess, but I might start by looking at what the new
>UpdateLog is costing you. Take its definition out of solrconfig.xml and
>try your test again. Then let's take it from there.
>
>- Mark
>
>On Jan 23, 2013, at 11:00 AM, Kevin Stone  wrote:
>
>> I am having some difficulty migrating our solr indexing scripts from
>>using 3.5 to solr 4.0. Notably, I am trying to track down why our
>>performance in solr 4.0 is about 5-10 times slower when indexing
>>documents. Querying is still quite fast.
>>
>> The code adds  documents in groups of 1000, and adds each group to the
>>solr in a thread. The documents are somewhat large, including maybe
>>30-40 different field types, mostly multivalued. Here are some snippets
>>of the code we used in 3.5.
>>
>>
>> MultiThreadedHttpConnectionManager mgr = new
>>MultiThreadedHttpConnectionManager();
>>
>> HttpClient client = new HttpClient(mgr);
>>
>> CommonsHttpSolrServer server = new CommonsHttpSolrServer( "some url for
>>our index",client );
>>
>> server.setRequestWriter(new BinaryRequestWriter());
>>
>>
>> Then, we delete the index, and proceed to generate documents and load
>>the groups in a thread that looks kind of like this. I've omitted some
>>overhead for handling exceptions, and retry attempts.
>>
>>
>> class DocWriterThread implements Runnable
>>
>> {
>>
>>CommonsHttpSolrServer server;
>>
>>Collection docs;
>>
>>private int commitWithin = 5; // commitWithin is in milliseconds; 50 seconds would be 50000
>>
>>public DocWriterThread(CommonsHttpSolrServer
>>server,Collection docs)
>>
>>{
>>
>>this.server=server;
>>
>>this.docs=docs;
>>
>>}
>>
>> public void run()
>>
>> {
>>
>>// set the commitWithin feature
>>
>>server.add(docs,commitWithin);
>>
>> }
>>
>> }
>>
>>
>> Now, I've had to change some things to get this compile with the Solr
>>4.0 libraries. Here is what I tried to convert the above code to. I
>>don't know if these are the correct equivalents, as I am not familiar
>>with apache httpcomponents.
>>
>>
>>
>> ThreadSafeClientConnManager mgr = new ThreadSafeClientConnManager();
>>
>> DefaultHttpClient client = new DefaultHttpClient(mgr);
>>
>> HttpSolrServer server = new HttpSolrServer( "some url for our solr
>>index",client );
>>
>> server.setRequestWriter(new BinaryRequestWriter());
>>
>>
>>
>>
>> The thread method is the same, but uses HttpSolrServer instead of
>>CommonsHttpSolrServer.
>>
>> We also had an old solrconfig (not sure what version, but it is pre
>>3.x and had mostly default values) that I had to replace with a 4.0
>>style solrconfig.xml. I don't want to post the entire file (as it is
>>large), but I copied one from the solr 4.0 examples, and made a couple
>>changes. First, I wanted to turn off transaction logging. So essentially
>>I have a line like this (everything inside is commented out):
>>
>>
>> 
>>
>>
>> And I added a handler for javabin
>>
>> <requestHandler name="/update/javabin"
>> class="solr.BinaryUpdateRequestHandler">
>>   <lst name="defaults">
>>     <str name="stream.contentType">application/javabin</str>
>>   </lst>
>> </requestHandler>
>>
>> I'm not sure what other configurations I should look at. I would think
>>that there should be a big obvious reason why the indexing performance
>>would drop nearly 10 fold.
>>
>> Against our 3.5 instance I timed our index load, and it adds roughly
>>40,000 documents every 3-8 seconds.
>>
>> Against our 4.0 instance it adds 40,000 documents every 70-75 seconds.
>>
>> This isn't the end of the world, and I would love to use the new join
>>feature in solr 4.0. However, we have many different indexes with
>>millions of documents, and this kind of increase in load time is
>>troubling.
>>
>>
>> Thanks for your help.
>>
>>
>> -Kevin
>>
>>
>> The information in this email, including attachments, may be
>>confidential and is intended solely for the addressee(s). If you believe
>>you received this email by mistake, please notify the sender by return
>>email as soon as possible.
>


The information in this email, including attachments, may be confidential and 
is intended solely for the addressee(s). If you believe you received this email 
by mistake, please notify the sender by return email as soon as possible.
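
Separate from the config hunt above, the load pattern in Kevin's snippets -- partition documents into groups of 1000 and hand each group to a worker that calls add(docs, commitWithin) -- can be sketched standalone. The stub below replaces the SolrJ server with a thread-safe counter so it runs anywhere (the batch sizes and pool size are illustrative); note that SolrJ's commitWithin parameter is expressed in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Standalone sketch of the indexing pattern in this thread: split the documents
// into fixed-size batches and submit each batch to a worker thread. In the real
// code each batch would go to server.add(docs, commitWithinMs), where
// commitWithinMs is in MILLISECONDS.
public class BatchingSketch {

    // Partition docs into batches of at most batchSize (last batch may be smaller).
    public static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) docs.add(i);

        List<List<Integer>> batches = partition(docs, 1000);
        assert batches.size() == 3;          // 1000 + 1000 + 500
        assert batches.get(2).size() == 500;

        // Submit each batch to a small pool, as the DocWriterThread code does.
        java.util.concurrent.atomic.AtomicInteger indexed =
                new java.util.concurrent.atomic.AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (List<Integer> batch : batches) {
            pool.submit(() -> indexed.addAndGet(batch.size())); // stand-in for server.add(batch, 50000)
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        assert indexed.get() == 2500;
        System.out.println("indexed " + indexed.get() + " docs in " + batches.size() + " batches");
    }
}
```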


Re: Problem with migration from solr 3.5 with SOLR-2155 usage to solr 4.0

2013-01-23 Thread Smiley, David W.
Viacheslav,


SOLR-2155 is only compatible with Solr 3.  However the technology it is
based on lives on in Lucene/Solr 4 in the
"SpatialRecursivePrefixTreeFieldType" field type.  In the example schema
it's registered under the name "location_rpt".  For more information on
how to use this field type, see: SpatialRecursivePrefixTreeFieldType

~ David Smiley

On 1/23/13 11:11 AM, "Viacheslav Davidovich"
 wrote:

>Hi, 
>
>With Solr 3.5 I use SOLR-2155 plugin to filter the documents by distance
>as described in 
>http://wiki.apache.org/solr/SpatialSearch#Advanced_Spatial_Search and
>this solution perfectly filter the multiValued data defined in schema.xml
> like
>
>length="12" />
>
>multiValued="true"/>
>
>the query looks like this with Solr 3.5:  q=*:*&fq={!geofilt}&sfield=
>location_data&pt=45.15,-93.85&d=50&sort=geodist() asc
>
>As the SOLR-2155 plugin is not compatible with solr 4.0, I tried to change the
>field definition to next:
>
>subFieldSuffix="_coordinate" />
>
>multiValued="true"/>
>
>stored="false" />
>
>But in this case after geofilt by location_data execution the correct
>values return only if the field has 1 value; if more than 1 value is
>stored in the index, documents return only when all the location
>points are matched.
>
>Does anybody have experience or any ideas on how to get the same behavior in
>solr 4.0 as in solr 3.5 with SOLR-2155 plugin usage?
>
>Is this possible at all or I need to refactor the document structure and
>field definition to store only 1 location value per document?
>
>WBR Viacheslav.
>



Re: Solr 4.0 indexing performance question

2013-01-23 Thread Mark Miller
It's hard to guess, but I might start by looking at what the new UpdateLog is 
costing you. Take its definition out of solrconfig.xml and try your test 
again. Then let's take it from there.

- Mark

On Jan 23, 2013, at 11:00 AM, Kevin Stone  wrote:

> I am having some difficulty migrating our solr indexing scripts from using
> 3.5 to solr 4.0. Notably, I am trying to track down why our performance in
> solr 4.0 is about 5-10 times slower when indexing documents. Querying is
> still quite fast.
> [...]



Problem with migration from solr 3.5 with SOLR-2155 usage to solr 4.0

2013-01-23 Thread Viacheslav Davidovich
Hi, 

With Solr 3.5 I use SOLR-2155 plugin to filter the documents by distance as 
described in http://wiki.apache.org/solr/SpatialSearch#Advanced_Spatial_Search 
and this solution perfectly filters the multiValued data defined in schema.xml 
like

<fieldType name="geohash" class="solr2155.solr.schema.GeoHashField" length="12" />

<field name="location_data" type="geohash" multiValued="true"/>

the query looks like this with Solr 3.5:  q=*:*&fq={!geofilt}&sfield= 
location_data&pt=45.15,-93.85&d=50&sort=geodist() asc

As the SOLR-2155 plugin is not compatible with Solr 4.0, I tried to change the 
field definition to the following:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate" />

<field name="location_data" type="location" multiValued="true"/>

<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false" />

But in this case, after the geofilt on location_data executes, the correct 
documents are returned only if the field holds a single value; when more than one 
value is stored in the index, a document is returned only when all of its location 
points match.

Does anybody have experience with, or ideas on, how to get the same behavior in 
Solr 4.0 as in Solr 3.5 with the SOLR-2155 plugin?

Is this possible at all, or do I need to refactor the document structure and field 
definition to store only one location value per document?

WBR Viacheslav.
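SOLR-2155's approach was folded into the Lucene 4 spatial module, and in Solr 4 the multiValued-capable replacement is SpatialRecursivePrefixTreeFieldType. A hedged schema sketch (the field name comes from this thread; the fieldType name and attribute values are typical examples, not taken from the message):

```xml
<!-- schema.xml sketch: RPT field type supports multiple points per document -->
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>

<field name="location_data" type="location_rpt" indexed="true" stored="true"
       multiValued="true"/>
```

With this type, fq={!geofilt sfield=location_data pt=45.15,-93.85 d=50} should match a document if any one of its points falls inside the circle; note that geodist() sorting against a multiValued field remains limited in 4.0.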



Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
I am having some difficulty migrating our solr indexing scripts from using 3.5 
to solr 4.0. Notably, I am trying to track down why our performance in solr 4.0 
is about 5-10 times slower when indexing documents. Querying is still quite 
fast.

The code adds documents in groups of 1000, and adds each group to Solr in 
a thread. The documents are somewhat large, including maybe 30-40 different 
field types, mostly multivalued. Here are some snippets of the code we used in 
3.5.


 MultiThreadedHttpConnectionManager mgr = new 
MultiThreadedHttpConnectionManager();

 HttpClient client = new HttpClient(mgr);

 CommonsHttpSolrServer server = new CommonsHttpSolrServer( "some url for our 
index",client );

 server.setRequestWriter(new BinaryRequestWriter());


 Then, we delete the index, and proceed to generate documents and load the 
groups in a thread that looks kind of like this. I've omitted some overhead for 
handling exceptions, and retry attempts.


class DocWriterThread implements Runnable

{

CommonsHttpSolrServer server;

Collection<SolrInputDocument> docs;

private int commitWithin = 50000; // 50 seconds (commitWithin is in milliseconds)

public DocWriterThread(CommonsHttpSolrServer 
server, Collection<SolrInputDocument> docs)

{

this.server=server;

this.docs=docs;

}

public void run()

{

// set the commitWithin feature

server.add(docs,commitWithin);

}

}


Now, I've had to change some things to get this to compile with the Solr 4.0 
libraries. Here is what I tried to convert the above code to. I don't know if 
these are the correct equivalents, as I am not familiar with apache 
httpcomponents.



 ThreadSafeClientConnManager mgr = new ThreadSafeClientConnManager();

 DefaultHttpClient client = new DefaultHttpClient(mgr);

 HttpSolrServer server = new HttpSolrServer( "some url for our solr 
index",client );

 server.setRequestWriter(new BinaryRequestWriter());




The thread method is the same, but uses HttpSolrServer instead of 
CommonsHttpSolrServer.

We also had an old solrconfig (not sure what version, but it is pre-3.x and 
had mostly default values) that I had to replace with a 4.0 style 
solrconfig.xml. I don't want to post the entire file (as it is large), but I 
copied one from the solr 4.0 examples, and made a couple changes. First, I 
wanted to turn off transaction logging. So essentially I have a line like this 
(everything inside is commented out):

<updateLog>
  <!-- ... -->
</updateLog>

And I added a handler for javabin:

<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler">
  <lst name="defaults">
    <str name="stream.contentType">application/javabin</str>
  </lst>
</requestHandler>


I'm not sure what other configurations I should look at. I would think that 
there should be a big obvious reason why the indexing performance would drop 
nearly 10 fold.

Against our 3.5 instance I timed our index load, and it adds roughly 40,000 
documents every 3-8 seconds.

Against our 4.0 instance it adds 40,000 documents every 70-75 seconds.

This isn't the end of the world, and I would love to use the new join feature 
in solr 4.0. However, we have many different indexes with millions of 
documents, and this kind of increase in load time is troubling.


Thanks for your help.


-Kevin


The information in this email, including attachments, may be confidential and 
is intended solely for the addressee(s). If you believe you received this email 
by mistake, please notify the sender by return email as soon as possible.


Re: Delete all Documents in the Example (Solr 4.0)

2013-01-22 Thread O. Olson
Thank you Erick for that great tip on getting a listing of the Cores. 
O. O.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Delete-all-Documents-in-the-Example-Solr-4-0-tp4035156p4035454.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delete all Documents in the Example (Solr 4.0)

2013-01-21 Thread Erick Erickson
Try the admin page (note, this doesn't need a core, .../solr should
take you there). The cores should be listed on the left.

Best
Erick
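The same core listing is also available without the UI through the CoreAdmin API, e.g. (host and port as in the example jetty setup):

```
http://localhost:8983/solr/admin/cores?action=STATUS
```

STATUS returns one entry per core, including its name, instanceDir and index details.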



Re: Delete all Documents in the Example (Solr 4.0)

2013-01-21 Thread O. Olson




- Messaggio originale -
Da: Shawn Heisey 
A: solr-user@lucene.apache.org
Cc: 
Inviato: Lunedì 21 Gennaio 2013 12:35
Oggetto: Re: Delete all Documents in the Example (Solr 4.0)

>On 1/21/2013 11:27 AM, O. Olson wrote:
>> http://localhost:8983/solr/update
>> 
>> and I got a 404 too. I then looked at
>> /example-DIH/solr/solr/conf/solrconfig.xml and it seems to have <requestHandler
>> name="/update" class="solr.UpdateRequestHandler" />.
>> 
>> I am confused why I am getting a 404 if /update has a
>> handler?

>You need to send the request to /solr/corename/update ... if you are using the 
>solr example, most likely the core is named "collection1" so the URL would be 
>/solr/collection1/update.
>
>There is a lot of information out there that has not been updated since before 
>multicore operation became the default in Solr examples.
>
>The example does have defaultCoreName defined, but I still see lots of people 
>that run into problems like this, so I suspect that it isn't always honored.
>
>Thanks,
>Shawn
---

Thank you Shawn for the hint. Can someone tell me how to
figure out the corename?
 
http://localhost:8983/solr/collection1/update
 
did not seem to work for me. I then saw that /example/example-DIH/solr/db
had a conf and data directory, so I assumed it to be core. I then tried 
 
http://localhost:8983/solr/db/update?stream.body=<delete><query>*:*</query></delete>
http://localhost:8983/solr/db/update?stream.body=<commit/>
 
which worked for me i.e. the documents in the index got
deleted.
 
Thanks again,
O. O.


Re: Delete all Documents in the Example (Solr 4.0)

2013-01-21 Thread Alexandre Rafalovitch
I just tested that and /update does not seem to honor the default core
value (same 404 issue). Is that a bug?

Regards,
   Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Mon, Jan 21, 2013 at 1:35 PM, Shawn Heisey  wrote:

> On 1/21/2013 11:27 AM, O. Olson wrote:
>
>> http://localhost:8983/solr/update
>>
>> and I got a 404 too. I then looked at
>> /example-DIH/solr/solr/conf/solrconfig.xml and it seems to have <requestHandler
>> name="/update" class="solr.UpdateRequestHandler" />.
>>
>> I am confused why I am getting a 404 if /update has a
>> handler?
>>
>
> You need to send the request to /solr/corename/update ... if you are using
> the solr example, most likely the core is named "collection1" so the URL
> would be /solr/collection1/update.
>
> There is a lot of information out there that has not been updated since
> before multicore operation became the default in Solr examples.
>
> The example does have defaultCoreName defined, but I still see lots of
> people that run into problems like this, so I suspect that it isn't always
> honored.
>
> Thanks,
> Shawn
>
>


Re: Delete all Documents in the Example (Solr 4.0)

2013-01-21 Thread Shawn Heisey

On 1/21/2013 11:27 AM, O. Olson wrote:

http://localhost:8983/solr/update

and I got a 404 too. I then looked at
/example-DIH/solr/solr/conf/solrconfig.xml and it seems to have <requestHandler name="/update" class="solr.UpdateRequestHandler" />.

I am confused why I am getting a 404 if /update has a
handler?


You need to send the request to /solr/corename/update ... if you are 
using the solr example, most likely the core is named "collection1" so 
the URL would be /solr/collection1/update.


There is a lot of information out there that has not been updated since 
before multicore operation became the default in Solr examples.


The example does have defaultCoreName defined, but I still see lots of 
people that run into problems like this, so I suspect that it isn't 
always honored.


Thanks,
Shawn



Delete all Documents in the Example (Solr 4.0)

2013-01-21 Thread O. Olson
Hi,
 
I am attempting to use the example-DIH that comes with the Solr 4.0 download.
In /example, I start Solr using:
 
java -Dsolr.solr.home="./example-DIH/solr/" -jar
start.jar
 
After playing with it for a while, I decided to delete all
documents in the index. The FAQ at 
http://wiki.apache.org/solr/FAQ#How_can_I_delete_all_documents_from_my_index.3F 
seems to say that I needed to use: 
 
http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>
http://localhost:8983/solr/update?stream.body=<commit/>
 
I put the above urls in my browser, but I simply get 404’s. I
then tried: 
 
http://localhost:8983/solr/update 
 
and I got a 404 too. I then looked at
/example-DIH/solr/solr/conf/solrconfig.xml and it seems to have <requestHandler name="/update" class="solr.UpdateRequestHandler" />. 
 
I am confused why I am getting a 404 if /update has a
handler? 
 
Thank you for any ideas.
O. O.
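For what it's worth, the same delete can be sent as a POST, which avoids putting the XML in the URL; a sketch against the example-DIH "db" core that comes up elsewhere in this thread (adjust the core name to your setup):

```
curl 'http://localhost:8983/solr/db/update?commit=true' \
  -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>*:*</query></delete>'
```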


Re: Solr 4.0 - timeAllowed in distributed search

2013-01-21 Thread Lyuba Romanchuk
Hi Michael,

Thank you very much for your reply!

Does that mean that when timeAllowed is used, only the search itself is interrupted
and document retrieval is not?

In order to check the total time of the query I run curl with linux time to
measure the total time including retrieving of documents. If I understood
your answer correctly I had to get a similar total time in both cases but
according to the results they are similar to QTime and to each other:

   - for non distributed: QTime=789 ms when total time is ~1 sec
   - for distributed: QTime=7.75 sec and total time is 7.9 sec.

Here is the output of the curls (direct_query.xml and distributed_query.xml
contain 30,000 documents in the reply):

Directly ask the shard:

time curl '
http://localhost:8983/solr/shard_2013-01-07/select?q=*:*&rows=3&timeAllowed=500&partialResults=true&debugQuery=true
' >& direct_query.xml


real0m1.025s

user0m0.008s

sys 0m0.053s

from direct_query.xml:



<lst name="responseHeader">
  <bool name="partialResults">true</bool>
  <int name="status">0</int>
  <int name="QTime">789</int>
  <lst name="params">
    <str name="rows">3</str>
    <str name="q">*:*</str>
    <str name="timeAllowed">500</str>
    <str name="partialResults">true</str>
    <str name="debugQuery">true</str>
  </lst>
</lst>



Ask the shard through distributed search:


time curl '
http://localhost:8983/solr/shard_2013-01-07/select?q=*:*&rows=3&shards=127.0.0.1%3A8983%2Fsolr%2Fshard_2013-01-07&timeAllowed=500&partialResults=true&shards.info=true&debug=true
' >& distributed_query.xml



real0m7.905s

user0m0.010s

sys 0m0.052s


from distributed_query.xml:




<lst name="responseHeader">
  <bool name="partialResults">true</bool>
  <int name="status">0</int>
  <int name="QTime">7750</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="shards.info">true</str>
    <str name="shards">127.0.0.1:8983/solr/shard_2013-01-07</str>
    <str name="partialResults">true</str>
    <str name="debug">true</str>
    <str name="rows">3</str>
    <str name="timeAllowed">500</str>
  </lst>
</lst>

281930201.0895






Best regards,
Lyuba


On Sun, Jan 20, 2013 at 6:49 PM, Michael Ryan  wrote:

> (This is based on my knowledge of 3.6 - not sure if this has changed in
> 4.0)
>
> You are using rows=30000, which requires retrieving 30,000 documents from
> disk. In a non-distributed search, the QTime will not include the time it
> takes to retrieve these documents, but in a distributed search, it will.
> For a *:* query, the document retrieval will almost always be the slowest
> part of the query. I'd suggest measuring how long it takes for the response
> to be returned, or use rows=0.
>
> The timeAllowed feature is very misleading. It only applies to a small
> portion of the query (which in my experience is usually not the part of the
> query that is actually slow). Do not depend on timeAllowed doing anything
> useful :)
>
> -Michael
>
> -Original Message-
> From: Lyuba Romanchuk [mailto:lyuba.romanc...@gmail.com]
> Sent: Sunday, January 20, 2013 6:36 AM
> To: solr-user@lucene.apache.org
> Subject: Solr 4.0 - timeAllowed in distributed search
>
> Hi,
>
> I try to use timeAllowed in query both in distributed search with one
> shard and directly to the same shard.
> I send the same query with timeAllowed=500 :
>
>- directly to the shard then QTime ~= 600 ms
>- through distributes search to the same shard QTime ~= 7 sec.
>
> I have two questions:
>
>- It seems that timeAllowed parameter doesn't work for distributes
>search, does it?
>- What may be the reason that causes the query to the shard through
>distributes search takes much more time than to the shard directly (the
>same distribution remains without timeAllowed parameter in the query)?
>
>
> Test results:
>
> Ask one shard through distributed search:
>
>
>
> http://localhost:8983/solr/shard_2013-01-07/select?q=*:*&rows=3&shards=127.0.0.1%3A8983%2Fsolr%2Fshard_2013-01-07&timeAllowed=500&partialResults=true&shards.info=true&debugQuery=true
> 
> 
> <lst name="responseHeader">
>   <bool name="partialResults">true</bool>
>   <int name="status">0</int>
>   <int name="QTime">7307</int>
>   <lst name="params">
>     <str name="q">*:*</str>
>     <str name="shards">127.0.0.1:8983/solr/shard_2013-01-07</str>
>     <str name="shards.info">true</str>
>     <str name="partialResults">true</str>
>     <str name="debugQuery">true</str>
>     <str name="rows">3</str>
>     <str name="timeAllowed">500</str>
>   </lst>
> </lst>
> <lst name="shards.info">
>   <long name="numFound">29574223</long>
>   <float name="maxScore">1.0</float>
>   <long name="time">646</long>
> </lst>
> ...
> 30,000 docs
> ...
> <str name="querystring">*:*</str>
> <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
> <str name="QParser">LuceneQParser</str>
> <lst name="timing"> prepare: 0.0; process: 6141.0, of which
> QueryComponent: 6022.0 and DebugComponent: 119.0 </lst>
>
> Ask the same shard directly:
>
>
> http://localhost:8983/solr/shard_2013-01-07/select?q=*:*&rows=3&timeAllowed=500&partialResults=true&shards.info=true&debugQuery=true
> 
> <lst name="responseHeader">
>   <bool name="partialResults">true</bool>
>   <int name="status">0</int>
>   <int name="QTime">617</int>
>   <lst name="params">
>     <str name="q">*:*</str>
>     <str name="shards.info">true</str>
>     <str name="partialResults">true</str>
>     <str name="debugQuery">true</str>
>     <str name="rows">3</str>
>     <str name="timeAllowed">500</str>
>   </lst>
> </lst>
> ...
> 30,000 docs
> ...
> <str name="querystring">*:*</str>
> <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
> <str name="QParser">LuceneQParser</str>
> <lst name="timing"> prepare: 0.0; process: 617.0, of which
> QueryComponent: 516.0 and DebugComponent: 101.0 </lst>
>
> Thank you.
> Best regards,
> Lyuba
>
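Michael's rows=0 suggestion can be checked with the same timing approach, which separates the pure search time from document retrieval (shard name and host are taken from the thread; this is a sketch, not a verified command):

```
time curl 'http://localhost:8983/solr/shard_2013-01-07/select?q=*:*&rows=0&timeAllowed=500' > /dev/null
```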

