RE: useFastVectorHighlighter doesn't work

2012-05-29 Thread Ahmet Arslan
 The reason why I use useFastVectorHighlighter is because I
 want to set stored=false, and with more settings
 like  termVectors=true termPositions=true
 termOffsets=true. If stored=true, what is the difference
 between normal highlight and useFastVectorHighlighter? What
 is the right situation for using useFastVectorHighlighter?

term*=true makes sense only together with stored=true. FastVectorHighlighter 
requires term*=true and uses the term vectors to speed up highlighting.
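
For illustration, a minimal field definition of the kind FastVectorHighlighter
needs (field and type names here are only placeholders):

  <field name="content" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

and it is enabled per request with hl=true&hl.useFastVectorHighlighter=true&hl.fl=content.
Without stored="true" there is no original text to build fragments from, so the
term* options alone are not enough.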


RE: useFastVectorHighlighter doesn't work

2012-05-29 Thread ZHANG Liang F
So for highlighting, stored=true is required in any circumstance, right?

 

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: May 29, 2012 16:04
To: solr-user@lucene.apache.org
Subject: RE: useFastVectorHighlighter doesn't work

 The reason why I use useFastVectorHighlighter is because I want to set 
 stored=false, and with more settings like  termVectors=true 
 termPositions=true
 termOffsets=true. If stored=true, what is the difference between 
 normal highlight and useFastVectorHighlighter? What is the right 
 situation for using useFastVectorHighlighter?

term*=true makes sense only together with stored=true. FastVectorHighlighter 
requires term*=true and uses the term vectors to speed up highlighting.


[SolrCloud] Replication Factor

2012-05-29 Thread Antoine LE FLOC'H
Hello all,

The page http://wiki.apache.org/solr/NewSolrCloudDesign mentions a

Replication Factor

It is a feature supported by Katta. Is it actually supported by SolrCloud?

A more general question: Katta had some pretty good features like this one.
Why is Katta not active anymore? Is there a way to get equivalent
functionality with another Solr-based framework today, if it doesn't
exist in SolrCloud yet?

Thank you.


A few random questions about solr queries.

2012-05-29 Thread santamaria2
*1)* With faceting, how does facet.query perform in comparison to
facet.field? I'm just wondering this as in my use case, I need to facet over
a field -- which would get me the top n facets for that field, but I also
need to show the count for a selected filter which might have a relatively
low count so it doesn't appear in the top n returned facets. So the solution
would be to 'ensure' its presence by adding a 'facet.query=cat:val' in
addition to my facet.field=cat.

I want to do this to quite a few fields.

Related/example-based question:
When I facet over a field and something gets returned, e.g. John Smith (83),
and I also 'ensure' this facet's presence by having it in
facet.query=author:"John Smith", are two different calculations performed?
Or is the facet returned by facet.field also used by facet.query to obtain
the count?
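
For what it's worth, here is a sketch of the combined request being described
(field and value names are only examples):

  ...&facet=true&facet.field=author&facet.limit=10&facet.query=author:"John Smith"

As far as I know the two are computed independently: facet.field counts terms in
the field, while each facet.query is counted as its own set intersection, so the
facet.query count is not derived from the facet.field results.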



*2)* Is there a performance issue if I have around, say, 20 facet.query
conditions along with 10 facet.fields? 3 of those 10 fields have around
100,000 possible values. The remaining fields have a few hundred each.



*3)* I've rummaged around a bit, looking for info on when to use q vs fq. I
want to clear my doubts for a certain use case.

Where should my date range queries go? In q or fq? The default settings in
my site show results from the past 90 days with buttons to show stuff from
the last month and week as well. But the user is allowed to use a slider to
apply any date range... this is allowed, but it's not /that/ common. 
I definitely use fq for filtering various tags. Choosing a tag is a common
activity.

Should the date range query go in fq? As I mentioned, the default view shows
stuff from the past 90 days. So on each new day, does this invalidate the
entry in the filter cache? Or is the filter cache populated in some way that
makes it easy to reuse the past 89 days' worth of results when a query is
performed the next day?
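
As an illustration (the field name is invented), writing the filter with rounded
date math, e.g.

  fq=pub_date:[NOW/DAY-90DAYS TO NOW/DAY+1DAY]

keeps the filter string identical for every query issued on the same day, so the
filterCache entry can be reused all day and only goes stale at midnight. An
unrounded NOW would make every request a slightly different filter and defeat the
cache.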



--
View this message in context: 
http://lucene.472066.n3.nabble.com/A-few-random-questions-about-solr-queries-tp3986562.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query elevation / boosting or something else to guarantee document position

2012-05-29 Thread Wenca

Hi all,

I have an index with thousands of products with various fields 
(manufacturer, price, popularity, type, color, ...) and I want to 
guarantee at least one product by a particular manufacturer to be within 
the first 5 results.


The search is done mainly by using filter params, and the results are ordered 
by a function, e.g. product(price, popularity) asc, or by discount desc.


And I need to guarantee that if there is any product matching the given 
filters made by a concrete manufacturer, then it will be on the 5th 
position at worst, even if the position by the order function is worse.


It seems to me that the Query elevation component is not the right thing 
for me. I don't know the query in advance (or the set of filter 
criteria) and I don't know concrete product that will be the best for 
the criteria within the order.


And also I don't think that I can construct a function with such 
requirements to use it directly for ordering the results.


Of course I can make a second query in case there is no desired product 
on the first page of results and put it there, but that requires an 
additional request to Solr and complicates result processing and 
further pagination.


Can anybody suggest any solution?

Thanks
Wenca


Re: Multi-words synonyms matching

2012-05-29 Thread Bernd Fehling
Hello Elisabeth,

my synonyms.txt is like your 2nd example:

naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd,
foresta\ naturale, natuurbos, natural\ forest, bosque\ natural, természetes\ 
erdő,
natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs, floresta\ natural, 
naturskov,
forêt\ naturelle, naturskog, přírodní\ les, luonnonmetsä, pădure\ naturală,
las\ naturalny, natürlicher\ wald


An example from my system with debugging turned on and searching for 
naturwald:

<lst name="debug">
  <str name="rawquerystring">naturwald</str>
  <str name="querystring">naturwald</str>
  <str name="parsedquery">textth:naturwald textth:φυσικό δάσος 
textth:естествена гора
textth:prírodný les textth:naravni gozd textth:foresta naturale 
textth:natuurbos
textth:natural forest textth:bosque natural textth:természetes erdő
textth:natūralus miškas textth:prirodna šuma textth:dabiskais mežs
textth:floresta natural textth:naturskov textth:forêt naturelle 
textth:naturskog
textth:přírodní les textth:luonnonmetsä textth:pădure naturală textth:las 
naturalny
textth:natürlicher wald</str>
...

As you can see my search for naturwald extends to single and multiword 
synonyms e.g. forêt naturelle


My SynonymFilterFactory has the following settings:

org.apache.solr.analysis.SynonymFilterFactory
{tokenizerFactory=solr.KeywordTokenizerFactory, 
synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr, 
ignoreCase=true,
luceneMatchVersion=LUCENE_36}

But as I already mentioned, there is much more work to be done to get it 
running than
just using SynonymFilterFactory.

Regards
Bernd



Am 23.05.2012 08:49, schrieb elisabeth benoit:
 Hello Bernd,
 
 Thanks for your advice.
 
 I have one question: how did you manage to map one word to a multiwords
 synonym???
 
 I've tried (in synonyms.txt)
 
 mairie, hotel de ville
 
 mairie, hotel\ de\ ville
 
 mairie => mairie, hotel de ville
 
 mairie => mairie, hotel\ de\ ville
 
 but nothing prevents mairie from matching with hotel...
 
 The only way I found is to use
 tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms declaration
 in schema.xml, but then since mairie is not alone in my index field, it
 doesn't match.
 
 
 best regards,
 Elisabeth
 
 
 
 
 the only way I found, I schema.xml, is to use
 
 
 
 2012/5/15 Bernd Fehling bernd.fehl...@uni-bielefeld.de
 
 Without reading the whole thread let me say that you should not trust
 the solr admin analysis. It takes the whole multiword search and runs
 it all together at once through each analyzer step (factory).
 But this is not how the real system works. First pitfall, the query parser
 is also splitting at white space (if not a phrase query). Due to this,
 a multiword query is sent chunk after chunk through the analyzer and,
 second pitfall, each chunk runs through the whole analyzer on its own.

 So if you are dealing with multiword synonyms you have the following
 problems. Either you turn your query into a phrase so that the whole
 phrase is analyzed at once and therefore looked up as multiword synonym
 but phrase queries are not analyzed !!! OR you send your query chunk
 by chunk through the analyzer but then they are not multiwords anymore
 and are not found in your synonyms.txt.

 From my experience I can say that it requires some deep work to get it done
 but it is possible. I have connected a thesaurus to solr which is doing
 query time expansion (no need to reindex if the thesaurus changes).
 The thesaurus holds synonyms and used for terms in 24 languages. So
 it is also some kind of language translation. And naturally the thesaurus
 translates from single term to multi term synonyms and vice versa.

 Regards,
 Bernd


 Am 14.05.2012 13:54, schrieb elisabeth benoit:
 Just for the record, I'd like to conclude this thread

 First, you were right, there was no behaviour difference between fq and q
 parameters.

 I realized that:

 1) my synonym (hotel de ville) has a stopword in it (de) and since I used
 tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms
 declaration,
 there was no stopword removal in the indexed expression, so when
 requesting
 hotel de ville, after stopwords removal in query, Solr was comparing
 hotel de ville
 with hotel ville

 but my queries never even got to that point since

 2) I made a mistake using mairie alone in the admin interface when
 testing my schema. The real field was something like collectivités
 territoriales mairie,
 so the synonym hotel de ville was not even applied, because of the
 tokenizerFactory=solr.KeywordTokenizerFactory in my synonym definition
 not splitting field into words when parsing

 So my problem is not solved, and I'm considering solving it outside of
 Solr
 scope, unless someone else has a clue

 Thanks again,
 Elisabeth



 2012/4/25 Erick Erickson erickerick...@gmail.com

 A little farther down the debug info output you'll find something
 like this (I specified fq=name:features)

 <arr name="parsed_filter_queries">
 <str>name:features</str>
 </arr>


 so 

Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-29 Thread sudarshan
Hi Walter,
 Thank you. Do you mean that optimize need not be used at all?
If Solr merges segments (when needed, as you said), is there a criterion
by which Solr does this automatically? If I want the search to be faster
and Solr does not optimize for quite a long time, would it not compromise my
query processing rate?

To All,
 I have another doubt. If I optimize and replicate, the
first time it would transfer all the segments from the master to the slave
irrespective of the modified segment(s). After the first replication, how would the
transfer be made - are all segments replicated again, or only the
modified segments? I believe that after the first replication
(master and slave in sync), only the modified segments would be transferred,
just like a non-optimized index transfer. Am I right?

Regards,
Sudarshan  

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3986597.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore Issue - Server Restart

2012-05-29 Thread lboutros
Hi Suajtha,

Does each webapp have its own Solr home?

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multicore-Issue-Server-Restart-tp3986516p3986602.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - 1143

2012-05-29 Thread Jack Krupansky
That issue is marked as a duplicate of SOLR-3134, which has a patch for Solr 
3.5.


https://issues.apache.org/jira/browse/SOLR-3134

-- Jack Krupansky

-Original Message- 
From: Ramprakash Ramamoorthy

Sent: Tuesday, May 29, 2012 3:03 AM
To: solr-user@lucene.apache.org
Subject: Solr - 1143

Dear all,

 A small doubt. I realised I will have to apply the patch
mentioned in Solr Jira 1143 to return partial results when one of my shards
is dead/slow.

 But the patch has no version explicitly specified. I am using
Solr 3.5.0; can I apply the patch to my installation as is?

--
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420 



Re: suggestions developing a multi-version concurrency control (MVCC) mechanism

2012-05-29 Thread Nicholas Ball

Hmmm interesting, that will definitely work and may be the way to go.
Ideally, I'd rather store the older versions within a field of the newest
if possible.
Can one create a custom field that holds other objects?

Nick

On Mon, 28 May 2012 17:07:06 -0700, Lance Norskog goks...@gmail.com
wrote:
 You can use the document id and timestamp as a compound unique id.
 Then the search would also sort by id, then by timestamp. Result
 grouping might let you pick the most recent document from each of the
 sorted docs.
 
 On Mon, May 28, 2012 at 3:15 PM, Nicholas Ball
 nicholas.b...@nodelay.com wrote:

 Hello all,

 For the first step of the distributed snapshot isolation system I'm
 developing for Solr, I'm going to need to have a MVCC mechanism as
 opposed
 to the single-version concurrency control mechanism already developed
 (DistributedUpdateProcessor class). I'm trying to find the very best
way
 to
 develop this into Solr 4.x (trunk) and so any help would be greatly
 appreciated!

 Essentially I need to be able to store multiple versions of a document
so
 that when you look up a document with a given timestamp, you're given
the
 correct version (anything the same or older, not fresher). The older
 versioned documents need to be stored in the index itself to ensure
they
 are durable and can be manipulated as other Solr data can be.

 One way to do this is to store the old versioned Solr documents within
 the
 latest Solr Document, but I'm not sure this is even possible?
 Alternatively, I could have the latest versioned Document store the
 unique
 keys which point to other older documents. The problem with this is
that
 it
 complicates things, having various partial objects which all combine as
 one logical document.

 Are there any suggestions as to the best way to develop this feature?

 Thank you in advance for any help you can spare!

 Nicholas


Re: Is optimize needed on slaves if it replicates from optimized master?

2012-05-29 Thread Walter Underwood
You do not need to use optimize at all.

Solr continually merges segments (optimizes) as needed.
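
Merging is controlled by the merge settings in solrconfig.xml; an illustrative
snippet along the lines of the 3.x example config:

  <indexDefaults>
    <mergeFactor>10</mergeFactor>
  </indexDefaults>

A lower mergeFactor keeps fewer segments (closer to an optimized index) at the
cost of more merging work during indexing.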

wunder

On May 29, 2012, at 6:08 AM, sudarshan wrote:

 Hi Walter,
 Thank you. Do you mean that optimize need not be used at all?
 If Solr merges segments (when needed as you said), is there a criteria
 during which Solr does this automatically. If I want the search to be faster
 and Solr does not optimize for quite a long time, would it not compromise my
 query processing rate?
 
 To All,
 I have another doubt. If I optimize and replicate, for the
 first time it would transfer all the segments from the master to slave
 irrespective of the modified segment(s). After first replication, how the
 transfer would be made  - again all segments are replicated or only the
 modified segments are replicated? I believe after the first replication
 (master and slave in sync), only the modified segments would be transferred
 just like the  non-optimized index transfer. Am I right? 
 
 Regards,
 Sudarshan  
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Is-optimize-needed-on-slaves-if-it-replicates-from-optimized-master-tp3241604p3986597.html
 Sent from the Solr - User mailing list archive at Nabble.com.







sort in local params and rows parameter

2012-05-29 Thread jhusman
Hello,

we're having some issues with a Solr query and are unsure if we've
encountered a bug or just don't understand the expected behaviour. Any help
would be appreciated.

The problem is this: we're running a query using the browser that for
debugging purposes looks like this:
q={!sort%3DeventId%20asc}a&rows=2

here eventId is a long field in our schema. The sort works fine, but the
query returns 10 results (out of 35), clearly ignoring the rows parameter.
For reference, q=a&rows=2 only returns 2 results (again out of 35).

We can go around this by introducing rows as a local parameter instead:
q={!sort%3DeventId%20asc+rows%3D2}a

this only returns 2 results, as expected.
So, it seems that using sort as a local parameter causes Solr to ignore the
external rows parameter. This does not seem to happen with other local
parameters, only with sort (as far as we can tell).

Why is this happening?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-in-local-params-and-rows-parameter-tp3986615.html
Sent from the Solr - User mailing list archive at Nabble.com.


TF-IDF vector

2012-05-29 Thread Allen
Hi List,

I am curious about the meaning of tf-idf vector after reading this
http://wiki.apache.org/solr/TermVectorComponent.

The tf flag returns me the tf vector for just one doc. The df flag
returns me the df vector of all the docs in the index.

Does the tf-idf vector represent one doc or a set of docs?

Also, can I specify a subset of docs over which the df vector is calculated,
rather than the entire set of docs?


Re: TF-IDF vector

2012-05-29 Thread Jack Krupansky

Does the tf-idf vector represents one doc or set of docs?

IDF is calculated across all docs that contain the term.

TF is calculated for a single document containing the term.

Each term of each doc will have its own tf-idf.
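
A request sketch for seeing those numbers (handler and field names depend on your
config; the TermVectorComponent has to be registered on the handler):

  /select?q=id:1234&fl=id&tv=true&tv.fl=text&tv.tf=true&tv.df=true&tv.tf_idf=true

tv.tf and tv.tf_idf are relative to each returned document, while tv.df is the
collection-wide document frequency; as far as I know there is no built-in way to
restrict the df calculation to a subset of documents.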

-- Jack Krupansky

-Original Message- 
From: Allen 
Sent: Tuesday, May 29, 2012 12:11 PM 
To: solr-user@lucene.apache.org 
Subject: TF-IDF vector 


Hi List,

I am curious about the meaning of tf-idf vector after reading this
http://wiki.apache.org/solr/TermVectorComponent.

The tf flag returns me the tf vector for just one doc. The df flag
returns me the df vector of all the docs in the index.

Does the tf-idf vector represents one doc or set of docs?

Too, can I specify a subset of docs which the df vector is calculated
on rather than the entire set of docs?


RE: Many Cores with Solr

2012-05-29 Thread Klostermeyer, Michael
IMO it would be better (from Solr's perspective) to handle the security w/ 
the application code.  Each query could include a ?fq=userID:12345... which 
would limit results to only what that user is allowed to see.
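
For example (the userID field name is just illustrative), the application would
append something like

  &fq=userID:12345

to every request after authenticating the user, and never let the client supply
that parameter itself.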
 
Mike

-Original Message-
From: Mike Douglass [mailto:mikeadougl...@gmail.com] 
Sent: Wednesday, May 23, 2012 4:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Many Cores with Solr

My interest in this is the desire to create one index per user of a system - 
the issue here is privacy - data indexed for one user should not be visible to 
other users.

For this purpose Solr will be hidden behind a proxy which steers authenticated 
sessions to the appropriate core.

Does this seem like a valid/feasible approach?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3985789.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Many Cores with Solr

2012-05-29 Thread Michael Della Bitta
That's what we do. It has the advantage of letting the general queries
be cached once across all users.

Michael

On Tue, May 29, 2012 at 12:39 PM, Klostermeyer, Michael
mklosterme...@riskexchange.com wrote:
 IMO it would be a better (from Solr's perspective) to handle the security w/ 
 the application code.  Each query could include a ?fq=userID:12345... which 
 would limit results to only what that user is allowed to see.

 Mike


-- 
Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com


Re: UpdateRequestProcessor : flattened values

2012-05-29 Thread Chris Hostetter

: And it might make sense to have a multi-value flattening attribute for Solr
: itself rather than in SolrCell.

Coming in 4.0...

https://builds.apache.org/view/G-L/view/Lucene/job/Solr-trunk/javadoc/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html

DOC
Concatenates multiple values for fields matching the specified conditions 
using a configurable delimiter which defaults to ", ".

By default, this processor concatenates the values for any field name 
which according to the schema is multiValued=false and uses TextField or 
StrField
DOC
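
An illustrative chain definition for solrconfig.xml (field name and delimiter are
placeholders; see the javadoc above for the full set of field selectors):

  <updateRequestProcessorChain name="concat-fields">
    <processor class="solr.ConcatFieldUpdateProcessorFactory">
      <str name="fieldName">description</str>
      <str name="delimiter">; </str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>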



-Hoss


Solr backup / replication internals

2012-05-29 Thread Ganesh
Hi,

Could anyone explain the internals of backup / replication to me? Please 
give me more information, like the do's and don'ts of backup / replication.

1. Is the backup / replication incremental?

2. While a backup / replication is being taken, can Solr still add to / update the index?

3. The backup command and the backup script both do a file copy. Is there any difference 
between them?

Regards
Ganesh


Relevancy ranking for synonym matches

2012-05-29 Thread Gau
I was wondering if there is any solution for this.
Currently I expand my results to match the synonyms at query time.

So if I entered James, I would get results for Jim, Gomes, Game etc. as they
would be expanded by matching the synonyms for James. But then, since this is
just a one-word match, tf, idf and other parameters don't make sense. I have
reset those factors to 1. Hence the results I get all have an equal score.

What I really want to do is sort these results by Levenshtein distance
without using the ~ sign. The issue with using ~ is that if I have a synonym
which is radically different (say Greg for James), then with James~0 Greg
would not even come close to matching James, and the number of results returned
would be less than the actual number of synonym matches.

So my use case is: without reducing the number of results, I want to sort
them by Levenshtein distance, i.e. the closest string match to the original query.
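
One option that might fit (assuming a single-valued string field holding the
name, here called name_exact): Solr's strdist() function can be used in a sort,
roughly

  q=name:james&sort=strdist("james", name_exact, edit) desc

The edit measure is a Levenshtein-based similarity between 0 and 1 (1 = identical),
so sorting descending puts the closest matches first without shrinking the result
set the way a fuzzy query would.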

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Relevancy-ranking-for-synonym-matches-tp3986634.html
Sent from the Solr - User mailing list archive at Nabble.com.


MongoDB and Solr

2012-05-29 Thread rjain15
Hi

I am building web app/mobile app, where users can update information
frequently and there is a search function to quick search the information
using different types of searches. 

Most of the data is going to be posted in JSON Format and stored in JSON
format

I have a few questions on the architecture choices; I am relatively new to
Solr and MongoDB.

1. Should I use MongoDB to store the JSON documents, or does Solr natively
store the documents in its data directory?

2. Does Solr require a specific schema for the JSON documents?


Thanks
Rajesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: UpdateRequestProcessor : flattened values

2012-05-29 Thread Jack Krupansky
Sounds good. Then all that will be needed is a way to disable the SolrCell 
flattening so that other update processors can see the unflattened field 
values before they are handed off to a ConcatFieldUpdateProcessor.


-- Jack Krupansky

-Original Message- 
From: Chris Hostetter

Sent: Tuesday, May 29, 2012 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: UpdateRequestProcessor : flattened values


: And it might make sense to have a multi-value flattening attribute for 
Solr

: itself rather than in SolrCell.

Coming in 4.0...

https://builds.apache.org/view/G-L/view/Lucene/job/Solr-trunk/javadoc/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html

DOC
Concatenates multiple values for fields matching the specified conditions
using a configurable delimiter which defaults to ", ".

By default, this processor concatenates the values for any field name
which according to the schema is multiValued=false and uses TextField or
StrField
DOC



-Hoss 



Re: MongoDB and Solr

2012-05-29 Thread Jack Krupansky
Although Solr uses XML format for document update and query, JSON is a 
supported option.


To post documents in JSON, see:
http://wiki.apache.org/solr/UpdateJSON

To retrieve query results in JSON, see:
http://wiki.apache.org/solr/SolJSON

That works well for relatively flat data (each field has a simple value or 
list of values), but less well if you have complex structure within an 
individual field value (e.g., multi-level nesting of JSON for a single field 
value.) For the latter, you would have to store the JSON as a string for 
such a field.


-- Jack Krupansky

-Original Message- 
From: rjain15

Sent: Tuesday, May 29, 2012 12:57 PM
To: solr-user@lucene.apache.org
Subject: MongoDB and Solr

Hi

I am building web app/mobile app, where users can update information
frequently and there is a search function to quick search the information
using different types of searches.

Most of the data is going to be posted in JSON Format and stored in JSON
format

I have a few questions on the architecture choices, I am relatively new to
Solr and MongoDB.

1. Should I use MongoDB to store the JSON documents, or does Solr natively
store the documents in the data directory

2. Does Solr require a specific schema for the JSON document.


Thanks
Rajesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: MongoDB and Solr

2012-05-29 Thread rjain15
Hi Jack

Thanks for the information. I do have multi-level nesting of JSON data. 

So back to my questions, apologize for repeating...

1. Should I use MongoDB to store the JSON documents, or does Solr natively 
store the documents in the data directory 

2. Does Solr require a specific schema for the JSON document. 

Thanks
Rajesh


--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986662.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread Michael Della Bitta
1. Yes, and 2. Yes. :)

Solr's adding more NoSQL-like features for 4.0, but in the meantime,
you're better off storing documents with a complex schema in a
document store and using Solr for findability. Basically the schema
for a document in Solr/Lucene is flat (although it can contain
arbitrarily-named fields), so your document will require some sort of
transformation for indexing.
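
As a rough sketch of that transformation (all field names invented), one flat Solr
document per logical entity could be posted to /update/json along these lines:

  {"add": {"doc": {
      "id": "class-0001",
      "name": "Speech Class",
      "credits": 3,
      "student_name": ["ABC", "PQQ", "AAA"],
      "instructor_name": ["ASAS", "ASAA"],
      "raw_json": "{ ... the original nested JSON kept verbatim as a string ... }"
  }}}

i.e. repeated child objects become multi-valued fields, and the untouched JSON can
be kept in a plain string field if you need to return it.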

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Tue, May 29, 2012 at 2:20 PM, rjain15 rjai...@gmail.com wrote:
 Hi Jack

 Thanks for the information. I do have multi-level nesting of JSON data.

 So back to my questions, apologize for repeating...

 1. Should I use MongoDB to store the JSON documents, or does Solr natively
 store the documents in the data directory

 2. Does Solr require a specific schema for the JSON document.

 Thanks
 Rajesh


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986662.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread Jack Krupansky
Could you give us an example of one of your documents. Then we can give you 
better feedback on what makes sense within Solr.


-- Jack Krupansky

-Original Message- 
From: rjain15

Sent: Tuesday, May 29, 2012 2:20 PM
To: solr-user@lucene.apache.org
Subject: Re: MongoDB and Solr

Hi Jack

Thanks for the information. I do have multi-level nesting of JSON data.

So back to my questions, apologize for repeating...

1. Should I use MongoDB to store the JSON documents, or does Solr natively
store the documents in the data directory

2. Does Solr require a specific schema for the JSON document.

Thanks
Rajesh


--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986662.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Example setup of using Solr 3.6.0 with Jetty 7 (7.6.3)?

2012-05-29 Thread Aaron Daubman
Greetings,

Has anybody gotten Solr 3.6.0 to work well with Jetty 7.6.3, and if so,
would you mind sharing your config files / directory structure / other
useful details?

Thanks,
 Aaron


Re: Many Cores with Solr

2012-05-29 Thread Mike Douglass
Thank you.

That sounds good - are we sure to get no leakage with this approach?

I'd be indexing personal information which must not be delivered without
authentication.

The solr instance is front-ended by bedework which can handle the auth and
adding a query term.

 IMO it would be a better (from Solr's perspective) to handle the security
 w/ the application code.  Each query could include a ?fq=userID:12345...
 which would limit results to only what that user is allowed to see.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986675.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread rjain15
Hi 

This is a sample schema, but it can be more nested as I build the app. As
more students enroll, or more classes are added, it will grow. 





colleges:
[
    {
        "college":
        {
            "id": "college Id",
            "classes":
            [
                {
                    "id": "0001",
                    "type": "speech",
                    "name": "Speech Class",
                    "credits": 3,
                    "students":
                    [
                        { "id": "1001", "name": "ABC" },
                        { "id": "1002", "name": "PQQ", ... },
                        { "id": "1003", "name": "AAA", ... },
                        { "id": "1004", "name": "ASA", ... }
                    ],
                    "instructors":
                    [
                        { "id": "5001", "name": "ASAS" },
                        { "id": "5002", "name": "ASAA" }
                    ]
                }
            ],
            "locations":
            [
                { "id": "6001", "address": "Address-1" },
                { "id": "6001", "address": "Address-2" }
            ]
        }
    }
]



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986637p3986676.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread Gora Mohanty
On 29 May 2012 22:27, rjain15 rjai...@gmail.com wrote:
 Hi

 I am building web app/mobile app, where users can update information
 frequently and there is a search function to quick search the information
 using different types of searches.

 Most of the data is going to be posted in JSON Format and stored in JSON
 format

 I have a few questions on the architecture choices, I am relatively new to
 Solr and MongoDB.

 1. Should I use MongoDB to store the JSON documents, or does Solr natively
 store the documents in the data directory

Sorry, but you do not provide nearly enough information
for people to be able to make sensible suggestions. What
is your use case? MongoDB is largely a different beast from
Solr. What do you think merits its use, and where does it
fit in your scheme of things? In many cases, one could have
both MongoDB, and Solr. In other cases, one or the other
might better fit the bill.

 2. Does Solr require a specific schema for the JSON document.

You can POST a JSON document to Solr, and get
JSON output back. Not sure if this meets your needs,
but please take a look at:
http://wiki.apache.org/solr/UpdateJSON
http://wiki.apache.org/solr/SolJSON

Regards,
Gora


Re: Many Cores with Solr

2012-05-29 Thread Michael Della Bitta
It's a similar approach as using SQL to filter the rows brought back
for a particular user from a table. It's strong as long as you write
your queries correctly, you store your data properly, and you guard
against injection and privilege escalation. There's an added bonus in
this case in that the user's submitted text isn't in the same query as
the part that limits the rows they have access to, but if you're doing
proper escaping of the query text, that shouldn't be relied on anyway.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Tue, May 29, 2012 at 3:07 PM, Mike Douglass mikeadougl...@gmail.com wrote:
 Thank you.

 That sounds good - are we sure to get no leakage with this approach?

 I'd be indexing personal information which must not be delivered without
 authentication.

 The solr instance is front-ended by bedework which can handle the auth and
 adding a query term.


Re: Many Cores with Solr

2012-05-29 Thread Erik Hatcher
You do get relevancy-related leakage though.  With users' content all in the 
same index and using the same field names, term and document frequencies across 
the index will be used for scoring.  This may be (and has been) a good reason 
to keep separately searchable content in different indexes/cores.

Erik


On May 29, 2012, at 15:07 , Mike Douglass wrote:

 Thank you.
 
 That sounds good - are we sure to get no leakage with this approach?
 
 I'd be indexing personal information which must not be delivered without
 authentication.
 
 The solr instance is front-ended by bedework which can handle the auth and
 adding a query term.
 
 IMO it would be a better (from Solr's perspective) to handle the security
 w/ the application code.  Each query could include a ?fq=userID:12345...
 which would limit results to only what that user is allowed to see.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986675.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Many Cores with Solr

2012-05-29 Thread Michael Della Bitta
In our particular case, we're using this index to do prefix searches
for autocomplete of sparse keyword data, so we don't have much to
worry about on this front, but I do agree that it's a consideration
for those use cases that do reveal information via ranking.

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Tue, May 29, 2012 at 4:00 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 You do get relevancy related leakage though.  With users content all in the 
 same index and using the same field names, term and document frequencies 
 across the index will be used for scoring.  This may be (and has been) a good 
 reason to keep separately searchable content in different indexes/cores.

        Erik


 On May 29, 2012, at 15:07 , Mike Douglass wrote:

 Thank you.

 That sounds good - are we sure to get no leakage with this approach?

 I'd be indexing personal information which must not be delivered without
 authentication.

 The solr instance is front-ended by bedework which can handle the auth and
 adding a query term.

 IMO it would be a better (from Solr's perspective) to handle the security
 w/ the application code.  Each query could include a ?fq=userID:12345...
 which would limit results to only what that user is allowed to see.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986675.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: suggestions developing a multi-version concurrency control (MVCC) mechanism

2012-05-29 Thread Lance Norskog
Solr uses a flat schema. You can store old versions, but you have to
encode them somehow and save them as data.

On Tue, May 29, 2012 at 7:20 AM, Nicholas Ball
nicholas.b...@nodelay.com wrote:

 Hmmm interesting, that will definitely work and may be the way to go.
 Ideally, I'd rather store the older versions within a field of the newest
 if possible.
 Can one create a custom field that holds other objects?

 Nick

 On Mon, 28 May 2012 17:07:06 -0700, Lance Norskog goks...@gmail.com
 wrote:
 You can use the document id and timestamp as a compound unique id.
 Then the search would also sort by id, then by timestamp. Result
 grouping might let you pick the most recent document from each of the
 sorted docs.

 On Mon, May 28, 2012 at 3:15 PM, Nicholas Ball
 nicholas.b...@nodelay.com wrote:

 Hello all,

 For the first step of the distributed snapshot isolation system I'm
 developing for Solr, I'm going to need to have a MVCC mechanism as
 opposed
 to the single-version concurrency control mechanism already developed
 (DistributedUpdateProcessor class). I'm trying to find the very best
 way
 to
 develop this into Solr 4.x (trunk) and so any help would be greatly
 appreciated!

 Essentially I need to be able to store multiple version of a document
 so
 that when you look up a document with a given timestamp, you're given
 the
 correct version (anything the same or older, not fresher). The older
 versioned documents need to be stored in the index itself to ensure
 they
 are durable and can be manipulated as other Solr data can be.

 One way to do this is to store the old versioned Solr documents within
 the
 latest Solr Document, but I'm not sure this is even possible?
 Alternatively, I could have the latest versioned Document store the
 unique
 keys which point to other older documents. The problem with this is
 that
 it
 complicates things having various partial objects which all combine as
 one
 logically document.

 Are there any suggestions as to the best way to develop this feature?

 Thank you in advance for any help you can spare!

 Nicholas



-- 
Lance Norskog
goks...@gmail.com


Re: Many Cores with Solr

2012-05-29 Thread Mike Douglass
That was one of my concerns. To date I've been using lucene directly and
pointing it at an index for the current authenticated user. solr cores
seemed to come close to that.

Is the issue with a lot of cores just in creating that many, or in using many
cores concurrently?


Erik Hatcher-4 wrote
 
 You do get relevancy related leakage though.  With users content all in
 the same index and using the same field names, term and document
 frequencies across the index will be used for scoring.  This may be (and
 has been) a good reason to keep separately searchable content in different
 indexes/cores.
 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Many-Cores-with-Solr-tp3161889p3986710.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-words synonyms matching

2012-05-29 Thread elisabeth benoit
Hello Bernd,

Thanks a lot for your answer. I'll work on this.

Best regards,
Elisabeth

2012/5/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de

 Hello Elisabeth,

 my synonyms.txt is like your 2nd example:

 naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd,
 foresta\ naturale, natuurbos, natural\ forest, bosque\ natural,
 természetes\ erdő,
 natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs, floresta\ natural,
 naturskov,
 forêt\ naturelle, naturskog, přírodní\ les, luonnonmetsä, pădure\ naturală,
 las\ naturalny, natürlicher\ wald


 An example from my system with debugging turned on and searching for
 naturwald:

 <lst name="debug">
  <str name="rawquerystring">naturwald</str>
  <str name="querystring">naturwald</str>
  <str name="parsedquery">textth:naturwald textth:φυσικό δάσος
 textth:естествена гора
 textth:prírodný les textth:naravni gozd textth:foresta naturale
 textth:natuurbos
 textth:natural forest textth:bosque natural textth:természetes erdő
 textth:natūralus miškas textth:prirodna šuma textth:dabiskais mežs
 textth:floresta natural textth:naturskov textth:forêt naturelle
 textth:naturskog
 textth:přírodní les textth:luonnonmetsä textth:pădure naturală
 textth:las naturalny
 textth:natürlicher wald</str>
 ...

 As you can see my search for naturwald extends to single and multiword
 synonyms e.g. forêt naturelle


 My SynonymFilterFactory has the following settings:

 org.apache.solr.analysis.SynonymFilterFactory
 {tokenizerFactory=solr.KeywordTokenizerFactory,
 synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr,
 ignoreCase=true,
 luceneMatchVersion=LUCENE_36}

 But as I already mentioned, there is much more work to be done to get it
 running than
 just using SynonymFilterFactory.

 Regards
 Bernd



 Am 23.05.2012 08:49, schrieb elisabeth benoit:
  Hello Bernd,
 
  Thanks for your advice.
 
  I have one question: how did you manage to map one word to a multiwords
  synonym???
 
  I've tried (in synonyms.txt)
 
  mairie, hotel de ville
 
  mairie, hotel\ de\ ville
 
  mairie => mairie, hotel de ville
 
  mairie => mairie, hotel\ de\ ville
 
  but nothing prevents mairie from matching with hotel...
 
  The only way I found is to use
  tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms
 declaration
  in schema.xml, but then since mairie is not alone in my index field, it
  doesn't match.
 
 
  best regards,
  Elisabeth
 
 
 
 
  the only way I found, I schema.xml, is to use
 
 
 
  2012/5/15 Bernd Fehling bernd.fehl...@uni-bielefeld.de
 
  Without reading the whole thread let me say that you should not trust
  the solr admin analysis. It takes the whole multiword search and runs
  it all together at once through each analyzer step (factory).
  But this is not how the real system works. First pitfall, the query
 parser
  is also splitting at white space (if not a phrase query). Due to this,
  a multiword query is send chunk after chunk through the analyzer and,
  second pitfall, each chunk runs through the whole analyzer by its own.
 
  So if you are dealing with multiword synonyms you have the following
  problems. Either you turn your query into a phrase so that the whole
  phrase is analyzed at once and therefore looked up as multiword synonym
  but phrase queries are not analyzed !!! OR you send your query chunk
  by chunk through the analyzer but then they are not multiwords anymore
  and are not found in your synonyms.txt.
 
  From my experience I can say that it requires some deep work to get it
 done
  but it is possible. I have connected a thesaurus to solr which is doing
  query time expansion (no need to reindex if the thesaurus changes).
  The thesaurus holds synonyms and used for terms in 24 languages. So
  it is also some kind of language translation. And naturally the
 thesaurus
  translates from single term to multi term synonyms and vice versa.
 
  Regards,
  Bernd
 
 
  Am 14.05.2012 13:54, schrieb elisabeth benoit:
  Just for the record, I'd like to conclude this thread
 
  First, you were right, there was no behaviour difference between fq
 and q
  parameters.
 
  I realized that:
 
  1) my synonym (hotel de ville) has a stopword in it (de) and since I
 used
  tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms
  declaration,
  there was no stopword removal in the indewed expression, so when
  requesting
  hotel de ville, after stopwords removal in query, Solr was comparing
  hotel de ville
  with hotel ville
 
  but my queries never even got to that point since
 
  2) I made a mistake using mairie alone in the admin interface when
  testing my schema. The real field was something like collectivités
  territoriales mairie,
  so the synonym hotel de ville was not even applied, because of the
  tokenizerFactory=solr.KeywordTokenizerFactory in my synonym
 definition
  not splitting field into words when parsing
 
  So my problem is not solved, and I'm considering solving it outside of
  Solr
  scope, unless someone else has a clue
 
  

Re: Multi-words synonyms matching

2012-05-29 Thread Lance Norskog
I recently had the same use case. I wound up doing this: at both
index and query time, the synonyms file is used with 'expand=false'. All
multi-word synonyms map to one single-word synonym (per group). This
way, only the main word is indexed or queried.

If the synonym file changes, you have to re-index the matching content.
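
A sketch of what one such group looks like in synonyms.txt under this scheme
(reusing the example from earlier in the thread):

  mairie, hotel de ville

With expand=false every entry in the comma-separated group is reduced to the first
one, so occurrences of "hotel de ville" end up indexed (and, when the filter sees
them as a unit, queried) as the single token "mairie".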

On Tue, May 29, 2012 at 1:27 PM, elisabeth benoit
elisaelisael...@gmail.com wrote:
 Hello Bernd,

 Thanks a lot for your answer. I'll work on this.

 Best regards,
 Elisabeth

 2012/5/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de

 Hello Elisabeth,

 my synonyms.txt is like your 2nd example:

 naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd,
 foresta\ naturale, natuurbos, natural\ forest, bosque\ natural,
 természetes\ erdő,
 natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs, floresta\ natural,
 naturskov,
 forêt\ naturelle, naturskog, přírodní\ les, luonnonmetsä, pădure\ naturală,
 las\ naturalny, natürlicher\ wald


 An example from my system with debugging turned on and searching for
 naturwald:

  <lst name="debug">
   <str name="rawquerystring">naturwald</str>
   <str name="querystring">naturwald</str>
   <str name="parsedquery">textth:naturwald textth:φυσικό δάσος
  textth:естествена гора
  textth:prírodný les textth:naravni gozd textth:foresta naturale
  textth:natuurbos
  textth:natural forest textth:bosque natural textth:természetes erdő
  textth:natūralus miškas textth:prirodna šuma textth:dabiskais mežs
  textth:floresta natural textth:naturskov textth:forêt naturelle
  textth:naturskog
  textth:přírodní les textth:luonnonmetsä textth:pădure naturală
  textth:las naturalny
  textth:natürlicher wald</str>
 ...

 As you can see my search for naturwald extends to single and multiword
 synonyms e.g. forêt naturelle


 My SynonymFilterFactory has the following settings:

 org.apache.solr.analysis.SynonymFilterFactory
 {tokenizerFactory=solr.KeywordTokenizerFactory,
 synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr,
 ignoreCase=true,
 luceneMatchVersion=LUCENE_36}

 But as I already mentioned, there is much more work to be done to get it
 running than
 just using SynonymFilterFactory.

 Regards
 Bernd



 Am 23.05.2012 08:49, schrieb elisabeth benoit:
  Hello Bernd,
 
  Thanks for your advice.
 
  I have one question: how did you manage to map one word to a multiwords
  synonym???
 
  I've tried (in synonyms.txt)
 
  mairie, hotel de ville
 
  mairie, hotel\ de\ ville
 
   mairie => mairie, hotel de ville
  
   mairie => mairie, hotel\ de\ ville
 
  but nothing prevents mairie from matching with hotel...
 
  The only way I found is to use
  tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms
 declaration
  in schema.xml, but then since mairie is not alone in my index field, it
  doesn't match.
 
 
  best regards,
  Elisabeth
 
 
 
 
  the only way I found, I schema.xml, is to use
 
 
 
  2012/5/15 Bernd Fehling bernd.fehl...@uni-bielefeld.de
 
  Without reading the whole thread let me say that you should not trust
  the solr admin analysis. It takes the whole multiword search and runs
  it all together at once through each analyzer step (factory).
  But this is not how the real system works. First pitfall, the query
 parser
  is also splitting at white space (if not a phrase query). Due to this,
  a multiword query is send chunk after chunk through the analyzer and,
  second pitfall, each chunk runs through the whole analyzer by its own.
 
  So if you are dealing with multiword synonyms you have the following
  problems. Either you turn your query into a phrase so that the whole
  phrase is analyzed at once and therefore looked up as multiword synonym
  but phrase queries are not analyzed !!! OR you send your query chunk
  by chunk through the analyzer but then they are not multiwords anymore
  and are not found in your synonyms.txt.
 
  From my experience I can say that it requires some deep work to get it
 done
  but it is possible. I have connected a thesaurus to solr which is doing
  query time expansion (no need to reindex if the thesaurus changes).
  The thesaurus holds synonyms and used for terms in 24 languages. So
  it is also some kind of language translation. And naturally the
 thesaurus
  translates from single term to multi term synonyms and vice versa.
 
  Regards,
  Bernd
 
 
  Am 14.05.2012 13:54, schrieb elisabeth benoit:
  Just for the record, I'd like to conclude this thread
 
  First, you were right, there was no behaviour difference between fq
 and q
  parameters.
 
  I realized that:
 
  1) my synonym (hotel de ville) has a stopword in it (de) and since I
 used
  tokenizerFactory=solr.KeywordTokenizerFactory in my synonyms
  declaration,
  there was no stopword removal in the indewed expression, so when
  requesting
  hotel de ville, after stopwords removal in query, Solr was comparing
  hotel de ville
  with hotel ville
 
  but my queries never even got to that point since
 
  2) I made a mistake using mairie alone in the admin interface when

Re: how to reduce the result size to 2-3 lines and expand based on user interest

2012-05-29 Thread srini
hi iorixxx,

Sorry I missed your reply. Let me put my requirement in another way.

I have a description field which holds a lot of text (2-3 paragraphs) and it is
indexed.

When a user searches for any word, and Solr finds that word in the description, I want
to show the matching content, probably the 2-3 lines that contain the search word. Any
ideas how to do this?
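
If I understand the requirement, this sounds like standard highlighting with
fragments; an illustrative request (assuming the field is called description and
is stored):

  ...&q=word&hl=true&hl.fl=description&hl.snippets=3&hl.fragsize=150

hl.snippets controls how many fragments come back per document and hl.fragsize
roughly how many characters each fragment holds.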

Thanks in Advance!!!
Srini


--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-reduce-the-result-size-to-2-3-lines-and-expand-based-on-user-interest-tp3985692p3986727.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread rjain15
Hi Gora, 

I am working on a Mobile App, which is updating/accessing/searching data and
I have created a simple prototype using Solr and the Update JSON / Get JSON
functions of Solr. 

I came across some discussion of MongoDB and how it natively stores JSON
data, and as I was looking at the scalability of data storage/indexing, I
paused to check whether I am on the right track just using Solr, or whether I
should combine Solr with MongoDB, as in this blog post:

http://blog.knuthaugen.no/2010/04/cooking-with-mongodb-and-solr.html

Maybe this is an incorrect question, as you say -- MongoDB might be an
entirely different beast. 

Apologies for a novice question. My point was: for mobile / consumer web
apps, what are the architectural considerations? I don't want it to be
overkill, so if Solr can natively store/index/search JSON documents, then
that is the solution I can build on top of.


Thanks
Rajesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MongoDB-and-Solr-tp3986636p3986729.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MongoDB and Solr

2012-05-29 Thread Walter Underwood
Solr does not natively store/index/search arbitrary JSON documents.

It accepts JSON in a specific format for document input.

wunder

On May 29, 2012, at 3:21 PM, rjain15 wrote:

 Hi Gora, 
 
 I am working on a Mobile App, which is updating/accessing/searching data and
 I have created a simple prototype using Solr and the Update JSON / Get JSON
 functions of Solr. 
 
 I came across some discussion on MongoDB and how it natively stores JSON
 data, and hence as I was looking at scalability of data storage/indexing, I
 was pausing to understand if I am on the right track of just using Solr or
 should I combine Solr with MongoDB as I am reading this blog post...
 
 http://blog.knuthaugen.no/2010/04/cooking-with-mongodb-and-solr.html
 
 Maybe this is an incorrect question, as you say -- MongoDB might be an
 entirely different beast. 
 
 Apologies for a novice question. My point was, for Mobile / Consumer Web
 Apps -- what are the architectural considerations. I don't want it to be a
 overkill, hence if solr can natively store/index/search json documents, then
 that is the solution I can build on top of. 
 
 
 Thanks
 Rajesh
 






RE: useFastVectorHighlighter doesn't work

2012-05-29 Thread ZHANG Liang F
Thanks a lot, It's quite clear now. 

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: May 29, 2012 16:37
To: solr-user@lucene.apache.org
Subject: RE: useFastVectorHighlighter doesn't work

 So for highlight, stored=true is
 required in any circumstance, right?

Exactly. http://wiki.apache.org/solr/FieldOptionsByUseCase



Re: MongoDB and Solr

2012-05-29 Thread Gora Mohanty
On 30 May 2012 03:51, rjain15 rjai...@gmail.com wrote:
 Hi Gora,

 I am working on a Mobile App, which is updating/accessing/searching data and
 I have created a simple prototype using Solr and the Update JSON / Get JSON
 functions of Solr.

 I came across some discussion on MongoDB and how it natively stores JSON
 data, and hence as I was looking at scalability of data storage/indexing, I
 was pausing to understand if I am on the right track of just using Solr or
 should I combine Solr with MongoDB as I am reading this blog post...
[...]

A discussion on web architecture is off-topic for
this list, and will also probably draw in people with
strong opinions. Here is a brief personal opinion,
but you are probably better off trying out a couple
of different architectural prototypes, and/or talking
to someone with experience in scalable sites.

First of all, you should consider whether you really
need a NoSQL store. This would depend on the
scale, and requirements of your app. IMHO, RDBMSes
now are proven systems with many years of learning
behind them. Thus, your question should be why
NoSQL, rather than the other way around.

Solr for search should do fine, and you already
know how to get JSON in and out of it. Incidentally,
we also tested out Solr as a NoSQL store (raw data,
and not JSON, though), and were quite happy with
the performance.

Regards,
Gora