Re: How do I this in Solr?

2010-11-08 Thread Varun Gupta
I haven't been able to work on it because of some other commitments. The
MemoryIndex approach seems promising. Only thing I will have to check is the
memory requirement as I have close to 2 million documents.

Will let you know if I can make it work.

Thanks a lot!

--
Varun Gupta


RE: How do I this in Solr?

2010-11-05 Thread Steven A Rowe
Hi Varun,

On 10/26/2010 at 11:26 PM, Varun Gupta wrote:
> I will try to implement the two filters suggested by Steven and see how
> the performance matches up.

Have you made any progress?

I was thinking about your use case, and it occurred to me that you could get
what you want by reversing the problem, using Lucene's MemoryIndex
<http://lucene.apache.org/java/3_0_2/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html>.
(As far as I can tell, this functionality -- i.e. standing queries a.k.a.
routing a.k.a. filtering -- is not present in Solr.)

You can load your query (as a document) into a MemoryIndex, and then use each 
of your documents to query against it, something like (untested!):

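import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

// Each stored document becomes an AND query over its own words.
// Note: in Lucene 3.x the QueryParser constructor also takes a Version
// as its first argument.
Map<String,Query> documents = new HashMap<String,Query>();
Analyzer analyzer = new WhitespaceAnalyzer();
QueryParser parser = new QueryParser("content", analyzer);
parser.setDefaultOperator(QueryParser.Operator.AND);
documents.put("ID001", parser.parse("nokia n95"));
documents.put("ID002", parser.parse("GPS"));
documents.put("ID003", parser.parse("android"));
documents.put("ID004", parser.parse("samsung"));
documents.put("ID005", parser.parse("samsung android"));
documents.put("ID006", parser.parse("nokia android"));
documents.put("ID007", parser.parse("mobile with GPS"));

// The incoming search text is the only document in the in-memory index.
MemoryIndex index = new MemoryIndex();
index.addField("content", "samsung android GPS", analyzer);

for (Map.Entry<String,Query> entry : documents.entrySet()) {
  Query query = entry.getValue();
  // A score above zero means every word of the stored document occurs
  // in the search text.
  if (index.search(query) > 0.0f) {
    String docId = entry.getKey();
    // Do something with the hits here ...
  }
}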

In the above example, the documents "samsung", "GPS", "android" and "samsung
android" would be hits, and the other documents would not be, just as you
wanted.

MemoryIndex is designed to be very fast for this kind of usage, so even
hundreds of thousands of documents should be feasible.
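
For anyone who wants to see the reversed matching without a Lucene dependency, here is a minimal plain-Java sketch of the same idea: a stored document is a hit when every one of its words occurs in the search text. The class and method names are hypothetical, and whitespace-only tokenizing is assumed to mirror WhitespaceAnalyzer (no lowercasing). It is an illustration only, not a replacement for MemoryIndex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of Lucene or Solr.
final class SubsetMatcher {

    // Split on whitespace only; case is preserved.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }

    // Returns the IDs of stored documents whose words all occur in the search text.
    static List<String> matchingIds(Map<String, String> documents, String searchText) {
        Set<String> queryTokens = tokens(searchText);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : documents.entrySet()) {
            if (queryTokens.containsAll(tokens(entry.getValue()))) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("ID001", "nokia n95");
        documents.put("ID002", "GPS");
        documents.put("ID003", "android");
        documents.put("ID004", "samsung");
        documents.put("ID005", "samsung android");
        documents.put("ID006", "nokia android");
        documents.put("ID007", "mobile with GPS");
        // prints [ID002, ID003, ID004, ID005]
        System.out.println(matchingIds(documents, "samsung android GPS"));
    }
}
```

This loops over every stored document per search, which is the same cost shape as the MemoryIndex loop; whether that is acceptable for millions of documents would need measuring.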

Steve

> -----Original Message-----
> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> Sent: Tuesday, October 26, 2010 11:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How do I this in Solr?
>
> Thanks everybody for the inputs.
>
> Looks like Steven's solution is the closest one but will lead to
> performance issues when the query string has many terms.
>
> I will try to implement the two filters suggested by Steven and see how
> the performance matches up.
>
> --
> Thanks
> Varun Gupta
>
>
> On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹)
> <scott@udngroup.com> wrote:
>
> > I think you have to write a "yet exact match" handler yourself (I say
> > "yet" because it's not quite the exact match we normally know). Steve's
> > answer is quite near your request. You can do further work based on his
> > solution.
> >
> > As the last step, I suggest you strip all blanks from the query string
> > and from each query result, respectively, and return only those results
> > that have the same string length as the query string.
> >
> > For example, given:
> > * query string = "Samsung with GPS"
> > * query results:
> >   result 1 = "Samsung has lots of mobile with GPS"
> >   result 2 = "with GPS Samsung"
> >   result 3 = "GPS mobile with vendors, such as Sony, Samsung"
> >
> > they become:
> > * query string = "SamsungwithGPS" (length = 14)
> > * query results:
> >   result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
> >   result 2 = "withGPSSamsung" (length = 14)
> >   result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)
> >
> > so result 2 matches your request.
> >
> > This way you avoid a load of work on case sensitivity and word
> > reordering. Furthermore, you can refine it, for example by removing
> > other whitespace characters, etc.
> >
> > Scott @ Taiwan
> >
> >
> > ----- Original Message ----- From: Varun Gupta
> > <varun.vgu...@gmail.com>
> > To: <solr-user@lucene.apache.org>
> > Sent: Tuesday, October 26, 2010 9:07 PM
> > Subject: How do I this in Solr?
> >
> > > Hi,
> > >
> > > I have a lot of small documents (each containing 1 to 15 words)
> > > indexed in Solr. For the search query, I want the search results to
> > > contain only those documents that satisfy this criterion: "All of the
> > > words of the search result document are present in the search query".
> > >
> > > For example:
> > > If I have the following documents indexed: "nokia n95", "GPS",
> > > "android", "samsung", "samsung andriod", "nokia andriod",
> > > "mobile with GPS"
> > >
> > > If I search with the text "samsung andriod GPS", search results should
> > > only contain "samsung", "GPS", "andriod" and "samsung andriod".
> > >
> > > Is there a way to do this in Solr?
> > >
> > > --
> > > Thanks
> > > Varun Gupta
 
 


Re: How do I this in Solr?

2010-10-27 Thread Lance Norskog
There is also a feature called a 'filter'. If you use certain words a 
lot, you can make filter queries with just those words.  Look for 
'filter' and 'fq=' on the wiki.


But really you can have hundreds of words in a query and not have a 
performance problem. Solr/Lucene is very fast. In benchmarking I have 
trouble sending enough requests to make several processors run at the 
same time.
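
As a rough illustration of the 'fq=' parameter mentioned above, here is a minimal Java sketch that builds a select URL with one main query and any number of filter queries. The `q` and `fq` parameter names are Solr's; the base URL, the `content` field name and the helper class are assumptions made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; only the q and fq parameter names come from Solr.
final class FilterQueryUrl {

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Appends the main query and one fq parameter per filter query.
    static String selectUrl(String baseUrl, String query, String... filterQueries) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("?q=").append(encode(query));
        for (String fq : filterQueries) {
            url.append("&fq=").append(encode(fq));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Assumed host, path and field name.
        // prints http://localhost:8983/solr/select?q=content%3Aandroid&fq=content%3Asamsung
        System.out.println(selectUrl("http://localhost:8983/solr/select",
                "content:android", "content:samsung"));
    }
}
```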




RE: How do I this in Solr?

2010-10-27 Thread Michael Sokolov
You might try adding a field containing the word count and making sure that
matches the query's word count?

This would require you to tokenize the query and document yourself, perhaps.

-Mike 




RE: How do I this in Solr?

2010-10-27 Thread Steven A Rowe
I'm pretty sure the word-count strategy won't work.

> If I search with the text "samsung andriod GPS", search results
> should only contain "samsung", "GPS", "andriod" and "samsung andriod".

Using the word-count strategy, a document containing "samsung andriod PDQ"
would be a hit, but Varun doesn't want it, because it contains a word that is
not in the query.

Steve




Re: How do I this in Solr?

2010-10-27 Thread Mike Sokolov
Right - my point was to combine this with the previous approaches to 
form a query like:


samsung AND android AND GPS AND word_count:3

in order to exclude documents containing additional words. This would
avoid the combinatoric explosion problem others had alluded to earlier.
Of course this would fail because "android" is misspelled :)
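
A minimal sketch of how such a query string could be assembled on the client, assuming whitespace tokenizing and the `word_count` field from the example above (the class name is made up). Note that a query of this shape only matches documents that contain every one of the query terms.

```java
// Hypothetical helper; word_count is the field name used in the example query.
final class WordCountQuery {

    // Joins the terms with AND and appends a clause on the number of terms.
    static String build(String searchText) {
        String[] terms = searchText.trim().split("\\s+");
        return String.join(" AND ", terms) + " AND word_count:" + terms.length;
    }

    public static void main(String[] args) {
        // prints samsung AND android AND GPS AND word_count:3
        System.out.println(build("samsung android GPS"));
    }
}
```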


-Mike


Re: How do I this in Solr?

2010-10-27 Thread Toke Eskildsen
That does not work either as it requires that all the terms in the query
are present in the document. The original poster did not state this
requirement. On the contrary, his examples were mostly single-word
matches, implying an OR-search at the core.

The query-explosion still seems like the only working idea. Maybe Varun
could comment on the maximum number of terms that his queries will
contain?
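
To give a feel for the size of the explosion, here is a minimal Java sketch under stated assumptions: the exact form of the earlier proposal is not shown in this part of the thread, so the sketch simply enumerates every non-empty subset of the query terms and ANDs each subset with a `word_count` clause equal to its size. It also assumes `word_count` holds the number of distinct words in a document and that the query has no repeated terms. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the 2^n - 1 growth in clauses.
final class QueryExplosion {

    // One clause per non-empty subset of the query terms.
    static List<String> clauses(String searchText) {
        String[] terms = searchText.trim().split("\\s+");
        List<String> clauses = new ArrayList<>();
        for (int mask = 1; mask < (1 << terms.length); mask++) {
            List<String> subset = new ArrayList<>();
            for (int i = 0; i < terms.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(terms[i]);
                }
            }
            clauses.add("(" + String.join(" AND ", subset)
                    + " AND word_count:" + subset.size() + ")");
        }
        return clauses;
    }

    // The full query is the OR of all subset clauses.
    static String build(String searchText) {
        return String.join(" OR ", clauses(searchText));
    }

    public static void main(String[] args) {
        System.out.println(clauses("samsung android GPS").size()); // prints 7
        System.out.println(build("samsung GPS"));
    }
}
```

With n terms this produces 2^n - 1 clauses, so 7 for three terms and 31 for five, which is why the maximum query length matters.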

Regards,
Toke Eskildsen





Re: How do I this in Solr?

2010-10-27 Thread Mike Sokolov
Yes I missed that requirement (as Steven also pointed out in a private 
e-mail).  I now agree that the combinatorics are required.


Another possibility to consider (if the queries are large, which 
actually seems unlikely) is to use the default behavior where all terms 
are optional, sort by relevance, and truncate the result list on the 
client side after some unwanted term is found.  I *think* the scoring 
should find only docs with the searched-for terms first, although if 
there are a lot of repeated terms maybe not? Also result counts will be 
screwy.
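
The client-side truncation could look roughly like the following Java sketch. It assumes the results arrive as plain strings already sorted by relevance and that whitespace tokenizing is good enough; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical client-side helper; not a Solr API.
final class ClientSideTruncation {

    // Keeps ranked results until the first one that contains a word not in the search text.
    static List<String> keepUntilUnwanted(List<String> rankedDocs, String searchText) {
        Set<String> allowed = new HashSet<>(Arrays.asList(searchText.trim().split("\\s+")));
        List<String> kept = new ArrayList<>();
        for (String doc : rankedDocs) {
            if (!allowed.containsAll(Arrays.asList(doc.trim().split("\\s+")))) {
                break; // everything after this point is dropped
            }
            kept.add(doc);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("samsung android", "GPS", "nokia android", "samsung");
        // prints [samsung android, GPS]
        System.out.println(keepUntilUnwanted(ranked, "samsung android GPS"));
    }
}
```

In the sample run "samsung" is lost because it ranks below an unwanted document, which is exactly the risk if the scoring does not put the wanted documents first.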


-Mike


Re: How do I this in Solr?

2010-10-27 Thread Varun Gupta
Toke, the search query will contain 4-5 words on average (excluding the
stopwords).

Mike, I don't care about the result count. Excluding the terms on the client
side may be a good idea. Is there any way to alter scoring such that the
docs containing only the searched-for terms are shown first? Can I use term
frequency to do this kind of thing?

--
Thanks
Varun Gupta

On Wed, Oct 27, 2010 at 7:13 PM, Mike Sokolov soko...@ifactory.com wrote:

 Yes I missed that requirement (as Steven also pointed out in a private
 e-mail).  I now agree that the combinatorics are required.

 Another possibility to consider (if the queries are large, which actually
 seems unlikely) is to use the default behavior where all terms are optional,
 sort by relevance, and truncate the result list on the client side after
 some unwanted term is found.  I *think* the scoring should find only docs
 with the searched-for terms first, although if there are a lot of repeated
 terms maybe not? Also result counts will be screwy.

 -Mike


 On 10/27/2010 09:34 AM, Toke Eskildsen wrote:

 That does not work either as it requires that all the terms in the query
 are present in the document. The original poster did not state this
 requirement. On the contrary, his examples were mostly single-word
 matches, implying an OR-search at the core.

 The query-explosion still seems like the only working idea. Maybe Varun
 could comment on the maximum numbers of terms that his queries will
 contain?

 Regards,
 Toke Eskildsen

 On Wed, 2010-10-27 at 15:02 +0200, Mike Sokolov wrote:


 Right - my point was to combine this with the previous approaches to
 form a query like:

 samsung AND android AND GPS AND word_count:3

 in order to exclude documents containing additional words. This would
 avoid the combinatoric explosion problem otehrs had alluded to earlier.
 Of course this would fail because android is mis- spelled :)

 -Mike

 On 10/27/2010 08:45 AM, Steven A Rowe wrote:


 I'm pretty sure the word-count strategy won't work.

  If I search with the text "samsung andriod GPS", search results
  should only contain "samsung", "GPS", "andriod" and "samsung andriod".

 Using the word-count strategy, a document containing "samsung andriod
 PDQ" would be a hit, but Varun doesn't want it, because it contains a word
 that is not in the query.

 Steve




 -Original Message-
 From: Michael Sokolov [mailto:soko...@ifactory.com]
 Sent: Wednesday, October 27, 2010 7:44 AM
 To: solr-user@lucene.apache.org
 Subject: RE: How do I this in Solr?

 You might try adding a field containing the word count and making sure that
 it matches the query's word count?

 This would require you to tokenize the query and document yourself, perhaps.

 -Mike




 -Original Message-
 From: Varun Gupta [mailto:varun.vgu...@gmail.com]
 Sent: Tuesday, October 26, 2010 11:26 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How do I this in Solr?

 Thanks everybody for the inputs.

 Looks like Steven's solution is the closest one but will lead
 to performance issues when the query string has many terms.

 I will try to implement the two filters suggested by Steven
 and see how the performance matches up.

 --
 Thanks
 Varun Gupta


 On Wed, Oct 27, 2010 at 8:04 AM, scott chu (???)
 scott@udngroup.com wrote:

 I think you have to write a "yet exact match" handler yourself (I say
 "yet" because it's not quite the exact match we normally know). Steve's
 answer is quite near your request. You can do further work based on his
 solution.

 As the last step, I'd suggest you strip all blanks from the query string
 and from each query result, and only return those results that have the
 same string length as the query string.

 For example, given:
 *query string = Samsung with GPS
 *query results:
 result 1 = Samsung has lots of mobile with GPS
 result 2 = with GPS Samsung
 result 3 = GPS mobile with vendors, such as Sony, Samsung

 they become:
 *query string = SamsungwithGPS (length = 14)
 *query results:
 result 1 = SamsunghaslotsofmobilewithGPS (length = 29)
 result 2 = withGPSSamsung (length = 14)
 result 3 = GPSmobilewithvendors,suchasSony,Samsung (length = 39)

 so result 2 matches your request.

 In this way, you can avoid the load of case-sensitivity and
 word-order-rearrangement work. Furthermore, you can do refined work,
 such as removing whitespace characters, etc.

 Scott @ Taiwan



How do I this in Solr?

2010-10-26 Thread Varun Gupta
Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in
Solr. For the search query, I want the search results to contain only those
documents that satisfy this criterion: "All of the words of the search result
document are present in the search query."

For example:
If I have the following documents indexed: "nokia n95", "GPS", "android",
"samsung", "samsung andriod", "nokia andriod", "mobile with GPS"

If I search with the text "samsung andriod GPS", search results should only
contain "samsung", "GPS", "andriod" and "samsung andriod".

Is there a way to do this in Solr?

--
Thanks
Varun Gupta


Re: How do I this in Solr?

2010-10-26 Thread Savvas-Andreas Moysidis
If I get your question right, you probably want to use the AND binary
operator, as in "samsung AND andriod AND GPS" or "+samsung +andriod +GPS".




RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Hi Varun,

I can't think of a way to do it without writing new analysis filters.

But I think you could do what you want with two filters (this is untested):

1. An index-time filter that outputs a single token consisting of all of the 
input tokens, sorted in a consistent way, e.g.:

   "mobile with GPS" -> "GPS mobile with"
   "samsung android" -> "android samsung"

2. A query-time filter that outputs one token per input term combination, 
sorted in the same consistent way as the index-time filter, e.g.:

   "samsung andriod GPS"
     -> "samsung", "android", "GPS",
        "android samsung", "GPS samsung", "android GPS",
        "android GPS samsung"
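
The two steps above can be sketched in plain Java, outside of any Lucene TokenFilter plumbing. This is only an illustration under stated assumptions: the class and method names are made up, tokens are split on whitespace, the query terms are assumed to be distinct, and the "consistent way" of sorting is taken to be case-insensitive alphabetical order, which is what the examples above appear to use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; a real implementation would live in two custom analysis filters.
class SortedTokenKeys {

    // Index-time side: all tokens of a document, sorted, joined into one token.
    static String indexKey(String text) {
        String[] tokens = text.trim().split("\\s+");
        Arrays.sort(tokens, String.CASE_INSENSITIVE_ORDER);
        return String.join(" ", tokens);
    }

    // Query-time side: one sorted key per non-empty subset of the query
    // terms, i.e. 2^n - 1 keys for n distinct terms.
    static List<String> queryKeys(String query) {
        String[] terms = query.trim().split("\\s+");
        Arrays.sort(terms, String.CASE_INSENSITIVE_ORDER);
        List<String> keys = new ArrayList<>();
        for (int mask = 1; mask < (1 << terms.length); mask++) {
            List<String> subset = new ArrayList<>();
            for (int i = 0; i < terms.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(terms[i]); // terms are already sorted, so the subset is too
                }
            }
            keys.add(String.join(" ", subset));
        }
        return keys;
    }
}
```

A document matches when its index key equals one of the query keys. The number of keys doubles with each additional query term, so this only stays practical for short queries.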

Steve



Re: How do I this in Solr?

2010-10-26 Thread Ken Stanley
On Tue, Oct 26, 2010 at 9:15 AM, Savvas-Andreas Moysidis 
savvas.andreas.moysi...@googlemail.com wrote:

 If I get your question right, you probably want to use the AND binary
 operator as in samsung AND andriod AND GPS or +samsung +andriod +GPS


N.b. For these queries you can also pass the q.op parameter in the request
to temporarily change the default operator to AND; this has the same effect
without having to build the query; i.e., you can just pass
http://host:port/solr/select?q=samsung+android+gps&q.op=and
as the query string (along with any other params you need).
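
As a rough sketch of building such a request URL from client code: the class name and the base URL used below are placeholders, only the standard library is used, and `q.op=AND` is written in upper case on the assumption that this is the form Solr documents. Note that this only makes all terms required; it does not exclude documents with extra words.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of SolrJ.
class SelectUrl {

    // URL-encodes the user query and appends q.op=AND so that every term is required.
    static String build(String baseUrl, String userQuery) {
        String q = URLEncoder.encode(userQuery, StandardCharsets.UTF_8);
        return baseUrl + "/select?q=" + q + "&q.op=AND";
    }
}
```

For example, `SelectUrl.build("http://localhost:8983/solr", "samsung android gps")` would produce a URL of the same shape as the one above, with the spaces encoded as `+`.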


RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
Overkill?

Dennis Gearon
 



Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall

Um.. you could change your default clause to AND rather than or.

That should do the trick.

Matt




RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Um, maybe I'm way off base, but when Varun said:

 If I search with the text "samsung andriod GPS",
 search results should only contain "samsung", "GPS",
 "andriod" and "samsung andriod".

I interpreted that to mean that hit documents should contain terms from the 
query, and nothing else.  Making all terms required doesn't do this.
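
To make the difference concrete, here is a small plain-Java sketch of the two matching rules. The names are made up for illustration and terms are simply split on whitespace, so this is not how Solr evaluates queries internally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the two semantics discussed in this thread.
class MatchSemantics {

    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }

    // AND semantics: every query term occurs in the document; extra words are allowed.
    static boolean matchesAllRequired(String query, String doc) {
        return terms(doc).containsAll(terms(query));
    }

    // Varun's requirement: every document word occurs in the query.
    static boolean containsOnlyQueryTerms(String query, String doc) {
        return terms(query).containsAll(terms(doc));
    }
}
```

With the query "samsung andriod GPS", the document "samsung andriod" fails the first rule but passes the second, while "samsung andriod GPS PDQ" passes the first and fails the second.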

Steve




RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
Good point. Since I might need such a query myself someday, how *IS* that done?


Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.





RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Dennis,

Do you mean to say that you read my earlier post, and disagree that it would 
solve the problem?  Or have you simply not read it?

Steve



RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
If Solr is like Google, once documents matching only the ANDed items in the 
query ran out, then those that had only two of the terms, then only 1 of the 
terms, and then those close to it would start showing up.

Is this correct?

If so, it wouldn't match his requirements.

Dennis Gearon




RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
Plus, if he wants terms that contain ONLY those words, and no others, an ANDed 
query would not do that, right? ANDed queries return results that must have ALL 
the terms listed, and could have lots of other words, right?


Dennis Gearon




RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Hi Dennis,

You wrote:
 If Solr is like Google, once documents matching only the ANDed items
 in the query ran out, then those that had only two of the terms, then
 only 1 of the terms, and then those close to it would start showing up.
[...]
 Plus, if he wants terms that contain ONLY those words, and no others, an
 ANDed query would not do that, right? ANDed queries return results that
 must have ALL the terms listed, and could have lots of other words, right?

This is *exactly* what I just said: ANDed queries (i.e., requiring all query 
terms) will not satisfy Varun's requirements.

Your participation in this thread looks an awful lot like flame-baiting: Someone 
else asks a question, I answer with a possible solution, you give a one-word 
"overkill" response, I say why it's not overkill.  You then ask if anybody 
knows the answer to the original question, and then parrot my response to your 
"overkill" statement.  Really

Get your shit together or shut up.  Please.

Steve


RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
I'm the LAST person anyone will ever need to worry about flame baiting. You did 
notice that I retracted what I said and supported your point of view?

Sorry if my cryptic comment sounded critical. I was wrong, you were right :-)
Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu wrote:

 From: Steven A Rowe sar...@syr.edu
 Subject: RE: How do I this in Solr?
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Tuesday, October 26, 2010, 12:27 PM

 Hi Dennis,

 You wrote:
  If Solr is like Google, once documents matching only the ANDed items
  in the query ran out, then those that had only two of the terms, then
  only 1 of the terms, and then those close to it would start showing up.
  [...]
  Plus, if he wants terms that contain ONLY those words, and no others, an
  ANDed query would not do that, right? ANDed queries return results that
  must have ALL the terms listed, and could have lots of other words, right?

 This is *exactly* what I just said: ANDed queries (i.e., requiring all
 query terms) will not satisfy Varun's requirements.

 Your participation in this thread looks an awful lot like flame-baiting:
 Someone else asks a question, I answer with a possible solution, you give
 a one-word "overkill" response, I say why it's not overkill.  You then ask
 if anybody knows the answer to the original question, and then parrot my
 response to your "overkill" statement.  Really

 Get your shit together or shut up.  Please.

 Steve
 
 
  -Original Message-
  From: Dennis Gearon [mailto:gear...@sbcglobal.net]
  Sent: Tuesday, October 26, 2010 3:14 PM
  To: solr-user@lucene.apache.org
  Subject: RE: How do I this in Solr?
  
  
  
  Dennis Gearon
  
  
  
  --- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu wrote:

   From: Steven A Rowe sar...@syr.edu
   Subject: RE: How do I this in Solr?
   To: solr-user@lucene.apache.org solr-user@lucene.apache.org
   Date: Tuesday, October 26, 2010, 12:10 PM

   Dennis,

   Do you mean to say that you read my earlier post, and disagree that
   it would solve the problem?  Or have you simply not read it?

   Steve
  
    -Original Message-
    From: Dennis Gearon [mailto:gear...@sbcglobal.net]
    Sent: Tuesday, October 26, 2010 3:00 PM
    To: solr-user@lucene.apache.org
    Subject: RE: How do I this in Solr?

    Good point. Since I might need such a query myself someday, how *IS*
    that done?

    Dennis Gearon
   
   
   
    --- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu wrote:

     From: Steven A Rowe sar...@syr.edu
     Subject: RE: How do I this in Solr?
     To: solr-user@lucene.apache.org solr-user@lucene.apache.org
     Date: Tuesday, October 26, 2010, 11:46 AM

     Um, maybe I'm way off base, but when Varun said:

      If I search with the text "samsung android GPS",
      search results should only contain "samsung", "GPS",
      "android" and "samsung android".

     I interpreted that to mean that hit documents should contain terms
     from the query, and nothing else.  Making all terms required doesn't
     do this.

     Steve


RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Dennis,

I wasn't trying to force your admission of my rectitude - I was just getting 
frustrated that the conversation was moving in spiral fashion, and was worried 
that you might have intentionally engineered that.

I'm glad to hear that you weren't flame baiting.

Steve



Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall
Indeed, I'd missed the second part of his requirements; my "AND" solution 
is sadly insufficient for this task.


The combinatorial part of your solution worries me a bit though, Steven, 
because the documents on the larger side of his corpus would 
likely slow down query performance a bit while the filter calculates all 
of the possibilities for a given document.


I'm wondering if a slightly hybrid approach would be valid:

Have a filter that calculates the total number of terms for a given 
document.  And then add a clause into your query at runtime that would 
match what the filter would come up with:


So:

text:Nokia AND text:Mobile AND text:GPS AND termCount: 3

Something like that anyhow.

Matt


Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall
Bah.. nope this would miss documents that only match a subset of the 
given terms.


I'm going to have to go with Steven's approach as the right choice here.

Matt


RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Hi Matt,

I think your concern about performance is spot-on, though.

The combinatorial explosion would be at query time, not at index time - my 
solution has a single token indexed per document. My suggested query-time 
filter would generate the following number of output terms, where C(n,k) is the 
combination of n things taken k at a time, n is the number of input query 
terms, and k is the number of concatenated input query terms forming one output 
query term:

C(n,1) + C(n,2) + ... + C(n,n-1) + C(n,n)

For small queries this would not be a problem:

1 input query term -> 1 output query term
2 input query terms -> 3 output query terms
3 input query terms -> 7 output query terms
4 input query terms -> 15 output query terms

But for larger queries, it could be fairly expensive:

10 input query terms -> 1,023 output query terms
...
15 input query terms -> 32,767 output query terms

This is exactly (2^n - 1) output query terms, where n is the number of input 
terms.

32k query terms might be too slow to be functional.
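
A minimal Java sketch that checks the counts above by summing the combinations directly and comparing against 2^n - 1. The class and method names are made up for illustration, and nothing here touches Lucene or Solr:

```java
/** Counts the query-time output terms produced for n input query terms. */
class OutputTermCount {

    /** C(n, k): the number of ways to pick k of n items. */
    static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            // After step i the value is C(n - k + i, i), so the division is always exact.
            result = result * (n - k + i) / i;
        }
        return result;
    }

    /** C(n,1) + C(n,2) + ... + C(n,n). */
    static long outputTerms(int n) {
        long sum = 0;
        for (int k = 1; k <= n; k++) {
            sum += choose(n, k);
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 4, 10, 15}) {
            long closedForm = (1L << n) - 1; // 2^n - 1
            System.out.println(n + " input terms -> " + outputTerms(n)
                    + " output terms (2^n - 1 = " + closedForm + ")");
        }
    }
}
```

Every additional query term roughly doubles the number of output terms, which is why a 15-word query is so much worse than a 4-word one.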

Steve


Re: How do I this in Solr?

2010-10-26 Thread 朱炎詹
I think you have to write a "yet exact match" handler yourself (I say "yet" 
because it's not quite the exact match we normally know). Steve's answer is quite 
near your request. You can do further work based on his solution.


As the last step, I'd suggest you strip all blanks from both the query string and 
each query result, and return only those results that have the same 
string length as the query string.


For example, given:
*query string = Samsung with GPS
*query results:
result 1 = Samsung has lots of mobile with GPS
result 2 = with GPS Samsung
result 3 = GPS mobile with vendors, such as Sony, Samsung

they become:
*query string = SamsungwithGPS (length = 14)
*query results:
result 1 = SamsunghaslotsofmobilewithGPS (length = 29)
result 2 = withGPSSamsung (length = 14)
result 3 = GPSmobilewithvendors,suchasSony,Samsung (length = 39)

so result 2 matches your request.

This way, you avoid a lot of work handling case sensitivity and word 
reordering. Furthermore, you can refine it, e.g. by removing other whitespace 
characters, etc.
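
The length comparison described above could be sketched roughly as follows in plain Java. The class and method names are hypothetical. Note that a length check alone cannot tell apart different words of the same length, so this only makes sense as a post-filter over results that are already known to contain every query term:

```java
/** Post-filter sketch: compare lengths after removing all whitespace. */
class LengthPostFilter {

    /** Removes every whitespace character from the input. */
    static String stripBlanks(String text) {
        return text.replaceAll("\\s+", "");
    }

    /** True if query and result have the same length once whitespace is removed. */
    static boolean sameStrippedLength(String query, String result) {
        return stripBlanks(query).length() == stripBlanks(result).length();
    }

    public static void main(String[] args) {
        String query = "Samsung with GPS";
        String[] results = {
            "Samsung has lots of mobile with GPS",
            "with GPS Samsung",
            "GPS mobile with vendors, such as Sony, Samsung"
        };
        for (String result : results) {
            System.out.println(stripBlanks(result) + " (length = "
                    + stripBlanks(result).length() + ") keep = "
                    + sameStrippedLength(query, result));
        }
    }
}
```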


Scott @ Taiwan


- Original Message - 
From: Varun Gupta varun.vgu...@gmail.com

To: solr-user@lucene.apache.org
Sent: Tuesday, October 26, 2010 9:07 PM
Subject: How do I this in Solr?



Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in
Solr. For the search query, I want the search results to contain only those
documents that satisfy this criterion: "All of the words of the search result
document are present in the search query".

For example:
If I have the following documents indexed: "nokia n95", "GPS", "android",
"samsung", "samsung android", "nokia android", "mobile with GPS"

If I search with the text "samsung android GPS", search results should only
contain "samsung", "GPS", "android" and "samsung android".

Is there a way to do this in Solr?

--
Thanks
Varun Gupta












Re: How do I this in Solr?

2010-10-26 Thread Varun Gupta
Thanks everybody for the inputs.

Looks like Steven's solution is the closest one but will lead to performance
issues when the query string has many terms.

I will try to implement the two filters suggested by Steven and see how the
performance matches up.
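
For reference, the logic of the two filters can be sketched in plain Java without any Lucene classes. This is only an illustration under stated assumptions: the class and method names are made up, lower-casing and de-duplicating terms is my own choice rather than part of Steven's description, and a real implementation would live in index-time and query-time analysis filters:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class SubsetKeyMatcher {

    /** Splits on whitespace, lower-cases, de-duplicates and sorts the terms. */
    static List<String> sortedTerms(String text) {
        Set<String> terms = new TreeSet<>();
        for (String term : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return new ArrayList<>(terms);
    }

    /** Index-time side: a single key made of all document terms in sorted order. */
    static String indexKey(String document) {
        return String.join(" ", sortedTerms(document));
    }

    /** Query-time side: one key per non-empty combination of query terms. */
    static Set<String> queryKeys(String query) {
        List<String> terms = sortedTerms(query);
        int n = terms.size();
        if (n > 20) {
            // 2^n - 1 keys: refuse queries that would explode.
            throw new IllegalArgumentException("too many query terms: " + n);
        }
        Set<String> keys = new LinkedHashSet<>();
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> chosen = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    chosen.add(terms.get(i)); // terms are sorted, so each key is sorted too
                }
            }
            keys.add(String.join(" ", chosen));
        }
        return keys;
    }

    /** Keeps the documents whose index key equals one of the query keys. */
    static List<String> matches(List<String> documents, String query) {
        Set<String> keys = queryKeys(query);
        List<String> hits = new ArrayList<>();
        for (String document : documents) {
            if (keys.contains(indexKey(document))) {
                hits.add(document);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("nokia n95", "GPS", "android", "samsung",
                "samsung android", "nokia android", "mobile with GPS");
        // Prints [GPS, android, samsung, samsung android]
        System.out.println(matches(documents, "samsung android GPS"));
    }
}
```

Each document contributes exactly one key, so the index stays small. All of the cost sits in the query-side enumeration, which is where the performance concern comes from.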

--
Thanks
Varun Gupta






Re: How do I get the solr error response as XML instead of HTML

2010-10-13 Thread Chris Hostetter
: solr errors come back as HTML instead of XML or JSON
: 
: Is it possible to get the response to come back as XML or JSON, or at
: least something I could show to an end user?

At the moment, Solr just relies on the Servlet Container to generate the 
error response, so you'd have to customize it at that level to get it 
formatted in XML or JSON.

There is an open issue to make Solr generate the error responses directly 
so the ResponseWriters could format them (SOLR-141) but there hasn't been 
a lot of demand for it.
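
In the meantime, one client-side workaround is to pull the reason out of the container's HTML error page before showing it to a user. The Java sketch below is not something proposed in this thread: it assumes the Jetty page layout quoted in the original question (a title starting with "Error 400 ..."), other servlet containers format their error pages differently, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts the error reason from a Jetty-style HTML error page. */
class ErrorPageReason {

    // Assumes the title looks like: <title>Error 400 some reason text</title>
    private static final Pattern TITLE = Pattern.compile(
            "<title>\\s*Error\\s+(\\d{3})\\s+(.*?)</title>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the reason text on one line, or null if the page does not match. */
    static String extractReason(String html) {
        Matcher matcher = TITLE.matcher(html);
        if (!matcher.find()) {
            return null;
        }
        String reason = matcher.group(2)
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&"); // unescape & last so "&amp;lt;" stays "&lt;"
        return reason.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Error 400 "
                + "org.apache.lucene.queryParser.ParseException: Cannot\n"
                + "parse 'term OR': Encountered &lt;EOF&gt; at line 1, column 7."
                + "</title></head><body></body></html>";
        System.out.println(extractReason(html));
    }
}
```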

-Hoss


How do I get the solr error response as XML instead of HTML

2010-10-07 Thread Scott K
Solr errors come back as HTML instead of XML or JSON.

Is it possible to get the response to come back as XML or JSON, or at
least something I could show to an end user?

Is there a way to tell Solr to ignore unparseable terms and still
return a result, ideally with a warning, so the end user doesn't get an
error page?


GET 'http://localhost:8983/solr/select/?q=term+OR&wt=xml'
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 400 org.apache.lucene.queryParser.ParseException: Cannot
parse 'term OR': Encountered &lt;EOF&gt; at line 1, column 7.
Was expecting one of:
&lt;NOT&gt; ...
+ ...
- ...
( ...
* ...
&lt;QUOTED&gt; ...
&lt;TERM&gt; ...
&lt;PREFIXTERM&gt; ...
&lt;WILDTERM&gt; ...
&lt;REGEXPTERM&gt; ...
[ ...
{ ...
&lt;NUMBER&gt; ...
&lt;TERM&gt; ...
* ...
</title>
</head>
<body><h2>HTTP ERROR 400</h2>
<p>Problem accessing /solr/select/. Reason:
<pre>org.apache.lucene.queryParser.ParseException: Cannot parse
'term OR': Encountered &lt;EOF&gt; at line 1, column 7.
Was expecting one of:
&lt;NOT&gt; ...
+ ...
- ...
( ...
* ...
&lt;QUOTED&gt; ...
&lt;TERM&gt; ...
&lt;PREFIXTERM&gt; ...
&lt;WILDTERM&gt; ...
&lt;REGEXPTERM&gt; ...
[ ...
{ ...
&lt;NUMBER&gt; ...
&lt;TERM&gt; ...
* ...
</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

</body>
</html>


Re: How do I get the solr error response as XML instead of HTML

2010-10-07 Thread Otis Gospodnetic
Scott,

Regarding unparseable terms - I think the edismax query parser is more 
forgiving than the standard one, but if that is not the case, one can always 
build a custom query parser that is more forgiving regarding invalid query 
string syntax.

Re HTML response - I'm guessing you are seeing something that looks like HTML 
to you in a browser.  It should be XML.  If it is not, please show us what 
you are seeing.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/





Re: How do I create a solr core with the data from an existing one?

2010-03-25 Thread Chris Hostetter

: *Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy
: of the core, and then swapping it in for the main core. I tried following
...
: The problem I am having is, the core created in step 1 doesn't have any data
: in it. If I am going to do a full index of everything and the kitchen sink,
: that would be fine, but if I just want to update a (large) subset of the
: documents - that's obviously not going to work.

that's really the point of that recommendation -- it's a way to completely 
rebuild without any downtime (the old core keeps serving requests until 
the new one is completely ready)

If you are just updating some of the docs (even if it's a large some) 
you should just update the existing core.

if you really want to clone the data in a core, then replication is 
really the only way to do that currently.  Replicating to a query 
machine instead of having clients query the master you are updating 
directly is usually a good idea for lots of reasons -- but in this case 
you could always temporarily disable replication, make your large batch 
changes to the master, and then re-enable replication so the query boxes 
only see the changes when they are all done.
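
The disable / update / re-enable sequence could be sketched roughly as below. This assumes the ReplicationHandler is registered at /replication on the master and uses its disablereplication and enablereplication commands; the class, the method, and the httpGet and batchUpdate callbacks are hypothetical placeholders for however you issue HTTP requests and run the update.

```java
import java.util.function.Consumer;

/** Hypothetical helper showing the order of operations only. */
final class PausedReplicationSketch {

    /**
     * Pauses replication on the master, runs the batch update, then resumes.
     * If the batch throws, replication is left disabled on purpose so the
     * query boxes never pull a half-finished set of changes.
     */
    static void updateWithReplicationPaused(String masterUrl,
                                            Consumer<String> httpGet,
                                            Runnable batchUpdate) {
        httpGet.accept(masterUrl + "/replication?command=disablereplication");
        batchUpdate.run(); // add/delete documents and commit on the master
        httpGet.accept(masterUrl + "/replication?command=enablereplication");
    }
}
```

Leaving replication off after a failure is a design choice in this sketch; someone still has to re-enable it by hand once the master is in a good state.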


-Hoss



How do I create a solr core with the data from an existing one?

2010-03-24 Thread Steve Dupree
*Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy
of the core, and then swapping it in for the main core. I tried following
these steps:

   1. Create prep core:
   http://localhost:8983/solr/admin/cores?action=CREATE&name=prep&instanceDir=main
   2. Perform index update, then commit/optimize on prep core.
   3. Swap main and prep core:
   http://localhost:8983/solr/admin/cores?action=SWAP&core=main&other=prep
   4. Unload prep core:
   http://localhost:8983/solr/admin/cores?action=UNLOAD&core=prep
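
For reference, the three CoreAdmin requests in the steps above can be generated as in the sketch below. It only builds the URLs (step 2, the index update itself, happens between the first and second request) and does nothing about the empty prep core; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the CoreAdmin URLs for the prep/swap/unload steps. */
final class CoreSwapSketch {

    /** Returns the CREATE, SWAP and UNLOAD URLs, in the order they would be called. */
    static List<String> stepUrls(String solrBase, String liveCore,
                                 String prepCore, String instanceDir) {
        String admin = solrBase + "/admin/cores?action=";
        return Arrays.asList(
            admin + "CREATE&name=" + prepCore + "&instanceDir=" + instanceDir,
            admin + "SWAP&core=" + liveCore + "&other=" + prepCore,
            admin + "UNLOAD&core=" + prepCore);
    }
}
```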

The problem I am having is, the core created in step 1 doesn't have any data
in it. If I am going to do a full index of everything and the kitchen sink,
that would be fine, but if I just want to update a (large) subset of the
documents - that's obviously not going to work.

(I could merge the cores, but part of what I'm trying to do is get rid of
any deleted documents without trying to make a list of them.)

Is there some flag to the CREATE action that I'm missing? The Solr Wiki page
for CoreAdmin http://wiki.apache.org/solr/CoreAdmin is a little sparse on
details.

Is this approach wrong? I found at least one message on this list that
stated that performing updates in a separate core on the same machine won't
help, given that they're both using the same CPU. Is that true?
thanks in advance
~stannius


Re: How do I create a solr core with the data from an existing one?

2010-03-24 Thread gwk

Hi,

I'm not sure if it's the best option but you could use replication to 
copy the index (http://wiki.apache.org/solr/SolrReplication). As long as 
your core is configured as a master you can use the fetchindex command to 
do a one-time replication from the new core (see the HTTP API section in 
the wiki page).
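
A one-time fetchindex request might be built as in this sketch. It assumes both cores have the ReplicationHandler registered at /replication, that the request goes to the core that should receive the copy, and that the masterUrl parameter points at the source core's replication handler; the class name and the core names in the usage line are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds a one-time fetchindex request URL. */
final class FetchIndexSketch {

    /** The target core pulls the index from the source core's replication handler. */
    static String fetchIndexUrl(String targetCoreUrl, String sourceCoreUrl) {
        String masterUrl = sourceCoreUrl + "/replication";
        try {
            return targetCoreUrl + "/replication?command=fetchindex&masterUrl="
                    + URLEncoder.encode(masterUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

For example, fetchIndexUrl("http://localhost:8983/solr/prep", "http://localhost:8983/solr/main") gives the URL to request so that prep copies the index from main.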


Regards,

gwk


On 3/24/2010 5:31 PM, Steve Dupree wrote:

*Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy
of the core, and then swapping it in for the main core. I tried following
these steps:

1. Create prep core:
http://localhost:8983/solr/admin/cores?action=CREATE&name=prep&instanceDir=main
2. Perform index update, then commit/optimize on prep core.
3. Swap main and prep core:
http://localhost:8983/solr/admin/cores?action=SWAP&core=main&other=prep
4. Unload prep core:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=prep

The problem I am having is, the core created in step 1 doesn't have any data
in it. If I am going to do a full index of everything and the kitchen sink,
that would be fine, but if I just want to update a (large) subset of the
documents - that's obviously not going to work.

(I could merge the cores, but part of what I'm trying to do is get rid of
any deleted documents without trying to make a list of them.)

Is there some flag to the CREATE action that I'm missing? The Solr Wiki page
for CoreAdmin http://wiki.apache.org/solr/CoreAdmin is a little sparse on
details.

Is this approach wrong? I found at least one message on this list that
stated that performing updates in a separate core on the same machine won't
help, given that they're both using the same CPU. Is that true?
thanks in advance
~stannius

   


Question: How do I run the solr analysis tool programtically ?

2009-09-03 Thread Yatir

From Java code I want to contact Solr over HTTP and supply a text buffer
(or a URL that returns text, whichever is easier), and get back
the final list of tokens (or the final text buffer) after it has gone through
all the query-time filters defined for this Solr instance (stemming, stop
words, etc.).
thanks in advance

-- 
View this message in context: 
http://www.nabble.com/Question%3A-How-do-I-run-the-solr-analysis-tool-programtically---tp25273484p25273484.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question: How do I run the solr analysis tool programtically ?

2009-09-03 Thread Chris Male
Hi Yatir,

The FieldAnalysisRequestHandler has the same behavior as the analysis tool.
It will show you the list of tokens that are created after each of the
filters has been applied.  It can be used through normal HTTP requests, or
you can use SolrJ's support.
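
As a rough sketch of the plain HTTP route, the request URL could be built like this. It assumes the handler is registered at /analysis/field in solrconfig.xml and that a field type named "text" exists in the schema; both are assumptions about your setup, and the class and method names are hypothetical. The same text is sent as analysis.fieldvalue and analysis.query so that the response includes the query-time analysis.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds a FieldAnalysisRequestHandler URL. */
final class FieldAnalysisSketch {

    /** Asks for index-time and query-time analysis of the same text for one field type. */
    static String analysisUrl(String solrBase, String fieldType, String text) {
        return solrBase + "/analysis/field"
                + "?analysis.fieldtype=" + enc(fieldType)
                + "&analysis.fieldvalue=" + enc(text)
                + "&analysis.query=" + enc(text);
    }

    /** URL-encodes a parameter value as UTF-8. */
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

The response lists the tokens after each stage of the analysis chain, so the final token list is the output of the last filter in the query section.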

Thanks,
Chris

On Thu, Sep 3, 2009 at 12:42 PM, Yatir yat...@outbrain.com wrote:


 From Java code I want to contact Solr over HTTP and supply a text buffer
 (or a URL that returns text, whichever is easier), and get back
 the final list of tokens (or the final text buffer) after it has gone through
 all the query-time filters defined for this Solr instance (stemming, stop
 words, etc.).
 thanks in advance
 thanks in advance
