Keeping qt parameter in distributed search

2010-10-26 Thread Shawn Heisey
I have a request handler with a qt of lbcheck so that load balancer 
healthchecks, which happen every five seconds, do not skew my query 
statistics.


I've recently modified the way I do my load balancing, which required 
that I add a shards parameter to my PingRequestHandler.  The ping 
handler includes the qt parameter set to lbcheck, but when the request is 
distributed to the shards, qt gets lost and the shards use the standard 
handler.  Now my broker core is the only one with correct statistics.


Is there any way to preserve qt in a distributed search so this doesn't 
happen?  I am using Solr 1.4.1, but we are upgrading to 3.1-dev very soon.
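
One avenue that may be worth testing: distributed search has a shards.qt parameter that names the handler used for the per-shard sub-requests. Below is a minimal sketch of the kind of request the ping query would need to become, assuming shards.qt is honored by the Solr version in use. The host names, core names and the helper function are made up for illustration.

```python
from urllib.parse import urlencode


def build_ping_query(base_url, shards, handler="lbcheck"):
    """Build a distributed query URL that asks the shards to use `handler`.

    base_url and shards are hypothetical values; shards.qt is assumed to be
    supported by the Solr version in use.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("qt", handler),               # handler on the broker core
        ("shards", ",".join(shards)),  # comma-separated shard list
        ("shards.qt", handler),        # handler for the per-shard sub-requests
    ]
    return base_url + "/select?" + urlencode(params)


url = build_ping_query(
    "http://broker:8983/solr",
    ["idx1:8983/solr/core0", "idx2:8983/solr/core0"],
)
print(url)
```

If the shard statistics still show hits on the standard handler after this, the parameter is probably not being passed through by that version.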


Thanks,
Shawn



Need help for solr searching case insensative item

2010-10-26 Thread wu liu
Hi all,

I just noticed a weird thing in my Solr search results.
If I search for ecommons, I do not get the results for eCommons.
Conversely,
if I search for eCommons, I get only the matches for eCommons, 
but not ecommons.

I cannot figure out why.

please help me

Thanks very much in advance


Externalizing properties file

2010-10-26 Thread sivaprasad

Hi,
I created a custom component in Solr. It uses one properties file. When I
place the jar in the solr_home lib directory, the class ends up on the
classpath, but the properties file does not. If I bundle the properties file
inside the jar, the file is on the classpath, but I need to externalize the
properties file. I am using ResourceBundle.getBundle to load it.
Where do I need to place the properties file? Does anybody have an idea?

Regards,
JS
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Externalizing-properties-file-tp1768972p1768972.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Modelling Access Control

2010-10-26 Thread Lance Norskog
Filter queries are a set of bits which is ANDed against query results
at a very early stage of query processing. They are very useful.  Note
that they are stored (I think) in parsed query order, so you have to
pass in the same filter query string each time.

On Mon, Oct 25, 2010 at 8:59 AM, Dennis Gearon gear...@sbcglobal.net wrote:
 Thanks for that insight, a lot.

 Dennis Gearon

 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a 
 better idea to learn from others’ mistakes, so you do not have to make them 
 yourself. from 
 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

 EARTH has a Right To Life,
  otherwise we all die.


 --- On Mon, 10/25/10, Jonathan Rochkind rochk...@jhu.edu wrote:

 From: Jonathan Rochkind rochk...@jhu.edu
 Subject: Re: Modelling Access Control
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Monday, October 25, 2010, 8:19 AM
 Dennis Gearon wrote:
  why use filter queries?
 
  Wouldn't reducing the set headed into the filters by
 putting it in the main query be faster? (A question to
 learn, since I do NOT know :-)
 
 
 No. At least as I understand it. In the best case, the
 filter query will be a lot faster, because filter queries
 are cached separately in the filter cache.  So if the
 existing filter query can be found in the cache, it'll be a
 lot faster. If it's not in the cache, the performance should
 be pretty much the same as if you had included it as an
 additional clause in the main q query.

 The reasons to put it in a fq filter are:

 1) The caching behavior. You can have that certain part of
 the query be cached on its own, speeding up any subsequent
 queries that use that same fq.

 2) Simplification of client code. You can leave your 'q'
 however you want it, using whatever kind of query parser you
 want too (dismax, whatever), and just add on the 'fq'
 without touching the 'q'. This is a lot
 easier to do, and especially when you're using it for access
 control like this, a lot harder for a bug to creep in.

 Jonathan







-- 
Lance Norskog
goks...@gmail.com


Re: Modelling Access Control

2010-10-26 Thread Lance Norskog
The idea of ACL-based queries is: each document carries all of the
groups or roles that are allowed to see it. Each user search includes all of
the groups or roles the user has.

The roles are stored as multivalued string fields. Each ACL-based
query passes in roles:A OR roles:B OR roles:C and if any of A,B,C
are in the stored ACL field, you have a match.
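
A rough sketch of building such a filter on the client side. The field name roles comes from the description above; the function name and the choice to sort the roles are assumptions. Sorting keeps the string identical from request to request, which matters if the filter is sent as an fq and you want the filter cache to reuse it.

```python
def acl_filter_query(user_roles, field="roles"):
    """Return a filter string such as roles:("A" OR "B" OR "C").

    Roles are de-duplicated and sorted so that the same user always produces
    the same string. The helper itself is hypothetical, not a Solr API.
    """
    roles = sorted(set(user_roles))
    if not roles:
        # Refuse to build a filter that would silently match everything.
        raise ValueError("user has no roles")
    quoted = [
        '"' + r.replace("\\", "\\\\").replace('"', '\\"') + '"' for r in roles
    ]
    return field + ":(" + " OR ".join(quoted) + ")"


print(acl_filter_query(["C", "A", "B", "A"]))
```

The resulting string would go into an fq parameter, leaving the user's q untouched.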

This is called early binding. Late binding is when you return
everything and the app calls LDAP and asks "can she see this? or
this?". This is slow and puts a monster load on the ACL server.

Very important: do not make a spelling or autosuggest index from a
text field which some people can see and other people can't.




-- 
Lance Norskog
goks...@gmail.com


Re: DIH wiht several Cores

2010-10-26 Thread stockiii

okay. how did you solve this? 
did you write your own importer? 

We already have our own importer, but only for one instance of Solr and one
index. We want to split this into several cores and indexes, and want to use
DIH because we think its indexing is better than a PHP script ...


Re: Need help for solr searching case insensative item

2010-10-26 Thread yandong yao
Sounds like a WordDelimiterFilter config issue. Please refer to
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

Also it will help if you could provide:
1) Tokenizers/Filters config in schema file
2) analysis.jsp output in admin page.

2010/10/26 wu liu wul...@mail.usask.ca

 Hi all,

 I just noticed a wierd thing happend to my solr search result.
 if I do a search for ecommons, it cannot get the result for eCommons,
 instead,
 if i do a search for eCommons, i can only get all the match for
 eCommons, but not ecommons.

 I cannot figure it out why?

 please help me

 Thanks very much in advance



Re: Need help for solr searching case insensative item

2010-10-26 Thread Jan Høydahl / Cominvent
Hi,

You need to share relevant parts of your schema for us to be able to see what's 
going on.

Try using fieldType=text. Basically, you need a fieldType which has the 
lowercaseFilter included.
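
To illustrate why the lowercase filter matters, here is a toy model (not Solr code) of an analyzer applied at both index time and query time. The whitespace tokenizer and the function names are simplifications chosen for the sketch.

```python
def analyze(text, lowercase):
    """Toy analyzer: split on whitespace, optionally lowercase each token."""
    tokens = text.split()
    return [t.lower() for t in tokens] if lowercase else tokens


def matches(query, document, lowercase):
    """True if every analyzed query token is among the document's tokens."""
    indexed = set(analyze(document, lowercase))
    return all(t in indexed for t in analyze(query, lowercase))


doc = "eCommons repository"
print(matches("ecommons", doc, lowercase=False))  # case-sensitive field
print(matches("ecommons", doc, lowercase=True))   # lowercased on both sides
```

Without lowercasing, the indexed token eCommons and the query token ecommons are simply different terms, which is the behaviour described in the question.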

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 25. okt. 2010, at 21.09, wu liu wrote:

 Hi all,
 
 I just noticed a wierd thing happend to my solr search result.
 if I do a search for ecommons, it cannot get the result for eCommons, 
 instead,
 if i do a search for eCommons, i can only get all the match for eCommons, 
 but not ecommons.
 
 I cannot figure it out why?
 
 please help me
 
 Thanks very much in advance



Re: How to index on basis of a condition?

2010-10-26 Thread Pawan Darira
Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The
query result is correct, but when I look at it in my index, the value stored is
an unusual bunch of characters, e.g. *...@6628ad5a*

Please suggest as to what went wrong.

- Pawan


On Mon, Oct 25, 2010 at 6:44 PM, Ephraim Ofir ephra...@icq.com wrote:

 Assuming you're talking about data that comes from a DB, I find it easiest
 to do this kind of logic on the DB's side (MySQL example):
 SELECT IF(someField = someValue, desiredValue, NULL) AS desiredName from
 someTable

 If that's not possible, you can use RegexTransformer(
 http://wiki.apache.org/solr/DataImportHandler#RegexTransformer) or (worst
 case and worst performance) ScriptTransformer(
 http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer) and
 actually write a JS script to do your logic.

 Ephraim Ofir

 -Original Message-
 From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
 Sent: Monday, October 25, 2010 10:23 AM
 To: solr-user@lucene.apache.org
 Subject: Re: How to index on basis of a condition?

 Do you want to use a field's content to decide whether the document should
 be indexed or not?
 You could write an UpdateProcessor for that, simply aborting the chain for
 the docs that don't pass your test.

 @Override
 public void processAdd(AddUpdateCommand cmd) throws IOException {
   SolrInputDocument doc = cmd.getSolrInputDocument();
   // "myfield" and "foobar" are placeholders for your field name and test value.
   String value = (String) doc.getFieldValue("myfield");
   String condition = "foobar";
   // Compare with equals(), not ==; only matching docs continue down the chain.
   if (condition.equals(value)) {
     super.processAdd(cmd);
   }
 }

 But if what you meant was to skip only that field if it does not match
 the condition, you could use doc.removeField(name) instead. Now you can feed
 your content using whatever method you like.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com

 On 25. okt. 2010, at 08.38, Pawan Darira wrote:

  Hi
 
  I want to index a particular field on one if() condition. Can i do it
  through DIH?
 
  Please suggest.
 
  --
  Thanks,
  Pawan Darira




-- 
Thanks,
Pawan Darira


Re: How to index on basis of a condition?

2010-10-26 Thread Gora Mohanty
On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira pawan.dar...@gmail.com wrote:
 Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The
 query result is correct. But when i see it in my index, the value stored is
 something unusual bunch of characters e.g. *...@6628ad5a*
[...]

Which database are you indexing from? The field type is probably
a blob in the database. Check that, and look into the ClobTransformer:
http://wiki.apache.org/solr/DataImportHandler#ClobTransformer

Regards,
Gora


Does Solr reload schema.xml dynamically?

2010-10-26 Thread Swapnonil Mukherjee
Hi Everybody,

If I change my schema.xml, do I have to restart Solr? Is there some way I 
can apply the changes to schema.xml without restarting Solr?

Swapnonil Mukherjee





Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread David Stuart
If you are using Solr Multicore (http://wiki.apache.org/solr/CoreAdmin) you can 
issue a RELOAD command:
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
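
If the reload is scripted, building the URL from its parts avoids dropping a separator between the parameters. A small sketch follows; the helper name is made up, and the base URL and core name are the ones from the example above.

```python
from urllib.parse import urlencode


def core_reload_url(solr_base, core):
    """Return the CoreAdmin RELOAD URL for one core."""
    query = urlencode({"action": "RELOAD", "core": core})
    return solr_base.rstrip("/") + "/admin/cores?" + query


reload_url = core_reload_url("http://localhost:8983/solr", "core0")
print(reload_url)
```

Fetching that URL (with a browser, curl, or urllib) is what triggers the reload.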

On 26 Oct 2010, at 11:09, Swapnonil Mukherjee wrote:

 Hi Everybody,
 
 If I change my schema.xml to, do I have to restart Solr. Is there some way, I 
 can apply the changes to schema.xml without restarting Solr?
 
 Swapnonil Mukherjee
 
 
 



Re: command line to check if Solr is up running

2010-10-26 Thread Peter Karich

 Hi Xin,

from the wiki:
http://wiki.apache.org/solr/SolrConfigXml

The URL of the ping query is /admin/ping

You can also check (via wget) the number of documents. It might look 
like a rusty hack, but it works for me:


wget -T 1 -q "http://localhost:8080/solr/select?q=*:*" -O - | tr '/' '\n' | grep numFound | tr '"' ' ' | awk '{print $5}'
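
For anyone who prefers not to parse XML with tr and awk, the same check can be sketched in a few lines of Python. It assumes the default XML response format, where the result element carries a numFound attribute. The function names and the sample response are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def num_found(response_xml):
    """Extract numFound from a Solr XML response; return None if absent."""
    root = ET.fromstring(response_xml)
    result = root.find("result")
    if result is None or "numFound" not in result.attrib:
        return None
    return int(result.attrib["numFound"])


def solr_doc_count(select_url, timeout=1.0):
    """Fetch a select URL and return its numFound (raises if Solr is down)."""
    with urlopen(select_url, timeout=timeout) as resp:
        return num_found(resp.read())


# Offline demonstration with a hand-written sample response.
sample = (
    '<response><lst name="responseHeader"><int name="status">0</int></lst>'
    '<result name="response" numFound="1234" start="0"/></response>'
)
print(num_found(sample))
```

A connection error or timeout from solr_doc_count is then the signal that Solr is not answering.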


Regards,
Peter.


As we know, we can use a browser to check if Solr is running by going to 
http://$hostName:$portNumber/$masterName/admin, say http://localhost:8080/solr1/admin. My question 
is: are there any ways to check it from the command line? I used curl 
"http://localhost:8080" to check my Tomcat, and it worked fine. However, there is no response if I try 
curl "http://localhost:8080/solr1/admin" (even when my Solr is running). Does anyone know 
any command line alternatives?

Thanks,
Xin



--
http://jetwick.com twitter search prototype



Re: How to index on basis of a condition?

2010-10-26 Thread Pawan Darira
I am using a MySQL database, and the field type is date.

On Tue, Oct 26, 2010 at 2:56 PM, Gora Mohanty g...@mimirtech.com wrote:

 On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira pawan.dar...@gmail.com
 wrote:
  Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The
  query result is correct. But when i see it in my index, the value stored
 is
  something unusual bunch of characters e.g. *...@6628ad5a*
 [...]

 Which database are you indexing from? The field type is probably
 a blob in the database. Check that, and look into the ClobTransformer:
 http://wiki.apache.org/solr/DataImportHandler#ClobTransformer

 Regards,
 Gora




-- 
Thanks,
Pawan Darira


Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Peter Karich

 Hi,

See this:
http://wiki.apache.org/solr/CoreAdmin#RELOAD

Solr will also load the new configuration (without restarting the webapp) 
on the slaves when using replication:

http://wiki.apache.org/solr/SolrReplication

Regards,
Peter.


Hi Everybody,

If I change my schema.xml to, do I have to restart Solr. Is there some way, I 
can apply the changes to schema.xml without restarting Solr?

Swapnonil Mukherjee







--
http://jetwick.com twitter search prototype



RE: How to index on basis of a condition?

2010-10-26 Thread Ephraim Ofir
This is probably just a date format problem, nothing to do with the IF()
statement.  Try applying this on your date:
DATE_FORMAT(yourDate, '%Y-%m-%dT00:00:00Z')
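
If the conversion is easier to do on the client side than in SQL, the same format can be produced there. A tiny sketch, assuming the value arrives as a plain date and that midnight UTC is acceptable; the function name is made up.

```python
from datetime import date


def to_solr_date(d):
    """Format a date as midnight UTC, in the form Solr date fields expect."""
    return d.strftime("%Y-%m-%dT00:00:00Z")


print(to_solr_date(date(2010, 5, 30)))
```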

Ephraim Ofir

-Original Message-
From: Pawan Darira [mailto:pawan.dar...@gmail.com] 
Sent: Tuesday, October 26, 2010 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: How to index on basis of a condition?

I am using mysql database, and, field type is date

On Tue, Oct 26, 2010 at 2:56 PM, Gora Mohanty g...@mimirtech.com
wrote:

 On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira pawan.dar...@gmail.com
 wrote:
  Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement.
The
  query result is correct. But when i see it in my index, the value
stored
 is
  something unusual bunch of characters e.g. *...@6628ad5a*
 [...]

 Which database are you indexing from? The field type is probably
 a blob in the database. Check that, and look into the ClobTransformer:
 http://wiki.apache.org/solr/DataImportHandler#ClobTransformer

 Regards,
 Gora




-- 
Thanks,
Pawan Darira


Highlighting for non-stored fields

2010-10-26 Thread Phong Dais
Hi,

I've been looking thru the mailing archive for the past week and I haven't
found any useful info regarding this issue.

My requirement is to index a few terabytes worth of data to be searched.
Due to the size of the data, I would like to index without storing but I
would like to use the highlighting feature.  Is this even possible?  What
are my options?

I've read about termOffsets, payload that could possibly be used to do this
but I have no idea how this could be done.

Any pointers greatly appreciated.  Someone please point me in the right
direction.

 I don't mind having to write some code or digging thru existing code to
accomplish this task.

Thanks,
P.


RE: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Ephraim Ofir
Note that usually when you change the schema.xml you have not only to
restart solr, but also rebuild the index, so the issue of how to reload
the file seems like a small problem...

Ephraim Ofir

-Original Message-
From: Peter Karich [mailto:peat...@yahoo.de] 
Sent: Tuesday, October 26, 2010 12:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Does Solr reload schema.xml dynamically?

  Hi,

See this:
http://wiki.apache.org/solr/CoreAdmin#RELOAD

Solr will also load the new configuration (without restart the webapp) 
on the slaves when using replication:
http://wiki.apache.org/solr/SolrReplication

Regards,
Peter.

 Hi Everybody,

 If I change my schema.xml to, do I have to restart Solr. Is there some
way, I can apply the changes to schema.xml without restarting Solr?

 Swapnonil Mukherjee






-- 
http://jetwick.com twitter search prototype



Re: How to index on basis of a condition?

2010-10-26 Thread Gora Mohanty
On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira pawan.dar...@gmail.com wrote:
 I am using mysql database, and, field type is date
[...]

Could you show us the exact SELECT statement, and some example
values returned by running the SELECT directly at a mysql console?

Regards,
Gora


Re: How to index on basis of a condition?

2010-10-26 Thread Pawan Darira
My SQL is

select IF(sub_cat_id=2002, ad_post_date, null) as 'ad_sort_field' from
tcuser.ad_details where SOME_CONDiTION

+---------------+
| ad_sort_field |
+---------------+
| 2010-05-30    |
| 2010-05-02    |
| 2010-10-07    |
| NULL          |
| 2010-10-15    |
| NULL          |
+---------------+

Thanks
Pawan


On Tue, Oct 26, 2010 at 4:36 PM, Gora Mohanty g...@mimirtech.com wrote:

 On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira pawan.dar...@gmail.com
 wrote:
  I am using mysql database, and, field type is date
 [...]

 Could you show us the exact SELECT statement, and some example
 values returned by running the SELECT directly at a mysql console?

 Regards,
 Gora




-- 
Thanks,
Pawan Darira


Query only a specfic field with a specific value using Dismax Handler

2010-10-26 Thread Swapnonil Mukherjee
Hi Everybody,

Let me give you a brief idea of our Solr document. We have about 6 text type 
fields, each containing IPTC data extracted from photos. Search is performed 
mostly on these 6 fields.
We also have a multivalued field named group_id that contains a list of all the 
group_ids that have access to this photo.  In other words, we are storing the 
metadata of the photo as well as the permissions applicable to this photo in 
the Solr document itself. This group_id field, by the way, is of long type.

Additionally we have certain boolean and constant type fields named 
visibleToEndUser (boolean) and entityType (a java enum between 0 to 5).

The first field defaultSearch is a copyField which contains a copy of all the 
values of 6 text type fields that I have mentioned.

The way we query presently using the default search handler is like this.

defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR 
group_id:2216624 OR group_id:2216990) AND (entityType:0) AND 
(visibleToEndUser:true)

We want to start using the dismax (if not dismax then edismax)  query handler 
but so far I have not been able to replicate the query mentioned above to the 
equivalent dismax form.

What I cannot figure out is:

1. How do I apply an exact match on the group_id, visibleToEndUser and 
entityType fields? In other words, how do I query a specific field for a specific value 
rather than searching across all fields?
2. How do I apply OR and AND conditions?
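
One possible shape for this, offered as a sketch rather than a tested answer: keep only the user's text in q for dismax, and move the exact-match restrictions into fq parameters, which are parsed with the standard query syntax and therefore accept field:value, AND and OR. The field names come from the message above. The function name is made up, and the use of defType=dismax and qf=defaultSearch is an assumption about how the handler would be configured.

```python
from urllib.parse import urlencode


def dismax_params(user_text, group_ids, entity_type, visible=True):
    """Build request parameters: free text in q, restrictions in fq."""
    group_clause = " OR ".join(str(g) for g in sorted(group_ids))
    return [
        ("defType", "dismax"),
        ("q", user_text),                # only what the user typed
        ("qf", "defaultSearch"),         # field(s) dismax searches
        ("fq", "group_id:(" + group_clause + ")"),
        ("fq", "entityType:" + str(entity_type)),
        ("fq", "visibleToEndUser:" + ("true" if visible else "false")),
    ]


params = dismax_params("Dog", [2181347, 2181364, 2216624, 2216990], 0)
print(urlencode(params))
```

Multiple fq parameters are intersected, which covers the AND between the restrictions; the OR lives inside the group_id filter.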


Swapnonil Mukherjee





Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Swapnonil Mukherjee
Hi Everybody,

Thanks Ephraim and Peter. I think I got my answer.

Swapnonil Mukherjee




On 26-Oct-2010, at 4:23 PM, Ephraim Ofir wrote:

 Note that usually when you change the schema.xml you have not only to
 restart solr, but also rebuild the index, so the issue of how to reload
 the file seems like a small problem...
 
 Ephraim Ofir
 
 -Original Message-
 From: Peter Karich [mailto:peat...@yahoo.de] 
 Sent: Tuesday, October 26, 2010 12:29 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Does Solr reload schema.xml dynamically?
 
  Hi,
 
 See this:
 http://wiki.apache.org/solr/CoreAdmin#RELOAD
 
 Solr will also load the new configuration (without restart the webapp) 
 on the slaves when using replication:
 http://wiki.apache.org/solr/SolrReplication
 
 Regards,
 Peter.
 
 Hi Everybody,
 
 If I change my schema.xml to, do I have to restart Solr. Is there some
 way, I can apply the changes to schema.xml without restarting Solr?
 
 Swapnonil Mukherjee
 
 
 
 
 
 
 -- 
 http://jetwick.com twitter search prototype
 



RE: How to index on basis of a condition?

2010-10-26 Thread Ephraim Ofir
Try:
select IF(sub_cat_id=2002, DATE_FORMAT(ad_post_date,
'%Y-%m-%dT00:00:00Z/DAY'), null) as 'ad_sort_field' from
tcuser.ad_details where SOME_CONDiTION

Ephraim Ofir

-Original Message-
From: Pawan Darira [mailto:pawan.dar...@gmail.com] 
Sent: Tuesday, October 26, 2010 1:29 PM
To: solr-user@lucene.apache.org
Subject: Re: How to index on basis of a condition?

My Sql is

select IF(sub_cat_id=2002, ad_post_date, null) as 'ad_sort_field' from
tcuser.ad_details where SOME_CONDiTION

+---------------+
| ad_sort_field |
+---------------+
| 2010-05-30    |
| 2010-05-02    |
| 2010-10-07    |
| NULL          |
| 2010-10-15    |
| NULL          |
+---------------+

Thanks
Pawan


On Tue, Oct 26, 2010 at 4:36 PM, Gora Mohanty g...@mimirtech.com
wrote:

 On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira pawan.dar...@gmail.com
 wrote:
  I am using mysql database, and, field type is date
 [...]

 Could you show us the exact SELECT statement, and some example
 values returned by running the SELECT directly at a mysql console?

 Regards,
 Gora




-- 
Thanks,
Pawan Darira


Next Word - Any Suggestions?

2010-10-26 Thread Christopher Ball
Am about to implement a custom query that is sort of mash-up of Facets,
Highlighting, and SpanQuery - but thought I'd see if anyone has done
anything similar. 

 

In simple words, I need to facet on the next word given a target word.

 

For example, if my index only had the following 5 documents (comprised of a
sentence each):

 

Doc 1 - The quick brown fox jumped over the fence.

Doc 2 - The sly fox skipped over the fence.

Doc 3 - The fat fox skipped his afternoon class.

Doc 4 - A brown duck and red fox, crashed the party.

Doc 5 - Charles Brown! Fox! Crashed my damn car.

 

The query should give the frequency of the distinct terms after the word
fox:

 

skipped - 2

crashed - 2 

jumped - 1

 

Long-term, do the opposite - frequency of the distinct terms before the word
fox:

 

brown - 2

sly - 1

fat - 1 

red - 1
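
To pin down the expected numbers, here is a brute-force sketch over the five example sentences. It does not use Solr or term vectors at all; it only shows the counting that a custom component would have to reproduce. The tokenizer (lowercase, strip punctuation) and the function names are assumptions.

```python
import re
from collections import Counter

DOCS = [
    "The quick brown fox jumped over the fence.",
    "The sly fox skipped over the fence.",
    "The fat fox skipped his afternoon class.",
    "A brown duck and red fox, crashed the party.",
    "Charles Brown! Fox! Crashed my damn car.",
]


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def neighbor_counts(docs, target, offset=1):
    """Count the terms found `offset` positions away from each `target`."""
    counts = Counter()
    for text in docs:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            j = i + offset
            if tok == target and 0 <= j < len(tokens):
                counts[tokens[j]] += 1
    return counts


after = neighbor_counts(DOCS, "fox", offset=1)
before = neighbor_counts(DOCS, "fox", offset=-1)
print(after.most_common())
print(before.most_common())
```

With stored term positions, the same idea would walk the positions of the target term and look up the term at position + 1 (or - 1) instead of re-tokenizing the text.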

 

My guess is that either the FastVectorHighlighter or SpanQuery would be a
reasonable starting point. I was hoping to take advantage of Vectors as I am
storing termVectors, termPositions, and termOffsets for the field in
question.

 

Grateful for any thoughts . . . reference implementations . . . words of
encouragement . . . free beer - whatever you can offer.

 

Gracias,

 

Christopher

 



How do I this in Solr?

2010-10-26 Thread Varun Gupta
Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in
Solr. For the search query, I want the search results to contain only those
documents that satisfy this criterion: "All of the words of the search result
document are present in the search query".

For example:
If I have the following documents indexed: "nokia n95", "GPS", "android",
"samsung", "samsung android", "nokia android", "mobile with GPS"

If I search with the text "samsung android GPS", search results should only
contain "samsung", "GPS", "android" and "samsung android".

Is there a way to do this in Solr?

--
Thanks
Varun Gupta


Re: How do I this in Solr?

2010-10-26 Thread Savvas-Andreas Moysidis
If I get your question right, you probably want to use the AND binary
operator, as in "samsung AND android AND GPS" or "+samsung +android +GPS".

On 26 October 2010 14:07, Varun Gupta varun.vgu...@gmail.com wrote:

 Hi,

 I have lot of small documents (each containing 1 to 15 words) indexed in
 Solr. For the search query, I want the search results to contain only those
 documents that satisfy this criteria All of the words of the search result
 document are present in the search query

 For example:
 If I have the following documents indexed: nokia n95, GPS, android,
 samsung, samsung andriod, nokia andriod, mobile with GPS

 If I search with the text samsung andriod GPS, search results should only
 conain samsung, GPS, andriod and samsung andriod.

 Is there a way to do this in Solr.

 --
 Thanks
 Varun Gupta



RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Hi Varun,

I can't think of a way to do it without writing new analysis filters.

But I think you could do what you want with two filters (this is untested):

1. An index-time filter that outputs a single token consisting of all of the 
input tokens, sorted in a consistent way, e.g.:

   "mobile with GPS" -> "GPS mobile with"
   "samsung android" -> "android samsung"

2. A query-time filter that outputs one token per input term combination, 
sorted in the same consistent way as the index-time filter, e.g.:

   "samsung android GPS"
   -> "samsung", "android", "GPS",
      "android samsung", "GPS samsung", "android GPS",
      "android GPS samsung"
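
The two filters can be modelled outside Solr to check that the idea gives the wanted results. This is only a sketch of the logic, not an analysis filter. The function names, the whitespace tokenization and the case-insensitive sort order are assumptions.

```python
from itertools import combinations


def sort_key(term):
    """Sort terms case-insensitively so both sides agree on the order."""
    return term.lower()


def index_token(doc_text):
    """Index time: one token holding all distinct terms in canonical order."""
    return " ".join(sorted(set(doc_text.split()), key=sort_key))


def query_tokens(query_text):
    """Query time: one canonical token per non-empty combination of terms."""
    terms = sorted(set(query_text.split()), key=sort_key)
    return {
        " ".join(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    }


docs = ["nokia n95", "GPS", "android", "samsung",
        "samsung android", "nokia android", "mobile with GPS"]
wanted = query_tokens("samsung android GPS")
hits = [d for d in docs if index_token(d) in wanted]
print(hits)
```

Note that a query of n distinct terms expands to 2^n - 1 tokens, so this is only practical for short queries.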

Steve

 -Original Message-
 From: Varun Gupta [mailto:varun.vgu...@gmail.com]
 Sent: Tuesday, October 26, 2010 9:08 AM
 To: solr-user@lucene.apache.org
 Subject: How do I this in Solr?
 
 Hi,
 
 I have lot of small documents (each containing 1 to 15 words) indexed in
 Solr. For the search query, I want the search results to contain only
 those
 documents that satisfy this criteria All of the words of the search
 result
 document are present in the search query
 
 For example:
 If I have the following documents indexed: nokia n95, GPS, android,
 samsung, samsung andriod, nokia andriod, mobile with GPS
 
 If I search with the text samsung andriod GPS, search results should
 only
 conain samsung, GPS, andriod and samsung andriod.
 
 Is there a way to do this in Solr.
 
 --
 Thanks
 Varun Gupta


Re: Highlighting for non-stored fields

2010-10-26 Thread Israel Ekpo
Check out this link

http://wiki.apache.org/solr/FieldOptionsByUseCase

You need to store the field if you want to use the highlighting feature.

If you need to retrieve and display the highlighted snippets then the field
definitely needs to be stored.

To use term offsets, it will be a good idea to enable the following
attributes for that field  termVectors termPositions termOffsets

The only issue here is that your storage costs will increase because of
these extra features.

Nevertheless, you definitely need to store the field if you need to retrieve
it for highlighting purposes.

On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais phong.gd...@gmail.com wrote:

 Hi,

 I've been looking thru the mailing archive for the past week and I haven't
 found any useful info regarding this issue.

 My requirement is to index a few terabytes worth of data to be searched.
 Due to the size of the data, I would like to index without storing but I
 would like to use the highlighting feature.  Is this even possible?  What
 are my options?

 I've read about termOffsets, payload that could possibly be used to do this
 but I have no idea how this could be done.

 Any pointers greatly appreciated.  Someone please point me in the right
 direction.

  I don't mind having to write some code or digging thru existing code to
 accomplish this task.

 Thanks,
 P.




-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Documents are deleted when Solr is restarted

2010-10-26 Thread Mackram Raydan

Hey everyone,

I apologize if this question is rudimentary but it is getting to me and 
I did not find anything reasonable about it online.


So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the 
SolrTomcat wiki page to setup. The system works exactly the way I want 
it (proper search, highlighting, etc...). The problem however is when I 
restart my Tomcat server all the data in Solr (ie the index) is simply 
lost. The admin shows me the number of docs is 0 when it was before in 
the thousands.


Can someone please help me understand why the above is happening and how 
I can work around it, if possible?


Big thanks for any help you can send my way.

Regards,

Mackram


Re: a bug of solr distributed search

2010-10-26 Thread Ron Mayer
Andrzej Bialecki wrote:
 On 2010-10-25 11:22, Toke Eskildsen wrote:
 On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: 
 But it shows a problem of distributed search without common idf.
 A doc will get a different score in different shards.
 Bingo.

 I really don't understand why this fundamental problem with sharding
 isn't mentioned more often. Every time the advice "use sharding" is
 given, it should be followed with a "but be aware that it will make
 relevance ranking unreliable".
 
 The reason is twofold, I think:


And a third potential reason - it's arguably a feature instead of a bug
for some applications.  Depending on how I organize my shards, "give me
the most relevant document from each shard for this search" seems like
it could be useful.

 * there is an exact solution to this problem, namely to make two
 distributed calls instead of one (first call to collect per-shard IDFs
 for given query terms, second call to submit a query rewritten with the
 global IDF-s). This solution is implemented in SOLR-1632, with some
 caching to reduce the cost for common queries. However, this means that
 now for every query you need to make two calls instead of one, which
 potentially doubles the time to return results (for simple common
 queries - for rare complex queries the time will be still dominated by
 the query runtime on shard servers).
 
 * another reason is that in many many cases the difference between using
 exact global IDF and per-shard IDFs is not that significant. If shards
 are more or less homogenous (e.g. you assign documents to shards by
 hash(docId)) then term distributions will be also similar. So then the
 question is whether you can accept an N% variance in scores across
 shards, or whether you want to bear the cost of an additional
 distributed RPC for every query...
 
 To summarize, I would qualify your statement with: ...if the
 composition of your shards is drastically different. Otherwise the cost
 of using global IDF is not worth it, IMHO.
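
A small worked example of how far per-shard and global IDF can drift apart when shards are not homogeneous. It uses the classic Lucene formula idf = 1 + ln(numDocs / (docFreq + 1)); the shard sizes and document frequencies are invented for the illustration.

```python
import math


def idf(doc_freq, num_docs):
    """Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# (docFreq of the term, number of docs) per shard; made-up numbers where
# the term is rare on one shard and common on the other.
shards = {"shard_a": (10, 1000), "shard_b": (500, 1000)}

per_shard = {name: idf(df, n) for name, (df, n) in shards.items()}
total_df = sum(df for df, _ in shards.values())
total_docs = sum(n for _, n in shards.values())
global_idf = idf(total_df, total_docs)

for name, value in per_shard.items():
    print(name, round(value, 3))
print("global", round(global_idf, 3))
```

With documents assigned by hash, the two docFreq values would be close to each other and all three numbers would nearly coincide, which is the homogeneous case described above.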
 



Solr - xmlhttprequest

2010-10-26 Thread Yavuz Selim YILMAZ
I have a Solr instance on my server, and I can make requests with Internet
Explorer. However, with other browsers I can't.

The error given is:
*XMLHttpRequest cannot load http://. Origin http://... is not allowed by
Access-Control-Allow-Origin.*

I changed my Apache server conf file and added these lines:

Header set Access-Control-Allow-Origin *
Header set Access-Control-Allow-Methods POST,GET,OPTIONS
Header set Access-Control-Allow-Headers X-PINGOTHER
Header set Access-Control-Max-Age 1728000

to allow cross-origin requests.

Still, I get the same error.

Any suggestions?
--

Yavuz Selim YILMAZ


Re: Solr ExtractingRequestHandler with Compressed files

2010-10-26 Thread Joey Hanzel
Hi Jayendra,

Thanks for the suggestion, I updated to Solr 1.4.1 and Solr Cell 1.4.1 and
tried sending a zip file that contained several html documents.
Unfortunately, that did not solve the problem.

Here's the curl command I used:
curl 
http://localhost:8983/solr/update/extract?literla.id=d...@uprefix=attr_fmap.content=attri_contentcommit=true;
-F file=data.zip

When I query for id:doc1, the attr_content lists each filename within the
zip archive. It also indexed the stream_size, stream_source and
content_type.  It does not appear to be opening up the individual files
within the zip.

Did you have to make any other configuration changes to your solrconfig.xml
or schema.xml to read the contents of the individual files?  Would it help
to pass the specific mime type on the curl line ?
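
As a side note on the request itself, the parameters in an extract URL have to be separated by "&", and that character is easy to lose in mail or in an unquoted shell command. Below is a minimal sketch that builds such a URL. The parameter names (literal.id, uprefix, fmap.content, commit) are the ones used in this thread; the value "doc1" is assumed from the id:doc1 query mentioned above, and the base URL and field names are illustrative only. This only shows how to form the request; it does not address whether the archive contents are extracted.

```python
from urllib.parse import urlencode


def extract_url(base_url, doc_id, extra_params):
    """Build an /update/extract URL with properly separated query parameters."""
    params = [("literal.id", doc_id)]
    params.extend(extra_params.items())
    params.append(("commit", "true"))
    return base_url + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    url = extract_url(
        "http://localhost:8983/solr",  # assumed local Solr instance
        "doc1",                        # assumed document id
        {"uprefix": "attr_", "fmap.content": "attr_content"},
    )
    # Quote the URL when passing it to a shell so "&" is not interpreted.
    print(url)
```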

On Mon, Oct 25, 2010 at 3:27 PM, Jayendra Patil 
jayendra.patil@gmail.com wrote:

 There was this issue with the previous version of Solr, wherein only the
 file names from the zip used to get indexed.
 We had faced the same issue and ended up using the Solr trunk which has the
 Tika version upgraded and works fine.

 The Solr version 1.4.1 should also have the fix included. Try using it.

 Regards,
 Jayendra

 On Fri, Oct 22, 2010 at 6:02 PM, Joey Hanzel phan...@nearinfinity.com
 wrote:

  Hi,
 
  Has anyone had success using ExtractingRequestHandler and Tika with any
 of
  the compressed file formats (zip, tar, gz, etc) ?
 
  I am sending solr the archived.tar file using curl. curl 
 
 
 http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true
  
  -H 'Content-type:application/octet-stream' --data-binary
  @/home/archived.tar
  The result I get when I query the document is that the filenames inside
 the
  archive are indexed as the body_texts, but the content of those files
 is
  not extracted or included.  This is not the behavior I expected. Ref:
 
 
 http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#article.tika.example
  .
  When I send 1 of the actual documents inside the archive using the same
  curl
  command the extracted content is then stored in the body_texts field.
  Am
  I missing a step for the compressed files?
 
  I have added all the extraction dependencies as indicated by mat in
  http://outoftime.lighthouseapp.com/projects/20339/tickets/98-solr-cell and
  am able to successfully extract data from MS Word, PDF, HTML documents.
 
  I'm using the following library versions.
   Solr 1.4.0,  Solr Cell 1.4.1, with Tika Core 0.4
 
  Given everything I have read this version of Tika should support
 extracting
  data from all files within a compressed file.  Any help or suggestions
  would
  be appreciated.
 



Re: Query only a specfic field with a specific value using Dismax Handler

2010-10-26 Thread Jonathan Rochkind
So, first of all, exact match is hard in Solr on tokenized fields.  
Tokenized fields don't really do that.  So for exact match, you should 
probably use a non-tokenized field (string or text with keywordtokenizer 
(which should really be called the non-tokenizer)). If there's only one 
token in your value anyway though, like a single number, it may not 
matter and work fine.


Secondly, I'd recommend combining a dismax query for the user-entered 
phrase (like 'dog') with standard lucene queries for those other 
things.  There are (at least) two ways to do that. The first is just put 
everything after the first AND in one or more 'fq' parameters instead of 
trying to include them in 'q'.  The second is to use Solr's nested query 
syntax, to specify sub-queries with different query parsers. Someone can 
explain the second if you need it, but the easier to understand 'fq' 
approach seems right to me for your case.
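
For illustration, here is a rough sketch of the 'fq' approach for the query in this thread: the user-entered text goes through dismax in 'q', and each restriction becomes its own filter query in standard Lucene syntax. The field names (defaultSearch, group_id, entityType, visibleToEndUser) come from the original question; using defaultSearch as the only 'qf' field is an assumption, and the helper name is made up for this example.

```python
from urllib.parse import urlencode, parse_qs


def dismax_params(user_text, group_ids, entity_type, visible_to_end_user):
    """Return request parameters: dismax for the user text, fq for the filters."""
    group_clause = " OR ".join(str(g) for g in group_ids)
    visible = "true" if visible_to_end_user else "false"
    return [
        ("defType", "dismax"),
        ("q", user_text),                      # free text entered by the user
        ("qf", "defaultSearch"),               # assumed: search the copyField only
        ("fq", f"group_id:({group_clause})"),  # permission filter
        ("fq", f"entityType:{entity_type}"),
        ("fq", f"visibleToEndUser:{visible}"),
    ]


if __name__ == "__main__":
    params = dismax_params("Dog", [2181347, 2181364, 2216624, 2216990], 0, True)
    print(urlencode(params))
```

Keeping each restriction in a separate 'fq' lets Solr cache the filters independently, so the group filter can be reused across different user queries.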


Swapnonil Mukherjee wrote:

Hi Everybody,

Let me give you a brief idea of our Solr document. We have about 6 text type 
fields, each containing IPTC data extracted from photos. Search is performed 
mostly on these 6 fields.
We also have a multivalued field named group_id that contains a list of all the  
group_ids that have access to this photo.  In other words we are storing the 
metadata of the photo as well as the permissions applicable for this photo in 
the Solr document itself. This group_id field by the way is of long type.

Additionally we have certain boolean and constant type fields named 
visibleToEndUser (boolean) and entityType (a java enum between 0 to 5).

The first field defaultSearch is a copyField which contains a copy of all the 
values of 6 text type fields that I have mentioned.

The way we query presently using the default search handler is like this.

defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR 
group_id:2216624 OR group_id:2216990) AND (entityType:0) AND 
(visibleToEndUser:true)

We want to start using the dismax (if not dismax then edismax)  query handler 
but so far I have not been able to replicate the query mentioned above to the 
equivalent dismax form.

What I cannot figure out is:

1. How do I apply exact match on the group_id, visibleToEndUser and the 
entityType fields? Or how do I query a specific field with a specific value 
rather than searching across all fields with all values?
2. How do I apply OR and AND conditions?


Swapnonil Mukherjee




  


Re: how well does multicore scale?

2010-10-26 Thread mike anderson
So I fired up about 100 cores and used JMeter to fire off a few thousand
queries. It looks like the memory usage isn't much worse than running a
single shard. So that's good.

I'm really curious if there is a clever solution to the obvious problem
with: "So your better off using a single index and with a user id and use
a query filter with the user id when fetching data.", i.e., when you have
hundreds of thousands of user IDs tagged on each article. That just doesn't
sound like it scales very well.


Cheers,
Mike


On Fri, Oct 22, 2010 at 10:43 PM, Lance Norskog goks...@gmail.com wrote:

 http://wiki.apache.org/solr/CoreAdmin

 Since Solr 1.3

 On Fri, Oct 22, 2010 at 1:40 PM, mike anderson saidthero...@gmail.com
 wrote:
  Thanks for the advice, everyone. I'll take a look at the API mentioned
 and
  do some benchmarking over the weekend.
 
  -Mike
 
 
  On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller markrmil...@gmail.com
 wrote:
 
  On 10/22/10 1:44 AM, Tharindu Mathew wrote:
   Hi Mike,
  
   I've also considered using a separate cores in a multi tenant
   application, ie a separate core for each tenant/domain. But the cores
   do not suit that purpose.
  
   If you check out documentation no real API support exists for this so
   it can be done dynamically through SolrJ. And all use cases I found,
   only had users configuring it statically and then using it. That was
   maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.
 
  You can dynamically manage cores with solrj. See
  org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
  for a place to start.
 
  You probably want to turn solr.xml's persist option on so that your
  cores survive restarts.
 
  
   So your better off using a single index and with a user id and use a
   query filter with the user id when fetching data.
 
  Many times this is probably the case - pro's and con's to each depending
  on what you are up to.
 
  - Mark
  lucidimagination.com
 
  
   On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu
  wrote:
   No, it does not seem reasonable.  Why do you think you need a
 seperate
  core
   for every user?
   mike anderson wrote:
  
   I'm exploring the possibility of using cores as a solution to
 bookmark
   folders in my solr application. This would mean I'll need tens of
   thousands
   of cores... does this seem reasonable? I have plenty of CPUs
 available
  for
   scaling, but I wonder about the memory overhead of adding cores
 (aside
   from
   needing to fit the new index in memory).
  
   Thoughts?
  
   -mike
  
  
  
  
  
  
 
 
 



 --
 Lance Norskog
 goks...@gmail.com



Re: How do I this in Solr?

2010-10-26 Thread Ken Stanley
On Tue, Oct 26, 2010 at 9:15 AM, Savvas-Andreas Moysidis 
savvas.andreas.moysi...@googlemail.com wrote:

 If I get your question right, you probably want to use the AND binary
 operator as in "samsung AND android AND GPS" or "+samsung +android +GPS"


N.b. For these queries you can also pass the q.op parameter in the request
to temporarily change the default operator to AND; this has the same effect
without having to build the query; i.e., you can just pass
http://host:port/solr/select?q=samsung+android+gps&q.op=AND
as the query string (along with any other params you need).


Re: how well does multicore scale?

2010-10-26 Thread Jonathan Rochkind

mike anderson wrote:

I'm really curious if there is a clever solution to the obvious problem
with: "So your better off using a single index and with a user id and use
a query filter with the user id when fetching data.", i.e., when you have
hundreds of thousands of user IDs tagged on each article. That just doesn't
sound like it scales very well.
  
Actually, I think that design would scale pretty fine, I don't think 
there's an 'obvious' problem. You store your userIDs in a multi-valued 
field (or as multiple terms in a single value, ends up being similar). 
You fq on there with the current userID.   There's one way to find out 
of course, but that doesn't seem a patently ridiculous scenario or 
anything, that's the kind of thing Solr is generally good at, it's what 
it's built for.   The problem might actually be in the time it takes to 
add such a document to the index; but not in query time.


Doesn't mean it's the best solution for your problem though, I can't say.

My impression is that Solr in general isn't really designed to support 
the kind of multi-tenancy use case people are talking about lately.  So 
trying to make it work anyway... if multi-cores work for you, then 
great, but be aware they weren't really designed for that (having 
thousands of cores) and may not. If a single index can work for you 
instead, great, but as you've discovered it's not necessarily obvious 
how to set up the schema to do what you need -- really this applies to 
Solr in general, unlike an rdbms where you just third-form-normalize 
everything and figure it'll work for almost any use case that comes up,  
in Solr you generally need to custom fit the schema for your particular 
use cases, sometimes being kind of clever to figure out the optimal way 
to do that.


This is, I'd argue/agree, indeed kind of a disadvantage, setting up a 
Solr index takes more intellectual work than setting up an rdbms. The 
trade off is you get speed, and flexible ways to set up relevancy (that 
still perform well). Took a couple decades for rdbms to get as brainless 
to use as they are, maybe in a couple more we'll have figured out ways 
to make indexing engines like solr equally brainless, but not yet -- but 
it's still pretty damn easy for what it is, the lucene/Solr folks have 
done a remarkable job.


Re: Query only a specfic field with a specific value using Dismax Handler

2010-10-26 Thread Swapnonil Mukherjee
Thanks Jonathan. FQ seems promising. I will give it a go.

Swapnonil Mukherjee




On 26-Oct-2010, at 7:29 PM, Jonathan Rochkind wrote:

 So, first of all, exact match is hard in Solr on tokenized fields.  
 Tokenized fields don't really do that.  So for exact match, you should 
 probably use a non-tokenized field (string or text with keywordtokenizer 
 (which should really be called the non-tokenizer)). If there's only one 
 token in your value anyway though, like a single number, it may not 
 matter and work fine.
 
 Secondly, I'd recommend combining a dismax query for the user-entered 
 phrase (like 'dog') with standard lucene queries for those other 
 things.  There are (at least) two ways to do that. The first is just put 
 everything after the first AND in one or more 'fq' parameters instead of 
 trying to include them in 'q'.  The second is to use Solr's nested query 
 syntax, to specify sub-queries with different query parsers. Someone can 
 explain the second if you need it, but the easier to understand 'fq' 
 approach seems right to me for your case.
 
 Swapnonil Mukherjee wrote:
 Hi Everybody,
 
 Let me give you a brief idea of our Solr document. We have about 6 text type 
 fields, each containing IPTC data extracted from photos. Search is performed 
 mostly on these 6 fields.
 We also have a multivalued field named group_id that contains a list of all 
 the  group_ids that have access to this photo.  In other words we are 
 storing the metadata of the photo as well as the permissions applicable for 
 this photo in the Solr document itself. This group_id field by the way is of 
 long type.
 
 Additionally we have certain boolean and constant type fields named 
 visibleToEndUser (boolean) and entityType (a java enum between 0 to 5).
 
 The first field defaultSearch is a copyField which contains a copy of all 
 the values of 6 text type fields that I have mentioned.
 
 The way we query presently using the default search handler is like this.
 
 defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR 
 group_id:2216624 OR group_id:2216990) AND (entityType:0) AND 
 (visibleToEndUser:true)
 
 We want to start using the dismax (if not dismax then edismax)  query 
 handler but so far I have not been able to replicate the query mentioned 
 above to the equivalent dismax form.
 
 What I cannot figure out is:
 
 1. How do I apply exact match on the group_id, visibleToEndUser and the 
 entityType fields? Or how do I query a specific field with a specific 
 value rather than searching across all fields with all values?
 2. How do I apply OR and AND conditions?
 
 
 Swapnonil Mukherjee
 
 
 
 
 



Re: Documents are deleted when Solr is restarted

2010-10-26 Thread Upayavira
You need to watch what you are setting your solr.home to. That is where
your indexes are being written. Are they getting overwritten/lost
somehow? Watch the files in that dir while doing a restart.

That's a start at least.

Upayavira

On Tue, 26 Oct 2010 16:40 +0300, Mackram Raydan mack...@gmail.com
wrote:
 Hey everyone,
 
 I apologize if this question is rudimentary but it is getting to me and 
 I did not find anything reasonable about it online.
 
 So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the 
 SolrTomcat wiki page to setup. The system works exactly the way I want 
 it (proper search, highlighting, etc...). The problem however is when I 
 restart my Tomcat server all the data in Solr (ie the index) is simply 
 lost. The admin shows me the number of docs is 0 when it was before in 
 the thousands.
 
 Can someone please help me understand why the above is happening and how 
 can I workaround it if possible?
 
 Big thanks for any help you can send my way.
 
 Regards,
 
 Mackram
 


Re: Documents are deleted when Solr is restarted

2010-10-26 Thread Israel Ekpo
The Solr home is set by the -Dsolr.solr.home Java system property.

Also make sure that -Dsolr.data.dir is defined for your data directory, if it
is not already defined in the solrconfig.xml file.

On Tue, Oct 26, 2010 at 10:46 AM, Upayavira u...@odoko.co.uk wrote:

 You need to watch what you are setting your solr.home to. That is where
 your indexes are being written. Are they getting overwritten/lost
 somehow. Watch the files in that dir while doing a restart.

 That's a start at least.

 Upayavira

 On Tue, 26 Oct 2010 16:40 +0300, Mackram Raydan mack...@gmail.com
 wrote:
  Hey everyone,
 
  I apologize if this question is rudimentary but it is getting to me and
  I did not find anything reasonable about it online.
 
  So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the
  SolrTomcat wiki page to setup. The system works exactly the way I want
  it (proper search, highlighting, etc...). The problem however is when I
  restart my Tomcat server all the data in Solr (ie the index) is simply
  lost. The admin shows me the number of docs is 0 when it was before in
  the thousands.
 
  Can someone please help me understand why the above is happening and how
  can I workaround it if possible?
 
  Big thanks for any help you can send my way.
 
  Regards,
 
  Mackram
 




-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Highlighting for non-stored fields

2010-10-26 Thread Phong Dais
Hi,

I understand that I need to store the fields in order to use highlighting
out of the box.
I'm looking for a way to highlight using term offsets instead of the
actual text, since the text is not stored.  What I am asking is whether it is
possible to modify the response (through a custom implementation) to contain
highlighted offsets instead of the actual matched text.  Should I be writing my
own DefaultHighlighter?  Or overriding some of its functionality?  Can this be
done this way or am I way off?

BTW, I'm using solr-1.4.

Thanks,
P.

On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo israele...@gmail.com wrote:

 Check out this link

 http://wiki.apache.org/solr/FieldOptionsByUseCase

 You need to store the field if you want to use the highlighting feature.

 If you need to retrieve and display the highlighted snippets then the
 fields
 definitely need to be stored.

 To use term offsets, it will be a good idea to enable the following
 attributes for that field  termVectors termPositions termOffsets

 The only issue here is that your storage costs will increase because of
 these extra features.

 Nevertheless, you definitely need to store the field if you need to
 retrieve
 it for highlighting purposes.

 On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais phong.gd...@gmail.com wrote:

  Hi,
 
  I've been looking thru the mailing archive for the past week and I
 haven't
  found any useful info regarding this issue.
 
  My requirement is to index a few terabytes worth of data to be searched.
  Due to the size of the data, I would like to index without storing but I
  would like to use the highlighting feature.  Is this even possible?  What
  are my options?
 
  I've read about termOffsets, payload that could possibly be used to do
 this
  but I have no idea how this could be done.
 
  Any pointers greatly appreciated.  Someone please point me in the right
  direction.
 
   I don't mind having to write some code or digging thru existing code to
  accomplish this task.
 
  Thanks,
  P.
 



 --
 °O°
 Good Enough is not good enough.
 To give anything less than your best is to sacrifice the gift.
 Quality First. Measure Twice. Cut Once.
 http://www.israelekpo.com/



Inconsistent slave performance after optimize

2010-10-26 Thread Mason Hale
Hello esteemed Solr community --

I'm observing some inconsistent performance on our slave servers after
recently optimizing our master server.

Our configuration is as follows:

- all servers are hosted at Amazon EC2, running Ubuntu 8.04
- 1 master with heavy insert/update traffic, about 125K new documents
per day (m1.large, ~8GB RAM)
   - autocommit every 1 minute
- 3 slaves (m2.xlarge instance sizes, ~16GB RAM)
   - replicate every 5 minutes
   - we have configured autowarming queries for these machines
   - autowarmCount = 0
- Total index size is ~7M documents

We were seeing increasing, but gradual performance degradation across all
nodes.
So we decided to try optimizing our index to improve performance.

In preparation for the optimize we disabled replication polling on all
slaves. We also turned off all
workers that were writing to the index. Then we ran optimize on the master.

The optimize took 45-60 minutes to complete, and the total size went from
68GB down to 23GB.

We then enabled replication on each slave one at a time.

The first slave we re-enabled took about 15 minutes to copy the new files.
Once the files were copied
the performance of the slave plummeted. Average response time went from 0.75 sec
to 45 seconds.
Over the past 18 hours the average response time has gradually gone down to
around 1.2 seconds now.

Before re-enabling replication the second slave, we first removed it from
our load-balanced pool of available search servers.
This server's average query performance also degraded quickly, and then
(unlike the first slave we replicated) did not improve.
It stayed at around 30 secs per query. On the theory that this is a
cache-warming issue, we added this server
back to the pool in hopes that additional traffic would warm the cache. But
what we saw was a quick spike of much worse
performance (50 sec / query on average) followed by a slow/gradual decline
in average response times.
As of now (10 hours after the initial replication) this server is still
reporting an average response time of ~2 seconds.
This is much worse than before the optimize and is a counter-intuitive
result. We expected an index 1/3 the size would be faster, not slower.

On the theory that the index files needed to be loaded into the file system
cache, I used the 'dd' command to copy
the contents of the data/index directory to /dev/null, but that did not
result in any noticeable performance improvement.
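
For reference, the 'dd' step described above amounts to reading every index file once and discarding the bytes, so that the operating system can keep the pages in its file system cache. A minimal sketch of the same idea follows; the function name and chunk size are arbitrary choices for this example, and whether such a read-through actually helps depends on available memory and on how the OS manages its cache.

```python
import os


def warm_page_cache(index_dir, chunk_size=1 << 20):
    """Read every file under index_dir once and return the number of bytes read."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            with open(os.path.join(root, name), "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)  # the data itself is discarded
    return total


if __name__ == "__main__":
    # "data/index" is an assumed path relative to the Solr home.
    print(warm_page_cache("data/index"), "bytes read")
```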

At this point, things were not going as expected. We did not expect the
replication after an optimize to result in such horrid
performance. So we decided to let the last slave continue to serve stale
results while we waited 4 hours for the
other two slaves to approach some acceptable performance level.

After the 4 hour break, we removed the 3rd and last slave server from our
load-balancing pool, then re-enabled replication.
This time we saw a tiny blip. The average performance went up to 1 second
briefly then went back to the (normal for us)
0.25 to 0.5 second range. We then added this server back to the
load-balancing pool and observed no degradation in performance.

While we were happy to avoid a repeat of the poor performance we saw on the
previous slaves, we are at a loss to explain
why this slave did not also have such poor performance.

At this point we're scratching our heads trying to understand:
   (a) Why the performance of the first two slaves was so terrible after the
optimize. We think it's cache-warming related, but we're not sure.
  10 hours seems like a long time to wait for the cache to warm up.
   (b) Why the performance of the third slave was barely impacted. It should
have hit the same cold-cache issues as the other servers, if that is indeed
the root cause.
   (c) Why performance of the first 2 slaves is still much worse after the
optimize than it was before the optimize,
  where the performance of the 3rd slave is pretty much unchanged. We
expected the optimize to *improve* performance.

All 3 slave servers are identically configured, and the procedure for
re-enabling replication was identical for the 2nd and 3rd
slaves, with the exception of a 4-hour wait period.

We have confirmed that the 3rd slave did replicate, the number of documents
and total index size matches the master and other slave servers.

I'm writing to fish for an explanation or ideas that might explain this
inconsistent performance. Obviously, we'd like to be able to reproduce the
performance of the 3rd slave, and avoid the poor performance of the first
two slaves the next time we decide it's time to optimize our index.

thanks in advance,

Mason


After java replication: field not found exception on slaves

2010-10-26 Thread Peter Karich

Hi,

we had the following problem. We added a field to schema.xml and fed our 
master with the new data.
After that, querying on the master was fine. But after we replicated 
(Solr 1.4.0) to our slaves,
all slaves said they could not find the new field (the standard exception for 
missing fields),
even though I can see the new field in the XML response and I can 
see it in the replicated schema.xml file!?


Even stranger, after scp-ing the exact data folder to our master, 
all is fine (on the master).


Has anyone hit the same strange behaviour?

Regards,
Peter.


PS: In the end we did the following on the slaves:
rm -rf data/
./reload.sh + replicated again


Re: Strange search

2010-10-26 Thread ramzesua

Can anyone tell me why my search is so terrible? It works really strangely.
Here are my basic configs in schema.xml:
main filters:
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


and fields:

<field name="productId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="categoryId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="templateId" type="int" indexed="true" stored="true" required="true"/>

<field name="templateSetName" type="text" indexed="true" stored="false"/>
<field name="templateSetCaption" type="text" indexed="true" stored="false"/>
<field name="templateSetDeleted" type="int" indexed="true" stored="false" default="0"/>
<field name="templateSetDateCreate" type="string" indexed="true" stored="false"/>
<field name="templateSetPopularity" type="float" indexed="true" stored="false" default="0"/>
<field name="templateSetText" type="text" indexed="true" stored="false" multiValued="true"/>

<field name="typeName" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="typeCaption" type="text" indexed="true" stored="false" multiValued="true"/>

<field name="themeName" type="string" indexed="true" stored="false"/>
<field name="themeCaption" type="text" indexed="true" stored="false"/>
<field name="themeText" type="text" indexed="true" stored="false"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<uniqueKey>templateId</uniqueKey>

<defaultSearchField>text</defaultSearchField>

<solrQueryParser defaultOperator="OR"/>

<copyField source="templateSetName" dest="text"/>
<copyField source="templateSetCaption" dest="text"/>
<copyField source="typeName" dest="text"/>
<copyField source="typeCaption" dest="text"/>
<copyField source="themeName" dest="text"/>
<copyField source="themeCaption" dest="text"/>
<copyField source="themeText" dest="text"/>

Here is what the schema browser at
http://localhost:8983/search/admin/schema.jsp shows for the field typeCaption
(term, count):
html      4
page      4
template  4
text      4
main      4
seo       3
meta      2
tags      1
keywords  1

If I search for "html", I get all results, but if I search for "seo" or "text" I
don't get any results. I tried to use wildcards, but that doesn't help. Can
anyone tell me where my problem is? Sorry for my poor English.


RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
Overkill?

Dennis Gearon
 
 I can't think of a way to do it without writing new
 analysis filters.
 
 But I think you could do what you want with two filters
 (this is untested):
 
 1. An index-time filter that outputs a single token
 consisting of all of the input tokens, sorted in a
 consistent way, e.g.:
 
    mobile with GPS -> GPS mobile with
    samsung android -> android samsung
 
 2. A query-time filter that outputs one token per input
 term combination, sorted in the same consistent way as the
 index-time filter, e.g.:
 
    samsung android GPS ->
        samsung, android, GPS,
        android samsung, GPS samsung, android GPS,
        android GPS samsung
 
 Steve
 
  -Original Message-
  From: Varun Gupta [mailto:varun.vgu...@gmail.com]
  Sent: Tuesday, October 26, 2010 9:08 AM
  To: solr-user@lucene.apache.org
  Subject: How do I this in Solr?
  
  Hi,
  
  I have a lot of small documents (each containing 1 to 15 words) indexed in
  Solr. For the search query, I want the search results to contain only
  those documents that satisfy this criterion: "All of the words of the
  search result document are present in the search query".
  
  For example:
  If I have the following documents indexed: "nokia n95", "GPS", "android",
  "samsung", "samsung android", "nokia android", "mobile with GPS"
  
  If I search with the text "samsung android GPS", search results should
  only contain "samsung", "GPS", "android" and "samsung android".
  
  Is there a way to do this in Solr?
  
  --
  Thanks
  Varun Gupta
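
The two-filter idea quoted above can be tried out with plain strings before writing any analysis filters. The sketch below is only a model of the logic, not a Solr filter: each document becomes one token made of its sorted terms, and the query becomes the set of all sorted term combinations. Lowercasing, whitespace splitting and the function names are assumptions of this example. Note that a query with n distinct terms produces 2^n - 1 combinations, so this only suits short queries.

```python
from itertools import combinations


def index_token(doc_text):
    """Index side: one token made of all document terms, sorted consistently."""
    return " ".join(sorted(doc_text.lower().split()))


def query_tokens(query_text):
    """Query side: one token per non-empty combination of query terms."""
    terms = sorted(set(query_text.lower().split()))
    return {
        " ".join(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    }


def search(docs, query_text):
    """Return documents whose words are all present in the query."""
    wanted = query_tokens(query_text)
    return [doc for doc in docs if index_token(doc) in wanted]


if __name__ == "__main__":
    docs = ["nokia n95", "GPS", "android", "samsung",
            "samsung android", "nokia android", "mobile with GPS"]
    print(search(docs, "samsung android GPS"))
```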



Re: Modelling Access Control

2010-10-26 Thread Dennis Gearon
Son, don't touch that stove . . . .,

OUCH! Hey Dad, I BURNED my hand on that stove, why didn't you tell me 
that?!?#! You know I need to know WHY, not just DON'T!

Dennis Gearon

 Very important: do not make a spelling or autosuggest index
 from a
 text field which some people can see and other people
 can't.
 



Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall

Um.. you could change your default operator to AND rather than OR.

That should do the trick.

Matt

On 10/26/2010 2:26 PM, Dennis Gearon wrote:

Overkill?

Dennis Gearon

I can't think of a way to do it without writing new
analysis filters.

But I think you could do what you want with two filters
(this is untested):

1. An index-time filter that outputs a single token
consisting of all of the input tokens, sorted in a
consistent way, e.g.:

    mobile with GPS -> GPS mobile with
    samsung android -> android samsung

2. A query-time filter that outputs one token per input
term combination, sorted in the same consistent way as the
index-time filter, e.g.:

    samsung android GPS ->
        samsung, android, GPS,
        android samsung, GPS samsung, android GPS,
        android GPS samsung

Steve


-Original Message-
From: Varun Gupta [mailto:varun.vgu...@gmail.com]
Sent: Tuesday, October 26, 2010 9:08 AM
To: solr-user@lucene.apache.org
Subject: How do I this in Solr?

Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in
Solr. For the search query, I want the search results to contain only
those documents that satisfy this criterion: "All of the words of the
search result document are present in the search query".

For example:
If I have the following documents indexed: "nokia n95", "GPS", "android",
"samsung", "samsung android", "nokia android", "mobile with GPS"

If I search with the text "samsung android GPS", search results should
only contain "samsung", "GPS", "android" and "samsung android".

Is there a way to do this in Solr?

--
Thanks
Varun Gupta




Re: Highlighting for non-stored fields

2010-10-26 Thread Pradeep Singh
Another way you can do this is - after the search has completed, load the
field in your application, write separate code to reanalyze that
field/document, index it in RAM, and run it through highlighter classes. All
this as part of your web application outside of Solr. Considering the size
of your data it doesn't look advisable to store it because then you would be
almost doubling the size of your index (if you are looking to highlight on a
field then it's probably going to be full of content).

-Pradeep
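
A rough sketch of the application-side approach described above, assuming
the application can load the original text from its own storage (outside
Solr). It finds character offsets of query terms and wraps them in markers.
The function names and the simple word-based tokenization are illustrative
assumptions; they do not reproduce Solr's analyzers or highlighter classes.

```python
import re


def term_offsets(text, query_terms):
    """Return (start, end) character offsets of every word in text that
    matches one of the query terms, ignoring case."""
    wanted = {term.lower() for term in query_terms}
    return [(m.start(), m.end())
            for m in re.finditer(r"\w+", text)
            if m.group().lower() in wanted]


def highlight(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) span of text in pre/post markers.
    Offsets must be sorted and non-overlapping, as term_offsets() returns."""
    pieces = []
    last = 0
    for start, end in offsets:
        pieces.append(text[last:start])
        pieces.append(pre + text[start:end] + post)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


# Text as it might be loaded from the application's own document store.
text = "Solr stores term offsets; solr can highlight."
offsets = term_offsets(text, ["solr"])
print(offsets)
print(highlight(text, offsets))
```

Returning the offsets on their own, as in the first print, corresponds to
the "highlighted offsets instead of the actual matched text" asked about in
the quoted question below; the second step is only needed if the
application wants marked-up text.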

On Tue, Oct 26, 2010 at 8:32 AM, Phong Dais phong.gd...@gmail.com wrote:

 Hi,

 I understand that I need to store the fields in order to use highlighting
 out of the box.
 I'm looking for a way to highlight using term offsets instead of the
 actual text, since the text is not stored.  What I am asking is whether it
 is possible to modify the response (thru a custom implementation) to contain
 highlighted offsets instead of the actual matched text.  Should I be writing
 my own DefaultHighlighter?  Or overriding some of its functionality?  Can
 this be done this way, or am I way off?

 BTW, I'm using solr-1.4.

 Thanks,
 P.

 On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo israele...@gmail.com wrote:

  Check out this link
 
  http://wiki.apache.org/solr/FieldOptionsByUseCase
 
  You need to store the field if you want to use the highlighting feature.
 
  If you need to retrieve and display the highlighted snippets, then the
  field definitely needs to be stored.
 
  To use term offsets, it will be a good idea to enable the following
  attributes for that field: termVectors, termPositions, termOffsets
 
  The only issue here is that your storage costs will increase because of
  these extra features.
 
  Nevertheless, you definitely need to store the field if you need to
  retrieve
  it for highlighting purposes.
 
  On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais phong.gd...@gmail.com wrote:
 
   Hi,
  
   I've been looking thru the mailing archive for the past week and I haven't
   found any useful info regarding this issue.
  
   My requirement is to index a few terabytes worth of data to be searched.
   Due to the size of the data, I would like to index without storing but I
   would like to use the highlighting feature.  Is this even possible?  What
   are my options?
  
   I've read about termOffsets, payload that could possibly be used to do this
   but I have no idea how this could be done.
  
   Any pointers greatly appreciated.  Someone please point me in the right
   direction.
  
   I don't mind having to write some code or digging thru existing code to
   accomplish this task.
  
   Thanks,
   P.
  
 
 
 
  --
  °O°
  Good Enough is not good enough.
  To give anything less than your best is to sacrifice the gift.
  Quality First. Measure Twice. Cut Once.
  http://www.israelekpo.com/
 



RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Um, maybe I'm way off base, but when Varun said:

 If I search with the text samsung andriod GPS,
 search results should only conain samsung, GPS,
 andriod and samsung andriod.

I interpreted that to mean that hit documents should contain terms from the 
query, and nothing else.  Making all terms required doesn't do this.

Steve
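
To make the distinction concrete, here is a small Python sketch (not Solr
code) contrasting the two conditions on the sample documents from the
original question. The function names are hypothetical, and matching is
reduced to lowercase whitespace-separated words as a simplifying
assumption.

```python
def words(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def all_terms_required(doc, query):
    """AND semantics: every query word must occur in the document.
    The document may still contain other words."""
    return words(query) <= words(doc)


def only_query_terms(doc, query):
    """The requirement as interpreted above: every document word must
    occur in the query, and nothing else may appear in the document."""
    return words(doc) <= words(query)


documents = ["nokia n95", "GPS", "android", "samsung",
             "samsung android", "nokia android", "mobile with GPS"]
query = "samsung android GPS"

and_hits = [d for d in documents if all_terms_required(d, query)]
only_hits = [d for d in documents if only_query_terms(d, query)]
print(and_hits)
print(only_hits)
```

With this query, the AND condition matches none of the sample documents
(no document contains all three words), while the "nothing else" condition
matches exactly the four documents the original poster expected.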




Re: ClassCastException Issue

2010-10-26 Thread Chris Hostetter

: [ERROR][http-4443-exec-3][util.plugin.AbstractPluginLoader] log():139
: java.lang.ClassCastException: org.apache.solr.schema.StrField cannot
: be cast to org.apache.solr.schema.FieldType

This almost certainly indicates a classloader issue - I suspect you have 
multiple Solr-related jars in various places, and the FieldType class 
instance found when StrField is loaded comes from a different 
(incompatible) jar.


-Hoss
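
One way to check for the situation described above is to list the jars
visible to the servlet container and to Solr, and look for the same
artifact appearing more than once. The Python sketch below does that; the
directory layout, the jar file names in the example, and the "solr|lucene"
name filter are all illustrative assumptions, so adjust them to the actual
installation.

```python
import os
import re
from collections import defaultdict


def artifact_name(jar_file):
    """Strip '.jar' and a trailing '-<version>' so that two versions of
    the same artifact map to the same key."""
    base = jar_file[:-len(".jar")]
    return re.sub(r"-\d[\w.\-]*$", "", base)


def list_jars(roots):
    """Walk the given directories and return the paths of all .jar files."""
    found = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            found.extend(os.path.join(dirpath, name)
                         for name in filenames if name.endswith(".jar"))
    return found


def duplicate_jars(jar_paths, pattern=r"solr|lucene"):
    """Group jar paths by artifact name and keep only artifacts that
    appear in more than one place."""
    by_artifact = defaultdict(list)
    for path in jar_paths:
        name = os.path.basename(path)
        if re.search(pattern, name):
            by_artifact[artifact_name(name)].append(path)
    return {artifact: sorted(paths)
            for artifact, paths in by_artifact.items() if len(paths) > 1}


# Hypothetical layout: one copy in the webapp, a second under solr_home/lib.
example = [
    "/opt/tomcat/webapps/solr/WEB-INF/lib/apache-solr-core-1.4.1.jar",
    "/opt/solr_home/lib/apache-solr-core-3.1-dev.jar",
    "/opt/solr_home/lib/my-plugin-1.0.jar",
]
print(duplicate_jars(example))
```

On a real system, something like duplicate_jars(list_jars([...])) with the
container's lib directories and the Solr home lib directory would replace
the hard-coded example list.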


RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
Good point. Since I might need such a query myself someday, how *IS* that done?


Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.





RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Dennis,

Do you mean to say that you read my earlier post, and disagree that it would 
solve the problem?  Or have you simply not read it?

Steve



RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
If Solr is like Google, once documents matching only the ANDed items in the 
query ran out, then those that had only two of the terms, then only 1 of the 
terms, and then those close to it would start showing up.

Is this correct?

If so, it wouldn't match his requirements.

Dennis Gearon




RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
Plus, if he wants terms that contain ONLY those words, and no others, an ANDed 
query would not do that, right? ANDed queries return results that must have ALL 
the terms listed, and could have lots of other words, right?


Dennis Gearon




How does DIH multithreading work?

2010-10-26 Thread markwaddle

I understand that the thread count is specified on root entities only. Does
it spawn multiple threads per root entity? Or multiple threads per
descendant entity? Can someone give an example of how you would make a
database query in an entity with 4 threads that would select 1 row per
thread?

Thanks,
Mark


RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Hi Dennis,

You wrote:
 If Solr is like Google, once documents matching only the ANDed items
 in the query ran out, then those that had only two of the terms, then
 only 1 of the terms, and then those close to it would start showing up.
[...]
 Plus, if he wants terms that contain ONLY those words, and no others, an
 ANDed query would not do that, right? ANDed queries return results that
 must have ALL the terms listed, and could have lots of other words, right?

This is *exactly* what I just said: ANDed queries (i.e., requiring all query 
terms) will not satisfy Varun's requirements.

Your participation in this thread looks an awful lot like flame-baiting: Someone 
else asks a question, I answer with a possible solution, you give a one-word 
"Overkill?" response, I say why it's not overkill.  You then ask if anybody 
knows the answer to the original question, and then parrot my response to your 
"Overkill?" statement.  Really.

Get your shit together or shut up.  Please.

Steve


RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
I'm the LAST person anyone will ever need to worry about flame baiting. You did 
notice that I retracted what I said and supported your point of view?

Sorry if my cryptic comment sounded critical. I was wrong, you were right :-)
Dennis Gearon


RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Dennis,

I wasn't trying to force your admission of my rectitude - I was just getting 
frustrated that the conversation was moving in spiral fashion, and was worried 
that you might have intentionally engineered that.

I'm glad to hear that you weren't flame baiting.

Steve


 -Original Message-
 From: Dennis Gearon [mailto:gear...@sbcglobal.net]
 Sent: Tuesday, October 26, 2010 3:35 PM
 To: solr-user@lucene.apache.org
 Subject: RE: How do I this in Solr?
 
 I'm the LAST person anyone will ever need to worry about flame baiting.
 You did notice that I retracted what I said and supported your point of
 view?
 
 Sorry if my cryptic comment sounded critical. I was wrong, you were right
 :-)
 Dennis Gearon
 
 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better idea to learn from others’ mistakes, so you do not have to make
 them yourself. from
 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'
 
 EARTH has a Right To Life,
   otherwise we all die.
 
 
 --- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu wrote:
 
  From: Steven A Rowe sar...@syr.edu
  Subject: RE: How do I this in Solr?
  To: solr-user@lucene.apache.org solr-user@lucene.apache.org
  Date: Tuesday, October 26, 2010, 12:27 PM
  Hi Dennis,
 
  You wrote:
   If Solr is like Google, once documents matching only
  the ANDed items
   in the query ran out, then those that had only two of
  the terms, then
   only 1 of the terms, and then those close to it would
  start showing up.
  [...]
   Plus, if he wants terms that contain ONLY those words,
  and no others, an
   ANDed query would not do that, right? ANDed queries
  return results that
   must have ALL the terms listed, and could have lots of
  other words, right?
 
  This is *exactly* what I just said: ANDed queries (i.e.,
  requiring all query terms) will not satisfy Varun's
  requirements.
 
  Your participation in this thread looks an awful lot like
  flame-bating: Someone else asks a question, I answer with a
  possible solution, you give a one-word overkill response,
  I say why it's not overkill.  You then ask if anybody
  knows the answer to the original question, and then parrot
  my response to your overkill statement.  Really
 
  Get your shit together or shut up.  Please.
 
  Steve
 
   -Original Message-
   From: Dennis Gearon [mailto:gear...@sbcglobal.net]
   Sent: Tuesday, October 26, 2010 3:14 PM
   To: solr-user@lucene.apache.org
   Subject: RE: How do I this in Solr?
  
  
  
   Dennis Gearon
  
   Signature Warning
   
   It is always a good idea to learn from your own
  mistakes. It is usually a
   better idea to learn from others’ mistakes, so you
  do not have to make
   them yourself. from
   'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'
 
  
   EARTH has a Right To Life,
     otherwise we all die.
  
  
   --- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu
  wrote:
  
From: Steven A Rowe sar...@syr.edu
Subject: RE: How do I this in Solr?
To: solr-user@lucene.apache.org
  solr-user@lucene.apache.org
Date: Tuesday, October 26, 2010, 12:10 PM
Dennis,
   
Do you mean to say that you read my earlier post,
  and
disagree that it would solve the problem?  Or
  have you
simply not read it?
   
Steve
   
 -Original Message-
 From: Dennis Gearon [mailto:gear...@sbcglobal.net]
 Sent: Tuesday, October 26, 2010 3:00 PM
 To: solr-user@lucene.apache.org
 Subject: RE: How do I this in Solr?

 Good point. Since I might need such a query
  myself
someday, how *IS* that
 done?


 Dennis Gearon

 Signature Warning
 
 It is always a good idea to learn from your
  own
mistakes. It is usually a
 better idea to learn from others’
  mistakes, so you
do not have to make
 them yourself. from
 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'
 
  

 EARTH has a Right To Life,
   otherwise we all die.


 --- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu
wrote:

  From: Steven A Rowe sar...@syr.edu
  Subject: RE: How do I this in Solr?
  To: solr-user@lucene.apache.org
solr-user@lucene.apache.org
  Date: Tuesday, October 26, 2010, 11:46
  AM
  Um, maybe I'm way off base, but when
  Varun said:
 
   If I search with the text samsung
  andriod
GPS,
   search results should only conain
  samsung,
GPS,
   andriod and samsung andriod.
 
  I interpreted that to mean that hit
  documents
should
  contain terms from the query, and
  nothing else.
Making
  all terms required doesn't do this.
 
  Steve
 
   -Original Message-
   From: Matthew Hall [mailto:mh...@informatics.jax.org]
   Sent: Tuesday, October 26, 2010
  2:30 PM
   To: solr-user@lucene.apache.org
   Subject: Re: How 

Re: Highlighting for non-stored fields

2010-10-26 Thread Phong Dais
Thanks for the insight.
This is definitely a feasible solution because I only need to highlight when
the user opens the document.
I guess the easiest way to do this is to reuse the Solr code (with some
modification) in my own application.
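
The application-side approach discussed in this thread (load the text after the search, find where the query terms occur, mark up those offsets) could be sketched roughly as below. This is plain Python standing in for the real thing: the function names are made up for illustration, and a simple case-insensitive regex replaces Solr's analyzers and highlighter classes, so matches may differ from what the index actually matched.

```python
import re


def find_term_offsets(text, terms):
    """Return (start, end) character offsets of whole-word, case-insensitive matches.

    Assumption: a regex stands in for re-analyzing the field with the same
    analyzer that was used at index time.
    """
    if not terms:
        return []
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


def highlight(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) span in pre/post markers; offsets must be sorted and non-overlapping."""
    pieces = []
    last = 0
    for start, end in offsets:
        pieces.append(text[last:start])
        pieces.append(pre + text[start:end] + post)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)
```

Keeping the offsets separate from the markup means the same offsets could be returned to a client that does its own rendering, which is close to what the original question asked for.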

On Tue, Oct 26, 2010 at 2:35 PM, Pradeep Singh pksing...@gmail.com wrote:

 Another way you can do this is - after the search has completed, load the
 field in your application, write separate code to reanalyze that
 field/document, index it in RAM, and run it through highlighter classes.
 All
 this as part of your web application outside of Solr. Considering the size
 of your data it doesn't look advisable to store it because then you would
 be
 almost doubling the size of your index (if you are looking to highlight on
 a
 field then it's probably going to be full of content).

 -Pradeep

 On Tue, Oct 26, 2010 at 8:32 AM, Phong Dais phong.gd...@gmail.com wrote:

  Hi,
 
  I understand that I need to store the fields in order to use highlighting
  out of the box.
  I'm looking for a way to highlighting using term offsets instead of the
  actual text since the text is not stored.  What am asking is is it
 possible
  to modify the response (thru custom implementation) to contain
 highlighted
  offsets instead of the actual matched text.  Should I be writing my own
  DefaultHighlighter?  Or overiding some of its functionality?  Can this be
  done this way or am I way off?
 
  BTW, I'm using solr-1.4.
 
  Thanks,
  P.
 
  On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo israele...@gmail.com
 wrote:
 
   Check out this link
  
   http://wiki.apache.org/solr/FieldOptionsByUseCase
  
   You need to store the field if you want to use the highlighting
 feature.
  
   If you need to retrieve and display the highlighted snippets then the
   fields
   definitely needs to be stored.
  
   To use term offsets, it will be a good idea to enable the following
   attributes for that field  termVectors termPositions termOffsets
  
   The only issue here is that your storage costs will increase because of
   these extra features.
  
   Nevertheless, you definitely need to store the field if you need to
   retrieve
   it for highlighting purposes.
  
   On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais phong.gd...@gmail.com
  wrote:
  
Hi,
   
I've been looking thru the mailing archive for the past week and I
   haven't
found any useful info regarding this issue.
   
My requirement is to index a few terabytes worth of data to be
  searched.
Due to the size of the data, I would like to index without storing
 but
  I
would like to use the highlighting feature.  Is this even possible?
   What
are my options?
   
I've read about termOffsets, payload that could possibly be used to
 do
   this
but I have no idea how this could be done.
   
Any pointers greatly appreciated.  Someone please point me in the
 right
direction.
   
 I don't mind having to write some code or digging thru existing code
  to
accomplish this task.
   
Thanks,
P.
   
  
  
  
   --
   °O°
   Good Enough is not good enough.
   To give anything less than your best is to sacrifice the gift.
   Quality First. Measure Twice. Cut Once.
   http://www.israelekpo.com/
  
 



Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall
Indeed, I'd missed the second part of his requirements; my AND solution 
is sadly insufficient for this task.


The combinatorial part of your solution worries me a bit though, Steven, 
because the documents on the larger side of his corpus would likely 
slow down query performance a bit while the filter calculates all 
of the possibilities for a given document.


I'm wondering if a slightly hybrid approach would be valid:

Have a filter that calculates the total number of terms for a given 
document.  And then add a clause into your query at runtime that would 
match what the filter would come up with:


So:

text:Nokia AND text:Mobile AND text:GPS AND termCount: 3

Something like that anyhow.
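
A minimal sketch of how such a query string could be assembled at runtime is shown here. The termCount field is hypothetical (it would have to be populated at index time by the filter described above), the helper name is invented for illustration, and no escaping of query-syntax characters is attempted.

```python
def build_term_count_query(terms, field="text", count_field="termCount"):
    """Build 'field:t1 AND field:t2 ... AND termCount:N' for the given query terms.

    Assumption: count_field holds the number of terms in each document.
    """
    clauses = ["{}:{}".format(field, t) for t in terms]
    clauses.append("{}:{}".format(count_field, len(terms)))
    return " AND ".join(clauses)
```

As written, every listed term is required and the document must contain exactly that many terms.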

Matt

-Original Message-
From: Matthew Hall [mailto:mh...@informatics.jax.org]
Sent: Tuesday, October 26, 2010 2:30 PM
To: solr-user@lucene.apache.org
Subject: Re: How do I this in Solr?

Um.. you could change your default clause to AND rather than or.

That should do the trick.

Matt

On 10/26/2010 2:26 PM, Dennis Gearon wrote:

Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall
Bah.. nope this would miss documents that only match a subset of the 
given terms.


I'm going to have to go with Steven's approach as the right choice here.

Matt

On 10/26/2010 3:44 PM, Matthew Hall wrote:
Indeed, I'd missed the second part of his requirements; my AND 
solution is sadly insufficient for this task.


The combinatorial part of your solution worries me a bit though, Steven, 
because the documents on the larger side of his corpus would likely 
slow down query performance a bit while the filter calculates 
all of the possibilities for a given document.


I'm wondering if a slightly hybrid approach would be valid:

Have a filter that calculates the total number of terms for a given 
document.  And then add a clause into your query at runtime that would 
match what the filter would come up with:


So:

text:Nokia AND text:Mobile AND text:GPS AND termCount: 3

Something like that anyhow.

Matt


Jars required in classpath to run embedded solr server?

2010-10-26 Thread Tharindu Mathew
Hi everyone,

Do we need all lucene jars in the class path for this? Seems that the
solr-solrj and solr-core jars are not enough
(http://wiki.apache.org/solr/Solrj). It is asking for lucene jars in
the classpath. Could I know what jars are required to run this?

Thanks in advance.

-- 
Regards,

Tharindu


Re: Strange search

2010-10-26 Thread ramzesua

I tried making some changes, but it did not help.
In http://localhost:8983/search/admin/schema.jsp I see, for example, the term
"main" with a frequency of 7. But if I search for this term I don't
get any results. If I use a wildcard, I get only 4 docs in the response.
And if I search for the term "html" (frequency 5), I don't get any results
even with a wildcard. Where is the problem, and how can I solve it?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1774059.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Hi Matt,

I think your concern about performance is spot-on, though.

The combinatorial explosion would be at query time, not at index time: my 
solution has a single token indexed per document. My suggested query-time 
filter would generate the following number of output terms, where C(n,k) is the 
number of combinations of n things taken k at a time, n is the number of input 
query terms, and k is the number of concatenated input query terms forming one 
output query term:

C(n,1) + C(n,2) + ... + C(n,n-1) + C(n,n)

For small queries this would not be a problem:

1 input query term -> 1 output query term
2 input query terms -> 3 output query terms
3 input query terms -> 7 output query terms
4 input query terms -> 15 output query terms

But for larger queries, it could be fairly expensive:

10 input query terms -> 1,023 output query terms
...
15 input query terms -> 32,767 output query terms

This is exactly (2^n - 1) output query terms, where n is the number of input 
terms.

32k query terms might be too slow to be functional.
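
To make the growth concrete, here is a small Python sketch that counts and enumerates the output terms. The choice to lowercase, sort, and join terms with an underscore is only an assumption made for illustration; the thread does not spell out how the single per-document token is formed.

```python
from itertools import combinations
from math import comb


def output_term_count(n):
    """Sum of C(n, k) for k = 1..n, which equals 2**n - 1."""
    return sum(comb(n, k) for k in range(1, n + 1))


def expand_query_terms(terms, sep="_"):
    """Enumerate every non-empty combination of the query terms as one joined token.

    Assumption: tokens are lowercased, sorted, and joined with `sep`.
    """
    ordered = sorted(t.lower() for t in terms)
    expanded = []
    for k in range(1, len(ordered) + 1):
        for combo in combinations(ordered, k):
            expanded.append(sep.join(combo))
    return expanded
```

For three input terms this yields 7 joined tokens, and the count doubles (plus one) with every extra input term.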

Steve

 -Original Message-
 From: Matthew Hall [mailto:mh...@informatics.jax.org]
 Sent: Tuesday, October 26, 2010 3:51 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How do I this in Solr?
 
 Bah.. nope this would miss documents that only match a subset of the
 given terms.
 
 I'm going to have to go with Steven's approach as the right choice here.
 
 Matt
 
 On 10/26/2010 3:44 PM, Matthew Hall wrote:
  Indeed, I'd missed the second part of his requirements; my AND
  solution is sadly insufficient for this task.
 
  The combinatorial part of your solution worries me a bit though, Steven,
  because the documents on the larger side of his corpus would likely
  slow down query performance a bit while the filter calculates
  all of the possibilities for a given document.
 
  I'm wondering if a slightly hybrid approach would be valid:
 
  Have a filter that calculates the total number of terms for a given
  document.  And then add a clause into your query at runtime that would
  match what the filter would come up with:
 
  So:
 
  text:Nokia AND text:Mobile AND text:GPS AND termCount: 3
 
  Something like that anyhow.
 
  Matt
 

Re: ClassCastException Issue

2010-10-26 Thread Ken Stanley
On Mon, Oct 25, 2010 at 2:45 AM, Alex Matviychuk alex...@gmail.com wrote:

 Getting this when deploying to tomcat:

 [INFO][http-4443-exec-3][solr.schema.IndexSchema] readSchema():394
 Reading Solr Schema
 [INFO][http-4443-exec-3][solr.schema.IndexSchema] readSchema():408
 Schema name=tsadmin
 [ERROR][http-4443-exec-3][util.plugin.AbstractPluginLoader] log():139
 java.lang.ClassCastException: org.apache.solr.schema.StrField cannot
 be cast to org.apache.solr.schema.FieldType
     at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:419)
     at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:447)
     at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
     at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:456)
     at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:95)
     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:520)
     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)


 solr schema:

 <?xml version="1.0" encoding="UTF-8" ?>
 <schema name="tsadmin" version="1.2">
   <types>
     <fieldType name="string" class="solr.StrField"
                sortMissingLast="true" omitNorms="true"/>
     ...
   </types>
   <fields>
     <field name="type" type="string" required="true"/>
     ...
   </fields>
 </schema>


 Any ideas?

 Thanks,
 Alex Matviychuk



Alex,

I've run into this issue myself, and it was because I tried to create a
fieldType called string (like you). Rename string to something else and
the exception should go away.

- Ken


Multiple Word Facets

2010-10-26 Thread Adam Estrada
All,
I am new to Solr faceting and stuck on how to get multiple-word
facets returned from a standard Solr query. See below for what is
currently being returned.

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="title">
      <int name="Federal">89</int>
      <int name="EFLHD">87</int>
      <int name="Eastern">87</int>
      <int name="Lands">87</int>
      <int name="Highways">84</int>
      <int name="FHWA">60</int>
      <int name="Transportation">32</int>
      <int name="GIS">22</int>
      <int name="Planning">19</int>
      <int name="Asset">15</int>
      <int name="Environment">15</int>
      <int name="Management">14</int>
      <int name="Realty">12</int>
      <int name="Highway">11</int>
      <int name="HEP">10</int>
      <int name="Program">9</int>
      <int name="HEPGIS">7</int>
      <int name="Resources">7</int>
      <int name="Roads">7</int>
      <int name="EEI">6</int>
      <int name="Environmental">6</int>
      <int name="Right">6</int>
      <int name="Way">6</int>
...etc...

Many of the titles are 2 or 3 word phrases. For example,
"Eastern Federal Lands Highway Division" gets broken down
into the individual words that make up the phrase. I've
seen quite a few websites that do what I am trying to do here, so
any suggestions at this point would be great. See my schema below
(copied from the example schema).

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

Similar for type="query". Please advise on how to group or cluster
document terms so that they can be used as facets.

Many thanks in advance,
Adam Estrada


Re: snapshot-4.0 and maven

2010-10-26 Thread Tommy Chheng
You can use maven-assembly-plugin's jar-with-dependencies descriptor to 
build a single jar with all its dependencies:


http://stackoverflow.com/questions/574594/how-can-i-create-an-executable-jar-with-dependencies-using-maven

@tommychheng

On 10/19/10 6:53 AM, Matt Mitchell wrote:

Hey thanks Tommy. To be more specific, I'm trying to use SolrJ in a
clojure project. When I try to use SolrJ using what you showed me, I
get errors saying lucene classes can't be found etc.. Is there a way
to build everything SolrJ (snapshot-4.0) needs into one jar?

Matt

On Mon, Oct 18, 2010 at 11:01 PM, Tommy Chhengtommy.chh...@gmail.com  wrote:

Once you built the solr 4.0 jar, you can use mvn's install command like
this:

mvn install:install-file -DgroupId=org.apache -DartifactId=solr
-Dpackaging=jar -Dversion=4.0-SNAPSHOT -Dfile=solr-4.0-SNAPSHOT.jar
-DgeneratePom=true

@tommychheng

On 10/18/10 7:28 PM, Matt Mitchell wrote:

I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is
this possible to do? If so, could someone give me a tip or two on
getting started?

Thanks,
Matt



Re: Multiple Word Facets

2010-10-26 Thread Pradeep Singh
Use this field type -

<fieldType name="facetField" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
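
To see why the tokenizer choice changes the facet values, here is a small Python illustration. It only mimics the effect: splitting on whitespace stands in for a whitespace tokenizer, keeping the whole value stands in for a keyword tokenizer, and the sample titles are made up. None of Solr's other filters are modelled.

```python
from collections import Counter


def facet_counts_whitespace(values):
    """Count documents per whitespace-separated term (each document counts once per term)."""
    counts = Counter()
    for value in values:
        counts.update(set(value.split()))
    return counts


def facet_counts_keyword(values):
    """Count documents per whole, untokenized field value."""
    return Counter(values)


# Hypothetical sample data, not taken from the poster's index.
SAMPLE_TITLES = [
    "Eastern Federal Lands Highway Division",
    "Eastern Federal Lands Highway Division",
    "Federal Highway Administration",
]
```

With the sample titles, the whitespace variant reports single words such as Federal, while the keyword variant reports the full phrase as one facet value.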




Re: Multiple Word Facets

2010-10-26 Thread Ahmet Arslan
Facets are generated from indexed terms.

Depending on your need/use-case: 

You can use an additional, separate String field (which is not tokenized) for 
facets and populate it via copyField. Search on the tokenized field; facet on 
the non-tokenized field.

Or

You can add solr.ShingleFilterFactory to your index analyzer to form 
multiple-word terms.
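
For the second option, the idea of shingling (emitting runs of adjacent words as single terms) can be illustrated in a few lines of Python. This is only a sketch of the concept: the function name and the min/max parameters are chosen for illustration and do not reproduce ShingleFilterFactory's actual options or defaults.

```python
def shingles(tokens, min_size=2, max_size=3):
    """Return every run of min_size..max_size adjacent tokens, joined by a space."""
    result = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result
```

Note that shingling produces every adjacent word run, not just the phrases you care about, so the facet list can grow quickly compared with the untokenized-field option.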

 





Re: How do I this in Solr?

2010-10-26 Thread 朱炎詹
I think you have to write a yet exact match handler yourself (I mean yet 
cause it's not quite exact match we normally know). Steve's answer is quite 
near your request. You can do further work based on his solution.


At the last step, I'll suggest you eat up all blank within query string and 
query result, respevtively  only returns those results that has equal 
string length as the query string's.


For example, given:
*query string = Samsung with GPS
*query results:
result 1 = Samsung has lots of mobile with GPS
result 2 = with GPS Samsung
result 3 = GPS mobile with vendors, such as Sony, Samsung

they become:
*query string = SamsungwithGPS (length = 14)
*query results:
result 1 = SamsunghaslotsofmobilewithGPS (length = 29)
result 2 = withGPSSamsung (length = 14)
result 3 = GPSmobilewithvendors,suchasSony,Samsung (length = 39)

so result 2 matches your request.

This way, you avoid a load of work on case sensitivity and word-order
rearrangement. Furthermore, you can refine it, for example by removing other
whitespace characters.
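
The length comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a Solr handler: the function names are made up for the example, and it assumes the candidate results have already been matched by Solr, since equal length by itself does not prove that two strings contain the same words.

```python
def strip_blanks(text):
    """Remove every whitespace character from text."""
    return "".join(text.split())


def same_stripped_length(query, candidates):
    """Keep the candidates whose blank-free length equals the query's."""
    target = len(strip_blanks(query))
    return [c for c in candidates if len(strip_blanks(c)) == target]


query = "Samsung with GPS"
results = [
    "Samsung has lots of mobile with GPS",
    "with GPS Samsung",
    "GPS mobile with vendors, such as Sony, Samsung",
]

# Only the second result has the same blank-free length as the query.
kept = same_stripped_length(query, results)
print(kept)
```

Because only lengths are compared, letter case and word order do not matter, which is the point of the suggestion.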


Scott @ Taiwan


- Original Message - 
From: Varun Gupta varun.vgu...@gmail.com

To: solr-user@lucene.apache.org
Sent: Tuesday, October 26, 2010 9:07 PM
Subject: How do I this in Solr?



Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in
Solr. For a search query, I want the search results to contain only those
documents that satisfy this criterion: "All of the words of the search result
document are present in the search query."

For example:
If I have the following documents indexed: "nokia n95", "GPS", "android",
"samsung", "samsung android", "nokia android", "mobile with GPS"

If I search with the text "samsung android GPS", the search results should only
contain "samsung", "GPS", "android" and "samsung android".

Is there a way to do this in Solr?
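
To make the criterion concrete, here is a minimal Python sketch of the check itself, applied on the client side to a list of candidate documents. It is not a Solr feature or query syntax; the function names are hypothetical, and splitting on whitespace with lowercasing is an assumption about how words should be compared.

```python
def words(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def docs_covered_by_query(query, documents):
    """Keep documents whose every word also occurs in the query."""
    query_words = words(query)
    return [doc for doc in documents if words(doc) <= query_words]


documents = [
    "nokia n95", "GPS", "android", "samsung",
    "samsung android", "nokia android", "mobile with GPS",
]

matches = docs_covered_by_query("samsung android GPS", documents)
print(matches)
```

The subset test (`<=` on sets) is the whole criterion: a document is kept only if it adds no word that the query lacks.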

--
Thanks
Varun Gupta












Re: FieldCollapsing and Stats or Sum ?!

2010-10-26 Thread Lance Norskog
Do you want one number, or the sum for each group? For one number, the
stats component is fine.

For one number per group, grouping does not (yet) support the stats
component. This is the old SQL Group By command, right?
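
As a sketch of the single-number case, and of one possible way to get per-value sums without grouping, the request below uses the StatsComponent parameters `stats`, `stats.field` and `stats.facet`. Whether `stats.facet` is available and behaves well depends on the Solr version and field type, so treat that as an assumption to verify; the host and path are placeholders. The Python code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def stats_per_value_url(base_url, value_field, facet_field, query="*:*"):
    """Build a select URL asking for stats on value_field per facet_field value."""
    params = [
        ("q", query),
        ("rows", 0),                   # only the stats section is of interest
        ("stats", "true"),
        ("stats.field", value_field),  # field to aggregate, e.g. amount
        ("stats.facet", facet_field),  # break the stats down by this field
    ]
    return base_url + "?" + urlencode(params)


# Placeholder base URL; field names are taken from the question below.
url = stats_per_value_url("http://localhost:8983/solr/select",
                          "amount", "currency_id")
print(url)
```

Dropping the `stats.facet` parameter gives the plain one-number request.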

On Tue, Oct 26, 2010 at 6:42 AM, stockiii stock.jo...@gmail.com wrote:

 Hello.

 We want to group with field collapsing, and we want a sum for each of these
 groups.

 For example:
 group by currency_id: EUR, CHF, ...
 and for each group, the correct sum over its documents of the field: amount

 Is this possible in one request, or is it necessary to do it in several
 requests?
 Maybe first grouping and then using the StatsComponent to get the sum of each
 group by sending a new request with a filter? But then I don't need
 grouping!?

 thx =)
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/FieldCollapsing-and-Stats-or-Sum-tp1773842p1773842.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Lance Norskog
goks...@gmail.com


how to index raw data

2010-10-26 Thread jayant

Hi, I wanted to use a few fields from the database, but cannot use the DIH
because JDBC access to the database is not allowed. We can only go through a
wrapper. As such, I would like to know how I can index the data obtained
through the db wrapper, using SolrJ. I would have two fields to index - id
and a text field containing the data.
Thanks.
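
The following is not SolrJ, only a hedged Python sketch of the shape of the data: it turns rows obtained from some wrapper into the XML `<add>` message that Solr's update handler accepts. The field name `text` and the sample rows are assumptions for illustration; sending the message and committing are left out.

```python
from xml.sax.saxutils import escape


def to_add_xml(rows):
    """Build a Solr XML <add> message from (id, text) pairs."""
    docs = []
    for doc_id, text in rows:
        docs.append(
            "<doc>"
            f'<field name="id">{escape(str(doc_id))}</field>'
            f'<field name="text">{escape(text)}</field>'
            "</doc>"
        )
    return "<add>" + "".join(docs) + "</add>"


# Rows as they might come back from the database wrapper (made-up data).
rows = [(1, "first record"), (2, "a < b & c")]
message = to_add_xml(rows)
print(message)
```

Escaping matters here because raw database text may contain `<` or `&`, which would otherwise break the XML.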
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-index-raw-data-tp1778033p1778033.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do I this in Solr?

2010-10-26 Thread Varun Gupta
Thanks everybody for the inputs.

Looks like Steven's solution is the closest one but will lead to performance
issues when the query string has many terms.

I will try to implement the two filters suggested by Steven and see how the
performance matches up.

--
Thanks
Varun Gupta


On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹) scott@udngroup.com wrote:

 I think you have to write a "yet exact match" handler yourself (I say "yet"
 because it's not quite the exact match we normally know). Steve's answer is
 quite near your request. You can do further work based on his solution.

 As the last step, I suggest you strip all blanks from the query string and
 from each query result, respectively, and only return those results that have
 the same string length as the query string.

 For example, given:
 *query string = Samsung with GPS
 *query results:
 result 1 = Samsung has lots of mobile with GPS
 result 2 = with GPS Samsung
 result 3 = GPS mobile with vendors, such as Sony, Samsung

 they become:
 *query string = SamsungwithGPS (length = 14)
 *query results:
 result 1 = SamsunghaslotsofmobilewithGPS (length = 29)
 result 2 = withGPSSamsung (length = 14)
 result 3 = GPSmobilewithvendors,suchasSony,Samsung (length = 39)

 so result 2 matches your request.

 This way, you avoid a load of work on case sensitivity and word-order
 rearrangement. Furthermore, you can refine it, for example by removing other
 whitespace characters.

 Scott @ Taiwan


 - Original Message - From: Varun Gupta varun.vgu...@gmail.com

 To: solr-user@lucene.apache.org
 Sent: Tuesday, October 26, 2010 9:07 PM

 Subject: How do I this in Solr?


  Hi,

 I have a lot of small documents (each containing 1 to 15 words) indexed in
 Solr. For a search query, I want the search results to contain only those
 documents that satisfy this criterion: "All of the words of the search result
 document are present in the search query."

 For example:
 If I have the following documents indexed: "nokia n95", "GPS", "android",
 "samsung", "samsung android", "nokia android", "mobile with GPS"

 If I search with the text "samsung android GPS", the search results should only
 contain "samsung", "GPS", "android" and "samsung android".

 Is there a way to do this in Solr?

 --
 Thanks
 Varun Gupta




 







Re: Solr sorting problem

2010-10-26 Thread Ron Mayer
Erick Erickson wrote:
 In general, the behavior when sorting is not predictable when
 sorting on a tokenized field, which "text" is. What would
 it mean to sort on a field with "erick" and "Moazzam" as tokens
 in a single document? Should it be in the "e"s or the "m"s?

Might it be possible or reasonable to have it show up under
both "e" and "m"?  Or if not, just at the first one it finds?

I've recently been asked a similar question where we wanted
to sort documents by a victim's age.  I have a victim_age
field, but since there can be multiple victims in an incident
it wasn't a unique field.   As a workaround, I added a
victim_age_min field; but it would have been easier if
I didn't need to do that.
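
The workaround amounts to deriving a single-valued sort field at index time. A minimal Python sketch follows, assuming documents are plain dictionaries before they are sent to Solr; the helper name and the sample incident are made up for illustration.

```python
def with_min_age(doc, source="victim_age", target="victim_age_min"):
    """Return a copy of doc with the minimum of the multi-valued source field."""
    ages = doc.get(source) or []
    enriched = dict(doc)
    if ages:
        # A single value per document, so the field can be sorted on.
        enriched[target] = min(ages)
    return enriched


incident = {"id": "incident-1", "victim_age": [34, 12, 57]}
indexed = with_min_age(incident)
print(indexed)
```

Documents without any victim age are left without the derived field rather than given a made-up value.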

 That said, you probably want to watch out for case
 
 Best
 Erick
 
 On Fri, Oct 22, 2010 at 10:02 AM, Moazzam Khan moazz...@gmail.com wrote:
 
 For anyone who faced the same problem, changing the field to string
 from text worked!

 -Moazzam

 On Fri, Oct 22, 2010 at 8:50 AM, Moazzam Khan moazz...@gmail.com wrote:
 The field type of the first name and last name is text. Could that be
 why it's not sorting properly? I just changed it to string and started
 a full-import. Hopefully that will work.

 Thanks,
 Moazzam

 On Thu, Oct 21, 2010 at 7:42 PM, Jayendra Patil
 jayendra.patil@gmail.com wrote:
 need additional information .
 Sorting is easy in Solr just by passing the sort parameter

 However, when it comes to text sorting it depends on how you analyse
 and tokenize your fields
 Sorting does not work on fields with multiple tokens.

 http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F
 On Thu, Oct 21, 2010 at 7:24 PM, Moazzam Khan moazz...@gmail.com
 wrote:
 Hey guys,

 I have a list of people indexed in Solr. I am trying to sort by their
 first names but I keep getting results that are not alphabetically
 sorted (I see the names starting with W before the names starting with
 A). I have a feeling that the results are first being sorted by
 relevancy then sorted by first name.

 Is there a way I can get the results to be sorted alphabetically?

 Thanks,
 Moazzam

 



Re: how well does multicore scale?

2010-10-26 Thread Tharindu Mathew
It's really great to know you were able to fire up about 100 cores. But
when it scales up to around 1000 or even more, I wonder how it would
perform.

I have a question regarding ids, i.e. the unique key. Since there is a
potential use case where two users add the same document, how
would we set the id? I was thinking of appending the user id to the
id I would otherwise use, e.g. /system/bar.pdfuserid25. Otherwise, Solr would
replace the document of one user, which is not what we want.

This is also applicable to deleteById. Is there a better way to do this?
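
A small Python sketch of the composite key idea, for illustration only. The `!` separator and the helper names are assumptions, not anything Solr prescribes; the separator just needs to be a character that cannot occur in a user id. The same key is then what a delete by id has to carry, shown here as the XML `<delete>` message.

```python
from xml.sax.saxutils import escape


def per_user_id(user_id, path, sep="!"):
    """Combine user id and document path into one unique key."""
    return f"{user_id}{sep}{path}"


def delete_by_id_xml(doc_id):
    """Build a Solr XML delete-by-id message for one key."""
    return f"<delete><id>{escape(doc_id)}</id></delete>"


key_a = per_user_id(25, "/system/bar.pdf")
key_b = per_user_id(26, "/system/bar.pdf")
print(key_a, key_b)
print(delete_by_id_xml(key_a))
```

Two users adding the same path get different keys, so neither document replaces the other.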

On Tue, Oct 26, 2010 at 7:45 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 mike anderson wrote:

 I'm really curious if there is a clever solution to the obvious problem
 with: "So you're better off using a single index with a user id, and use
 a query filter with the user id when fetching data", i.e. when you have
 hundreds of thousands of user IDs tagged on each article. That just
 doesn't sound like it scales very well.


 Actually, I think that design would scale pretty fine, I don't think there's
 an 'obvious' problem. You store your userIDs in a multi-valued field (or as
 multiple terms in a single value, ends up being similar). You fq on there
 with the current userID.   There's one way to find out of course, but that
 doesn't seem a patently ridiculous scenario or anything, that's the kind of
 thing Solr is generally good at, it's what it's built for.   The problem
 might actually be in the time it takes to add such a document to the index;
 but not in query time.
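
 As a sketch of what such a request could look like, the Python snippet below builds a select URL with the user's query in `q` and the current user id in `fq`. The field name `user_ids`, the base URL and the helper name are assumptions for illustration; nothing is sent to a server.

```python
from urllib.parse import urlencode


def user_filtered_url(base_url, user_query, user_id, field="user_ids"):
    """Build a select URL restricted to documents tagged with user_id."""
    params = [
        ("q", user_query),
        ("fq", f"{field}:{user_id}"),  # filter query on the multi-valued field
    ]
    return base_url + "?" + urlencode(params)


url = user_filtered_url("http://localhost:8983/solr/select", "bar", 25)
print(url)
```

 Keeping the user restriction in `fq` rather than in `q` leaves relevancy scoring to the user's own query terms.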

 Doesn't mean it's the best solution for your problem though, I can't say.

 My impression is that Solr in general isn't really designed to support the
 kind of multi-tenancy use case people are talking about lately.  So trying
 to make it work anyway... if multi-cores work for you, then great, but be
 aware they weren't really designed for that (having thousands of cores) and
 may not. If a single index can work for you instead, great, but as you've
 discovered it's not necessarily obvious how to set up the schema to do what
 you need -- really this applies to Solr in general, unlike an rdbms where
 you just third-form-normalize everything and figure it'll work for almost
 any use case that comes up,  in Solr you generally need to custom fit the
 schema for your particular use cases, sometimes being kind of clever to
 figure out the optimal way to do that.

 This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr
 index takes more intellectual work than setting up an rdbms. The trade off
 is you get speed, and flexible ways to set up relevancy (that still perform
 well). Took a couple decades for rdbms to get as brainless to use as they
 are, maybe in a couple more we'll have figured out ways to make indexing
 engines like solr equally brainless, but not yet -- but it's still pretty
 damn easy for what it is, the lucene/Solr folks have done a remarkable job.




-- 
Regards,

Tharindu


Re: Looking for Developers

2010-10-26 Thread Igor Chudov
UNSUBSCRIBE

On Wed, Oct 27, 2010 at 12:14 AM, ST ST stst2...@gmail.com wrote:
 Looking for Developers Experienced in Solr/Lucene And/OR FAST Search Engines
 from India (Pune)

 We are looking for off-shore India Based Developers who are proficient in
 Solr/Lucene and/or FAST search engine .
 Developers in the cities of Pune/Bombay in India are preferred. Development
 is for projects based in US for a reputed firm.

 If you are proficient in Solr/Lucene/FAST and have 5 years minimum industry
 experience with at least 3 years in Search Development,
 please send me your resume.

 Thanks



Re: Looking for Developers

2010-10-26 Thread Yuchen Wang
UNSUBSCRIBE

On Tue, Oct 26, 2010 at 10:15 PM, Igor Chudov ichu...@gmail.com wrote:

 UNSUBSCRIBE

 On Wed, Oct 27, 2010 at 12:14 AM, ST ST stst2...@gmail.com wrote:
  Looking for Developers Experienced in Solr/Lucene And/OR FAST Search Engines
  from India (Pune)
 
  We are looking for off-shore India Based Developers who are proficient in
  Solr/Lucene and/or FAST search engine .
  Developers in the cities of Pune/Bombay in India are preferred. Development
  is for projects based in US for a reputed firm.
 
  If you are proficient in Solr/Lucene/FAST and have 5 years minimum industry
  experience with at least 3 years in Search Development,
  please send me your resume.
 
  Thanks