date:20101026

Hi Everybody,

If I change my schema.xml, do I have to restart Solr? Is there some way I
can apply the changes to schema.xml without restarting Solr?

Swapnonil Mukherjee

Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread David Stuart

If you are using Solr Multicore (http://wiki.apache.org/solr/CoreAdmin), you can
issue a RELOAD command:
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
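A minimal sketch of building that reload URL in Python, assuming the host, port, and core name from the example above; the helper name core_reload_url is made up for illustration and is not part of Solr:

```python
from urllib.parse import urlencode


def core_reload_url(base_url, core):
    """Build the CoreAdmin RELOAD URL for one core (hypothetical helper)."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url}/admin/cores?{query}"


# Assumed values: the base URL and core name used in the message above.
url = core_reload_url("http://localhost:8983/solr", "core0")
print(url)
```

Against a running multicore Solr, the resulting URL could then be fetched with urllib.request.urlopen(url) or with curl.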


Re: command line to check if Solr is up running

2010-10-26 Thread Peter Karich

Hi Xin,

from the wiki:
http://wiki.apache.org/solr/SolrConfigXml

The URL of the ping query is /admin/ping

You can also check (via wget) the number of documents. It might look
like a rusty hack, but it works for me:

wget -T 1 -q "http://localhost:8080/solr/select?q=*:*" -O - | tr '/' '\n' | grep numFound | tr '"' ' ' | awk '{print $5}'
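The same check can be sketched in Python by parsing the XML response instead of splitting it with tr and awk. This is only an illustration: the sample response below is a hand-written stand-in for what /select returns, and the function name num_found is an assumption, not a Solr API.

```python
import xml.etree.ElementTree as ET


def num_found(response_xml):
    """Return numFound from a Solr XML select response, or None if absent."""
    root = ET.fromstring(response_xml)
    result = root.find("result")
    if result is None:
        return None
    value = result.get("numFound")
    return int(value) if value is not None else None


# Hand-written sample shaped like a Solr XML response (assumed, not captured output).
sample = '<response><result name="response" numFound="42" start="0"/></response>'
print(num_found(sample))
```

In a real check, the XML would come from something like urllib.request.urlopen("http://localhost:8080/solr/select?q=*:*", timeout=1).read(), and a connection error would mean Solr is not reachable.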

Regards,
Peter.

As we know, we can use a browser to check whether Solr is running by going to
http://$hostName:$portNumber/$masterName/admin, say http://localhost:8080/solr1/admin. My question
is: are there any ways to check it from the command line? I used curl
"http://localhost:8080" to check my Tomcat, and it worked fine. However, I get no response if I try
curl "http://localhost:8080/solr1/admin" (even when my Solr is running). Does anyone know
any command-line alternatives?

Thanks,
Xin

--
http://jetwick.com twitter search prototype

Re: How to index on basis of a condition?

2010-10-26 Thread Pawan Darira

I am using a MySQL database, and the field type is date.

On Tue, Oct 26, 2010 at 2:56 PM, Gora Mohanty g...@mimirtech.com wrote:

 On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira pawan.dar...@gmail.com wrote:
  Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The
  query result is correct. But when I see it in my index, the value stored is
  an unusual bunch of characters, e.g. *...@6628ad5a*
 [...]

 Which database are you indexing from? The field type is probably
 a blob in the database. Check that, and look into the ClobTransformer:
 http://wiki.apache.org/solr/DataImportHandler#ClobTransformer

 Regards,
 Gora




-- 
Thanks,
Pawan Darira

Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Peter Karich


 Hi,

See this:
http://wiki.apache.org/solr/CoreAdmin#RELOAD

Solr will also load the new configuration (without restarting the webapp) 
on the slaves when using replication:

http://wiki.apache.org/solr/SolrReplication

Regards,
Peter.


--
http://jetwick.com twitter search prototype

RE: How to index on basis of a condition?

2010-10-26 Thread Ephraim Ofir

This is probably just a date format problem, nothing to do with the IF()
statement.  Try applying this on your date:
DATE_FORMAT(yourDate, '%Y-%m-%dT00:00:00Z')
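For comparison, here is a small Python sketch of the string that this DATE_FORMAT pattern is meant to produce. The sample date is arbitrary and the helper name to_solr_date is made up for illustration:

```python
from datetime import date


def to_solr_date(d):
    """Format a date as midnight UTC, matching '%Y-%m-%dT00:00:00Z'."""
    return d.strftime("%Y-%m-%dT00:00:00Z")


# Arbitrary sample date, chosen only to show the shape of the output.
print(to_solr_date(date(2010, 5, 30)))
```

The point is only the target shape (date, a literal T, time, a literal Z); whether the formatting happens in SQL or in client code is a separate choice.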

Ephraim Ofir


Highlighting for non-stored fields

2010-10-26 Thread Phong Dais

Hi,

I've been looking thru the mailing archive for the past week and I haven't
found any useful info regarding this issue.

My requirement is to index a few terabytes worth of data to be searched.
Due to the size of the data, I would like to index without storing but I
would like to use the highlighting feature.  Is this even possible?  What
are my options?

I've read that termOffsets or payloads could possibly be used to do this,
but I have no idea how this could be done.

Any pointers greatly appreciated.  Someone please point me in the right
direction.

 I don't mind having to write some code or digging thru existing code to
accomplish this task.

Thanks,
P.

RE: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Ephraim Ofir

Note that usually when you change the schema.xml you have not only to
restart solr, but also rebuild the index, so the issue of how to reload
the file seems like a small problem...

Ephraim Ofir


Re: How to index on basis of a condition?

2010-10-26 Thread Gora Mohanty

On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira pawan.dar...@gmail.com wrote:
 I am using mysql database, and, field type is date
[...]

Could you show us the exact SELECT statement, and some example
values returned by running the SELECT directly at a mysql console?

Regards,
Gora

Re: How to index on basis of a condition?

2010-10-26 Thread Pawan Darira

My SQL is:

select IF(sub_cat_id=2002, ad_post_date, null) as 'ad_sort_field' from
tcuser.ad_details where SOME_CONDiTION

+---------------+
| ad_sort_field |
+---------------+
| 2010-05-30    |
| 2010-05-02    |
| 2010-10-07    |
| NULL          |
| 2010-10-15    |
| NULL          |
+---------------+

Thanks
Pawan



Query only a specfic field with a specific value using Dismax Handler

Hi Everybody,

Let me give you a brief idea of our Solr document. We have about 6 text type 
fields, each containing IPTC data extracted from photos. Search is performed 
mostly on these 6 fields.
We also have a multivalued field named group_id that contains a list of all the 
group_ids that have access to this photo. In other words, we are storing the 
metadata of the photo as well as the permissions applicable to this photo in 
the Solr document itself. This group_id field, by the way, is of long type.

Additionally, we have certain boolean and constant type fields named 
visibleToEndUser (boolean) and entityType (a Java enum with values from 0 to 5).

The first field defaultSearch is a copyField which contains a copy of all the 
values of 6 text type fields that I have mentioned.

The way we query presently using the default search handler is like this.

defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR 
group_id:2216624 OR group_id:2216990) AND (entityType:0) AND 
(visibleToEndUser:true)

We want to start using the dismax (if not dismax then edismax)  query handler 
but so far I have not been able to replicate the query mentioned above to the 
equivalent dismax form.

What I cannot figure out is:

1. How do I apply exact match on the group_id, visibleToEndUser and the 
entityType fields? Or, how do I query a specific field with a specific value 
rather than searching across all fields with all values?
2. How do I apply OR and AND conditions?


Swapnonil Mukherjee

Re: Does Solr reload schema.xml dynamically?

Hi Everybody,

Thanks Ephraim and Peter. I think I got my answer.

Swapnonil Mukherjee





RE: How to index on basis of a condition?

2010-10-26 Thread Ephraim Ofir

Try:
select IF(sub_cat_id=2002, DATE_FORMAT(ad_post_date,
'%Y-%m-%dT00:00:00Z/DAY'), null) as 'ad_sort_field' from
tcuser.ad_details where SOME_CONDiTION

Ephraim Ofir


Next Word - Any Suggestions?

2010-10-26 Thread Christopher Ball

I am about to implement a custom query that is sort of a mash-up of Facets,
Highlighting, and SpanQuery, but thought I'd see if anyone has done
anything similar.

In simple words, I need to facet on the next word given a target word.

For example, if my index only had the following 5 documents (comprised of a
sentence each):

Doc 1 - The quick brown fox jumped over the fence.
Doc 2 - The sly fox skipped over the fence.
Doc 3 - The fat fox skipped his afternoon class.
Doc 4 - A brown duck and red fox, crashed the party.
Doc 5 - Charles Brown! Fox! Crashed my damn car.

The query should give the frequency of the distinct terms after the word
fox:

skipped - 2
crashed - 2
jumped - 1

Long-term, do the opposite - frequency of the distinct terms before the word
fox:

brown - 2
sly - 1
fat - 1
red - 1

My guess is that either the FastVectorHighlighter or SpanQuery would be a
reasonable starting point. I was hoping to take advantage of Vectors as I am
storing termVectors, termPositions, and termOffsets for the field in
question.

Grateful for any thoughts . . . reference implementations . . . words of
encouragement . . . free beer - whatever you can offer.

Gracias,

Christopher
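To pin down the expected counts, here is a plain in-memory Python sketch of the "next word" and "previous word" tallies over the five example documents. It does not use Solr, Lucene term vectors, or SpanQuery; the tokenization (lowercasing, dropping punctuation) and the function name neighbor_counts are assumptions made for illustration.

```python
import re
from collections import Counter

DOCS = [
    "The quick brown fox jumped over the fence.",
    "The sly fox skipped over the fence.",
    "The fat fox skipped his afternoon class.",
    "A brown duck and red fox, crashed the party.",
    "Charles Brown! Fox! Crashed my damn car.",
]


def neighbor_counts(docs, target, offset=1):
    """Count the terms found `offset` positions from each occurrence of `target`.

    offset=1 counts the next word, offset=-1 the previous word.
    """
    counts = Counter()
    for doc in docs:
        # Assumed tokenization: lowercase, keep letters, digits and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", doc.lower())
        for i, tok in enumerate(tokens):
            j = i + offset
            if tok == target and 0 <= j < len(tokens):
                counts[tokens[j]] += 1
    return counts


print(neighbor_counts(DOCS, "fox", 1))
print(neighbor_counts(DOCS, "fox", -1))
```

A term-vector based implementation would do the same position arithmetic, but would read positions from the index instead of re-tokenizing the text.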

How do I this in Solr?

2010-10-26 Thread Varun Gupta

Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in
Solr. For the search query, I want the search results to contain only those
documents that satisfy this criterion: all of the words of the search result
document are present in the search query.

For example:
If I have the following documents indexed: "nokia n95", "GPS", "android",
"samsung", "samsung android", "nokia android", "mobile with GPS"

If I search with the text "samsung android GPS", search results should only
contain "samsung", "GPS", "android" and "samsung android".

Is there a way to do this in Solr?

--
Thanks
Varun Gupta

Re: How do I this in Solr?

2010-10-26 Thread Savvas-Andreas Moysidis

If I get your question right, you probably want to use the AND binary
operator, as in "samsung AND android AND GPS" or "+samsung +android +GPS".


RE: How do I this in Solr?

Hi Varun,

I can't think of a way to do it without writing new analysis filters.

But I think you could do what you want with two filters (this is untested):

1. An index-time filter that outputs a single token consisting of all of the 
input tokens, sorted in a consistent way, e.g.:

   mobile with GPS -> GPS mobile with
   samsung android -> android samsung

2. A query-time filter that outputs one token per input term combination, 
sorted in the same consistent way as the index-time filter, e.g.:

   samsung android GPS
   -> samsung, android, GPS,
      android samsung, GPS samsung, android GPS,
      android GPS samsung
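The two-filter idea can be tried out in plain Python before writing any analysis filters. This is only a sketch of the matching logic, not Solr filter code: the function names are made up, tokens are split on whitespace, and everything is lowercased so the sort order is consistent.

```python
from itertools import combinations


def index_token(text):
    """Index-time side: one token made of all words, sorted consistently."""
    return " ".join(sorted(text.lower().split()))


def query_tokens(text):
    """Query-time side: one sorted token per non-empty combination of query words."""
    terms = sorted(set(text.lower().split()))
    return {
        " ".join(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    }


docs = ["nokia n95", "GPS", "android", "samsung",
        "samsung android", "nokia android", "mobile with GPS"]
wanted = query_tokens("samsung android GPS")
matches = [d for d in docs if index_token(d) in wanted]
print(matches)
```

Note that a query of n distinct words produces 2^n - 1 combinations, so this approach is only practical for short queries.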

Steve


Re: Highlighting for non-stored fields

2010-10-26 Thread Israel Ekpo

Check out this link

http://wiki.apache.org/solr/FieldOptionsByUseCase

You need to store the field if you want to use the highlighting feature.

If you need to retrieve and display the highlighted snippets, then the field
definitely needs to be stored.

To use term offsets, it is a good idea to enable the following
attributes for that field: termVectors, termPositions, termOffsets.

The only issue here is that your storage costs will increase because of
these extra features.

Nevertheless, you definitely need to store the field if you need to retrieve
it for highlighting purposes.





-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/

Documents are deleted when Solr is restarted

2010-10-26 Thread Mackram Raydan


Hey everyone,

I apologize if this question is rudimentary but it is getting to me and 
I did not find anything reasonable about it online.


So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the 
SolrTomcat wiki page to set it up. The system works exactly the way I want 
it to (proper search, highlighting, etc.). The problem, however, is that when I 
restart my Tomcat server, all the data in Solr (i.e. the index) is simply 
lost. The admin page shows me that the number of docs is 0 when it was 
in the thousands before.

Can someone please help me understand why the above is happening and how 
I can work around it, if possible?


Big thanks for any help you can send my way.

Regards,

Mackram

Re: a bug of solr distributed search

2010-10-26 Thread Ron Mayer

Andrzej Bialecki wrote:
 On 2010-10-25 11:22, Toke Eskildsen wrote:
 On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: 
 But it shows a problem of distributed search without a common idf.
 A doc will get different score in different shard.
 Bingo.

 I really don't understand why this fundamental problem with sharding
 isn't mentioned more often. Every time the advice "use sharding" is
 given, it should be followed with a "but be aware that it will make
 relevance ranking unreliable".
 
 The reason is twofold, I think:


And a third potential reason - it's arguably a feature instead of a bug
for some applications.  Depending on how I organize my shards, "give me
the most relevant document from each shard for this search" seems like
it could be useful.

 * there is an exact solution to this problem, namely to make two
 distributed calls instead of one (first call to collect per-shard IDFs
 for given query terms, second call to submit a query rewritten with the
 global IDF-s). This solution is implemented in SOLR-1632, with some
 caching to reduce the cost for common queries. However, this means that
 now for every query you need to make two calls instead of one, which
 potentially doubles the time to return results (for simple common
 queries - for rare complex queries the time will be still dominated by
 the query runtime on shard servers).
 
 * another reason is that in many many cases the difference between using
 exact global IDF and per-shard IDFs is not that significant. If shards
 are more or less homogenous (e.g. you assign documents to shards by
 hash(docId)) then term distributions will be also similar. So then the
 question is whether you can accept an N% variance in scores across
 shards, or whether you want to bear the cost of an additional
 distributed RPC for every query...
 
 To summarize, I would qualify your statement with: "...if the
 composition of your shards is drastically different". Otherwise the cost
 of using global IDF is not worth it, IMHO.
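To make the per-shard versus global IDF difference concrete, here is a small worked sketch in Python. It assumes the classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)); the shard sizes and document frequencies are invented, and this is not the SOLR-1632 implementation.

```python
import math


def idf(doc_freq, num_docs):
    """Classic Lucene-style IDF (assumed formula): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Invented statistics for one query term on two deliberately uneven shards.
shards = [
    {"num_docs": 1000, "doc_freq": 10},   # term is rare on shard 0
    {"num_docs": 1000, "doc_freq": 200},  # term is common on shard 1
]

local_idfs = [idf(s["doc_freq"], s["num_docs"]) for s in shards]

# First "call": collect per-shard statistics; second "call": score with the sum.
global_idf = idf(sum(s["doc_freq"] for s in shards),
                 sum(s["num_docs"] for s in shards))

print([round(v, 3) for v in local_idfs], round(global_idf, 3))
```

With shards this uneven, the same term gets a much higher weight on shard 0 than on shard 1, while the global value sits in between. If documents are spread by hash(docId), the two local values would be close to each other and to the global one.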

Solr - xmlhttprequest

2010-10-26 Thread Yavuz Selim YILMAZ

I have a Solr instance on my server, and I can make requests to it with Internet
Explorer. However, with other browsers I can't.

The error given:
*XMLHttpRequest cannot load http://. Origin http://... is not allowed by
Access-Control-Allow-Origin.*

I changed my Apache server conf file and added these lines:

Header set Access-Control-Allow-Origin *
Header set Access-Control-Allow-Methods POST,GET,OPTIONS
Header set Access-Control-Allow-Headers X-PINGOTHER
Header set Access-Control-Max-Age 1728000

to allow.

Still, the same error.

Any suggestion?
--

Yavuz Selim YILMAZ

Re: Solr ExtractingRequestHandler with Compressed files

2010-10-26 Thread Joey Hanzel

Hi Jayendra,

Thanks for the suggestion, I updated to Solr 1.4.1 and Solr Cell 1.4.1 and
tried sending a zip file that contained several html documents.
Unfortunately, that did not solve the problem.

Here's the curl command I used:
curl
http://localhost:8983/solr/update/extract?literla.id=d...@uprefix=attr_fmap.content=attri_contentcommit=true;
-F file=data.zip

When I query for id:doc1, the attr_content lists each filename within the
zip archive. It also indexed the stream_size, stream_source and
content_type. It does not appear to be opening up the individual files
within the zip.

Did you have to make any other configuration changes to your solrconfig.xml
or schema.xml to read the contents of the individual files? Would it help
to pass the specific MIME type on the curl line?

On Mon, Oct 25, 2010 at 3:27 PM, Jayendra Patil
jayendra.patil@gmail.com wrote:

There was this issue with the previous version of Solr, wherein only the
file names from the zip used to get indexed.
We had faced the same issue and ended up using the Solr trunk which has the
Tika version upgraded and works fine.

The Solr version 1.4.1 should also have the fix included. Try using it.

Regards,
Jayendra

On Fri, Oct 22, 2010 at 6:02 PM, Joey Hanzel phan...@nearinfinity.com
wrote:

Hi,

Has anyone had success using ExtractingRequestHandler and Tika with any
of
the compressed file formats (zip, tar, gz, etc) ?

I am sending Solr the archived.tar file using curl:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true" -H 'Content-type:application/octet-stream' --data-binary @/home/archived.tar

The result I get when I query the document is that the filenames inside the
archive are indexed as the body_texts, but the content of those files is
not extracted or included. This is not the behavior I expected. Ref:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#article.tika.example

When I send one of the actual documents inside the archive using the same curl
command, the extracted content is then stored in the body_texts field. Am
I missing a step for the compressed files?

I have added all the extraction dependencies as indicated by mat in
http://outoftime.lighthouseapp.com/projects/20339/tickets/98-solr-cell and
am able to successfully extract data from MS Word, PDF, and HTML documents.

I'm using the following library versions:
Solr 1.40, Solr Cell 1.4.1, with Tika Core 0.4

Given everything I have read, this version of Tika should support extracting
data from all files within a compressed file. Any help or suggestions would
be appreciated.

Re: Query only a specfic field with a specific value using Dismax Handler

2010-10-26 Thread Jonathan Rochkind

So, first of all, exact match is hard in Solr on tokenized fields.  
Tokenized fields don't really do that.  So for exact match, you should 
probably use a non-tokenized field (string or text with keywordtokenizer 
(which should really be called the non-tokenizer)). If there's only one 
token in your value anyway though, like a single number, it may not 
matter and work fine.


Secondly, I'd recommend combining a dismax query for the user-entered 
phrase (like 'dog') with standard lucene queries for those other 
things.  There are (at least) two ways to do that. The first is just put 
everything after the first AND in one or more 'fq' parameters instead of 
trying to include them in 'q'.  The second is to use Solr's nested query 
syntax, to specify sub-queries with different query parsers. Someone can 
explain the second if you need it, but the easier to understand 'fq' 
approach seems right to me for your case.
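A rough sketch of what the 'fq' approach could look like as request parameters, built in Python. The field names and group ids come from the original question; choosing defaultSearch as the qf field and the helper name build_params are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode


def build_params(user_query, group_ids):
    """Dismax for the user's words, plain Lucene-syntax filters in fq (sketch)."""
    group_filter = "group_id:(" + " OR ".join(str(g) for g in group_ids) + ")"
    return [
        ("defType", "dismax"),
        ("q", user_query),
        ("qf", "defaultSearch"),  # assumed: search the copyField from the question
        ("fq", group_filter),
        ("fq", "entityType:0"),
        ("fq", "visibleToEndUser:true"),
    ]


params = build_params("Dog", [2181347, 2181364, 2216624, 2216990])
query_string = urlencode(params)
print(parse_qs(query_string)["fq"])
```

Each fq is applied as a filter that is ANDed with the main query, so the OR between group ids lives inside one fq while the AND conditions become separate fq parameters.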



Re: how well does multicore scale?

2010-10-26 Thread mike anderson

So I fired up about 100 cores and used JMeter to fire off a few thousand
queries. It looks like the memory usage isn't much worse than running a
single shard. So that's good.

I'm really curious if there is a clever solution to the obvious problem
with: "So your better off using a single index and with a user id and use
a query filter with the user id when fetching data.", i.e., when you have
hundreds of thousands of user IDs tagged on each article. That just doesn't
sound like it scales very well.


Cheers,
Mike


On Fri, Oct 22, 2010 at 10:43 PM, Lance Norskog goks...@gmail.com wrote:

 http://wiki.apache.org/solr/CoreAdmin

 Since Solr 1.3

 On Fri, Oct 22, 2010 at 1:40 PM, mike anderson saidthero...@gmail.com
 wrote:
  Thanks for the advice, everyone. I'll take a look at the API mentioned
 and
  do some benchmarking over the weekend.
 
  -Mike
 
 
  On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller markrmil...@gmail.com
 wrote:
 
  On 10/22/10 1:44 AM, Tharindu Mathew wrote:
   Hi Mike,
  
   I've also considered using a separate cores in a multi tenant
   application, ie a separate core for each tenant/domain. But the cores
   do not suit that purpose.
  
   If you check out documentation no real API support exists for this so
   it can be done dynamically through SolrJ. And all use cases I found,
   only had users configuring it statically and then using it. That was
   maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.
 
  You can dynamically manage cores with solrj. See
  org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
  for a place to start.
 
  You probably want to turn solr.xml's persist option on so that your
  cores survive restarts.
 
  
   So your better off using a single index and with a user id and use a
   query filter with the user id when fetching data.
 
  Many times this is probably the case - pro's and con's to each depending
  on what you are up to.
 
  - Mark
  lucidimagination.com
 
  
   On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind rochk...@jhu.edu
  wrote:
   No, it does not seem reasonable.  Why do you think you need a
 seperate
  core
   for every user?
   mike anderson wrote:
  
   I'm exploring the possibility of using cores as a solution to
 bookmark
   folders in my solr application. This would mean I'll need tens of
   thousands
   of cores... does this seem reasonable? I have plenty of CPUs
 available
  for
   scaling, but I wonder about the memory overhead of adding cores
 (aside
   from
   needing to fit the new index in memory).
  
   Thoughts?
  
   -mike
  
  
  
  
  
  
 
 
 



 --
 Lance Norskog
 goks...@gmail.com

Re: How do I this in Solr?

2010-10-26 Thread Ken Stanley

On Tue, Oct 26, 2010 at 9:15 AM, Savvas-Andreas Moysidis 
savvas.andreas.moysi...@googlemail.com wrote:

 If I get your question right, you probably want to use the AND binary
 operator as in samsung AND andriod AND GPS or +samsung +andriod +GPS


N.b. For these queries you can also pass the q.op parameter in the request
to temporarily change the default operator to AND; this has the same effect
without having to build the query; i.e., you can just pass
http://host:port/solr/select?q=samsung+android+gps&q.op=and
as the query string (along with any other params you need).
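A minimal Python sketch of building such a request URL, assuming a local Solr base URL in place of host:port; the helper name select_url is made up for illustration:

```python
from urllib.parse import urlencode


def select_url(base_url, words):
    """Build a /select URL whose default operator is switched to AND via q.op."""
    query = urlencode({"q": " ".join(words), "q.op": "AND"})
    return f"{base_url}/select?{query}"


# Assumed base URL; substitute your own host, port and webapp path.
print(select_url("http://localhost:8983/solr", ["samsung", "android", "gps"]))
```

urlencode encodes the spaces between words as "+", which matches the form of the URL shown above.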

Re: how well does multicore scale?

2010-10-26 Thread Jonathan Rochkind


mike anderson wrote:

I'm really curious if there is a clever solution to the obvious problem
with: So your better off using a single index and with a user id and use
a query filter with the user id when fetching data., i.e.. when you have
hundreds of thousands of user IDs tagged on each article. That just doesn't
sound like it scales very well..
  
Actually, I think that design would scale pretty fine, I don't think 
there's an 'obvious' problem. You store your userIDs in a multi-valued 
field (or as multiple terms in a single value, ends up being similar). 
You fq on there with the current userID.   There's one way to find out 
of course, but that doesn't seem a patently ridiculous scenario or 
anything, that's the kind of thing Solr is generally good at, it's what 
it's built for.   The problem might actually be in the time it takes to 
add such a document to the index; but not in query time.


Doesn't mean it's the best solution for your problem though, I can't say.

My impression is that Solr in general isn't really designed to support 
the kind of multi-tenancy use case people are talking about lately.  So 
trying to make it work anyway... if multi-cores work for you, then 
great, but be aware they weren't really designed for that (having 
thousands of cores) and may not. If a single index can work for you 
instead, great, but as you've discovered it's not necessarily obvious 
how to set up the schema to do what you need -- really this applies to 
Solr in general, unlike an rdbms where you just third-form-normalize 
everything and figure it'll work for almost any use case that comes up,  
in Solr you generally need to custom fit the schema for your particular 
use cases, sometimes being kind of clever to figure out the optimal way 
to do that.


This is, I'd argue/agree, indeed kind of a disadvantage, setting up a 
Solr index takes more intellectual work than setting up an rdbms. The 
trade off is you get speed, and flexible ways to set up relevancy (that 
still perform well). Took a couple decades for rdbms to get as brainless 
to use as they are, maybe in a couple more we'll have figured out ways 
to make indexing engines like solr equally brainless, but not yet -- but 
it's still pretty damn easy for what it is, the lucene/Solr folks have 
done a remarkable job.

Re: Query only a specfic field with a specific value using Dismax Handler

Thanks Jonathan. FQ seems promising. I will give it a go.

Swapnonil Mukherjee




On 26-Oct-2010, at 7:29 PM, Jonathan Rochkind wrote:

 So, first of all, exact match is hard in Solr on tokenized fields.  
 Tokenized fields don't really do that.  So for exact match, you should 
 probably use a non-tokenized field (string or text with keywordtokenizer 
 (which should really be called the non-tokenizer)). If there's only one 
 token in your value anyway though, like a single number, it may not 
 matter and work fine.
 
 Secondly, I'd recommend combining a dismax query for the user-entered 
 phrase (like 'dog') with standard lucene queries for those other 
 things.  There are (at least) two ways to do that. The first is just put 
 everything after the first AND in one or more 'fq' parameters instead of 
 trying to include them in 'q'.  The second is to use Solr's nested query 
 syntax, to specify sub-queries with different query parsers. Someone can 
 explain the second if you need it, but the easier to understand 'fq' 
 approach seems right to me for your case.
 
 Swapnonil Mukherjee wrote:
 Hi Everybody,
 
 Let me give you a brief idea of our Solr document. We have about 6 text type 
 fields, each containing IPTC data extracted from photos. Search is performed 
 mostly on these 6 fields.
 We also have a multivalued field named group_id that contains a list of all 
 the group_ids that have access to this photo.  In other words we are 
 storing the metadata of the photo as well as the permissions applicable to 
 this photo in the Solr document itself. This group_id field, by the way, is of 
 long type.
 
 Additionally we have certain boolean and constant type fields named 
 visibleToEndUser (boolean) and entityType (a Java enum with values 0 to 5).
 
 The first field defaultSearch is a copyField which contains a copy of all 
 the values of 6 text type fields that I have mentioned.
 
 The way we query presently using the default search handler is like this.
 
 defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR 
 group_id:2216624 OR group_id:2216990) AND (entityType:0) AND 
 (visibleToEndUser:true)
 
 We want to start using the dismax (if not dismax then edismax)  query 
 handler but so far I have not been able to replicate the query mentioned 
 above to the equivalent dismax form.
 
 What I cannot figure out is:
 
 1. How do I apply exact match on the group_id, visibleToEndUser and 
 entityType fields? In other words, how do I query a specific field with a 
 specific value rather than searching across all fields with all values?
 2. How do I apply OR and AND conditions?
 
 
 Swapnonil Mukherjee
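
Jonathan's 'fq' suggestion quoted above can be sketched roughly as follows. This is a minimal illustration, not a tested setup: the helper name build_dismax_params and the /select path are assumptions, and qf=defaultSearch is only one plausible choice of dismax search field. The parameters defType, q, qf and fq are standard Solr request parameters; each fq is parsed by the standard lucene query parser, so field:value syntax works there.

```python
from urllib.parse import urlencode


def build_dismax_params(user_text, group_ids, entity_type, visible):
    """Send the user's text to dismax and every fixed restriction to fq."""
    group_clause = " OR ".join(str(group_id) for group_id in group_ids)
    return [
        ("defType", "dismax"),        # parse q with the dismax parser
        ("q", user_text),             # only the user-entered phrase
        ("qf", "defaultSearch"),      # assumed search field for dismax
        ("fq", f"group_id:({group_clause})"),
        ("fq", f"entityType:{entity_type}"),
        ("fq", f"visibleToEndUser:{'true' if visible else 'false'}"),
    ]


params = build_dismax_params(
    "Dog", [2181347, 2181364, 2216624, 2216990], 0, True
)
query_string = urlencode(params)
print("/select?" + query_string)
```

Because each fq is a separate filter, the three restrictions are effectively ANDed together, while the OR between group ids stays inside a single fq.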

Re: Documents are deleted when Solr is restarted

2010-10-26 Thread Upayavira

You need to watch what you are setting your solr.home to. That is where
your indexes are being written. Are they getting overwritten or lost
somehow? Watch the files in that dir while doing a restart.

That's a start at least.

Upayavira

On Tue, 26 Oct 2010 16:40 +0300, Mackram Raydan mack...@gmail.com
wrote:
 Hey everyone,
 
 I apologize if this question is rudimentary but it is getting to me and 
 I did not find anything reasonable about it online.
 
 So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the 
 SolrTomcat wiki page to setup. The system works exactly the way I want 
 it (proper search, highlighting, etc...). The problem however is when I 
 restart my Tomcat server all the data in Solr (ie the index) is simply 
 lost. The admin shows me the number of docs is 0 when it was before in 
 the thousands.
 
 Can someone please help me understand why the above is happening and how 
 can I workaround it if possible?
 
 Big thanks for any help you can send my way.
 
 Regards,
 
 Mackram

Re: Documents are deleted when Solr is restarted

2010-10-26 Thread Israel Ekpo

The Solr home is set by the -Dsolr.solr.home Java system property.

Also make sure that -Dsolr.data.dir is defined for your data directory, if it
is not already defined in the solrconfig.xml file.
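
One possible way an index can seem to vanish, offered only as an assumption since the thread does not show the actual configuration, is a relative data directory that gets resolved against whatever working directory the servlet container was started from. The sketch below just illustrates that path arithmetic; the directories and the helper name resolve_data_dir are hypothetical.

```python
import posixpath


def resolve_data_dir(data_dir, working_dir):
    """Return the absolute location a (possibly relative) data dir points to."""
    # posixpath.join discards working_dir when data_dir is already absolute.
    return posixpath.normpath(posixpath.join(working_dir, data_dir))


# Hypothetical working directories for two different Tomcat starts.
first_start = resolve_data_dir("./solr/data", "/opt/tomcat/bin")
second_start = resolve_data_dir("./solr/data", "/home/mackram")
pinned = resolve_data_dir("/var/solr/data", "/home/mackram")

print(first_start)
print(second_start)
print(pinned)
```

With an absolute value for the data directory the location no longer depends on where the container was started.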

-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/

Re: Highlighting for non-stored fields

2010-10-26 Thread Phong Dais

Hi,

I understand that I need to store the fields in order to use highlighting
out of the box.
I'm looking for a way to highlight using term offsets instead of the
actual text, since the text is not stored.  What I'm asking is whether it is
possible to modify the response (thru a custom implementation) to contain
highlighted offsets instead of the actual matched text.  Should I be writing
my own DefaultHighlighter?  Or overriding some of its functionality?  Can this
be done this way or am I way off?

BTW, I'm using solr-1.4.

Thanks,
P.

On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo israele...@gmail.com wrote:

 Check out this link

 http://wiki.apache.org/solr/FieldOptionsByUseCase

 You need to store the field if you want to use the highlighting feature.

 If you need to retrieve and display the highlighted snippets then the
 field definitely needs to be stored.

 To use term offsets, it will be a good idea to enable the following
 attributes for that field: termVectors, termPositions, termOffsets

 The only issue here is that your storage costs will increase because of
 these extra features.

 Nevertheless, you definitely need to store the field if you need to
 retrieve it for highlighting purposes.

 On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais phong.gd...@gmail.com wrote:

  Hi,
 
  I've been looking thru the mailing archive for the past week and I haven't
  found any useful info regarding this issue.
 
  My requirement is to index a few terabytes worth of data to be searched.
  Due to the size of the data, I would like to index without storing but I
  would like to use the highlighting feature.  Is this even possible?  What
  are my options?
 
  I've read about termOffsets, payload that could possibly be used to do
  this but I have no idea how this could be done.
 
  Any pointers greatly appreciated.  Someone please point me in the right
  direction.
 
  I don't mind having to write some code or digging thru existing code to
  accomplish this task.
 
  Thanks,
  P.
 




Inconsistent slave performance after optimize

2010-10-26 Thread Mason Hale

Hello esteemed Solr community --

I'm observing some inconsistent performance on our slave servers after
recently optimizing our master server.

Our configuration is as follows:

- all servers are hosted at Amazon EC2, running Ubuntu 8.04
- 1 master with heavy insert/update traffic, about 125K new documents
per day (m1.large, ~8GB RAM)
   - autocommit every 1 minute
- 3 slaves (m2.xlarge instance sizes, ~16GB RAM)
   - replicate every 5 minutes
   - we have configured autowarming queries for these machines
   - autowarmCount = 0
- Total index size is ~7M documents

We were seeing increasing, but gradual performance degradation across all
nodes.
So we decided to try optimizing our index to improve performance.

In preparation for the optimize we disabled replication polling on all
slaves. We also turned off all
workers that were writing to the index. Then we ran optimize on the master.

The optimize took 45-60 minutes to complete, and the total size went from
68GB down to 23GB.

We then enabled replication on each slave one at a time.

The first slave we re-enabled took about 15 minutes to copy the new files.
Once the files were copied
the performance of the slave plummeted. Average response time went from 0.75 sec
to 45 seconds.
Over the past 18 hours the average response time has gradually gone down to
around 1.2 seconds now.

Before re-enabling replication on the second slave, we first removed it from
our load-balanced pool of available search servers.
This server's average query performance also degraded quickly, and then
(unlike the first slave we replicated) did not improve.
It stayed at around 30 secs per query. On the theory that this is a
cache-warming issue, we added this server
back to the pool in hopes that additional traffic would warm the cache. But
what we saw was a quick spike of much worse
performance (50 sec / query on average) followed by a slow/gradual decline
in average response times.
As of now (10 hours after the initial replication) this server is still
reporting an average response time of ~2 seconds.
This is much worse than before the optimize and is a counter-intuitive
result. We expected an index 1/3 the size would be faster, not slower.

On the theory that the index files needed to be loaded into the file system
cache, I used the 'dd' command to copy
the contents of the data/index directory to /dev/null, but that did not
result in any noticeable performance improvement.

At this point, things were not going as expected. We did not expect the
replication after an optimize to result in such horrid
performance. So we decided to let the last slave continue to serve stale
results while we waited 4 hours for the
other two slaves to approach some acceptable performance level.

After the 4 hour break, we removed the 3rd and last slave server from our
load-balancing pool, then re-enabled replication.
This time we saw a tiny blip. The average performance went up to 1 second
briefly then went back to the (normal for us)
0.25 to 0.5 second range. We then added this server back to the
load-balancing pool and observed no degradation in performance.

While we were happy to avoid a repeat of the poor performance we saw on the
previous slaves, we are at a loss to explain
why this slave did not also have such poor performance.

At this point we're scratching our heads trying to understand:
   (a) Why the performance of the first two slaves was so terrible after the
       optimize. We think it's cache-warming related, but we're not sure.
       10 hours seems like a long time to wait for the cache to warm up.
   (b) Why the performance of the third slave was barely impacted. It should
       have hit the same cold-cache issues as the other servers, if that is
       indeed the root cause.
   (c) Why performance of the first 2 slaves is still much worse after the
       optimize than it was before the optimize, whereas the performance of
       the 3rd slave is pretty much unchanged. We expected the optimize to
       *improve* performance.

All 3 slave servers are identically configured, and the procedure for
re-enabling replication was identical for the 2nd and 3rd
slaves, with the exception of a 4-hour wait period.

We have confirmed that the 3rd slave did replicate, the number of documents
and total index size matches the master and other slave servers.

I'm writing to fish for an explanation or ideas that might explain this
inconsistent performance. Obviously, we'd like to be able to reproduce the
performance of the 3rd slave, and avoid the poor performance of the first
two slaves the next time we decide it's time to optimize our index.

thanks in advance,

Mason
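
For readers who want to repeat the 'dd to /dev/null' step described above without shelling out, here is a rough Python equivalent that simply reads every file under a directory once. It is only a sketch: the function name read_through and the example path are hypothetical, and as reported above this kind of read-through did not produce a noticeable improvement in this case.

```python
import os


def read_through(directory, chunk_size=1 << 20):
    """Read every file under directory once; return the number of bytes read."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            with open(os.path.join(root, name), "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total


if __name__ == "__main__":
    # Hypothetical index location; adjust to the slave's real data/index dir.
    print(read_through("/var/solr/data/index"))
```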

After java replication: field not found exception on slaves

2010-10-26 Thread Peter Karich


Hi,

we had the following problem. We added a field to schema.xml and fed our 
master with the new data.
After that, querying on the master was fine. But when we replicated 
(Solr 1.4.0) to our slaves, 
all slaves said they could not find the new field (the standard exception for 
missing fields),
even though I can see the new field in the xml response and I can 
see it in the replicated schema.xml file!?


Even stranger, after scp-ing the exact data folder to our master, 
all is fine (on the master).


Has anybody hit the same strange behaviour?

Regards,
Peter.


PS: In the end we did the following on the slaves:
rm -rf data/
./reload.sh
and then replicated again.

Re: Strange search

2010-10-26 Thread ramzesua


Can anyone tell me why my search is so terrible? It works really strangely.
Here are my basic configs in schema.xml.
Main filters:
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


and fields:

<field name="productId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="categoryId" type="int" indexed="true" stored="true" multiValued="true"/>
<field name="templateId" type="int" indexed="true" stored="true" required="true"/>

<field name="templateSetName" type="text" indexed="true" stored="false"/>
<field name="templateSetCaption" type="text" indexed="true" stored="false"/>
<field name="templateSetDeleted" type="int" indexed="true" stored="false" default="0"/>
<field name="templateSetDateCreate" type="string" indexed="true" stored="false"/>
<field name="templateSetPopularity" type="float" indexed="true" stored="false" default="0"/>
<field name="templateSetText" type="text" indexed="true" stored="false" multiValued="true"/>

<field name="typeName" type="string" indexed="true" stored="false" multiValued="true"/>
<field name="typeCaption" type="text" indexed="true" stored="false" multiValued="true"/>

<field name="themeName" type="string" indexed="true" stored="false"/>
<field name="themeCaption" type="text" indexed="true" stored="false"/>
<field name="themeText" type="text" indexed="true" stored="false"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<uniqueKey>templateId</uniqueKey>

<defaultSearchField>text</defaultSearchField>

<solrQueryParser defaultOperator="OR"/>

<copyField source="templateSetName" dest="text"/>
<copyField source="templateSetCaption" dest="text"/>
<copyField source="typeName" dest="text"/>
<copyField source="typeCaption" dest="text"/>
<copyField source="themeName" dest="text"/>
<copyField source="themeCaption" dest="text"/>
<copyField source="themeText" dest="text"/>

Here is what the schema browser shows for field typeCaption
(http://localhost:8983/search/admin/schema.jsp):
html      4
page      4
template  4
text      4
main      4
seo       3
meta      2
tags      1
keywords  1

If I search for html, I get all results, but if I search for seo or text I
don't get any results. I tried using wildcards, but that didn't help. Can
anyone tell me where my problem is? Sorry for my poor English.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1773307.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How do I this in Solr?

Overkill?

Dennis Gearon
 
 I can't think of a way to do it without writing new
 analysis filters.
 
 But I think you could do what you want with two filters
 (this is untested):
 
 1. An index-time filter that outputs a single token
 consisting of all of the input tokens, sorted in a
 consistent way, e.g.:
 
    "mobile with GPS" -> "GPS mobile with"
    "samsung android" -> "android samsung"
 
 2. A query-time filter that outputs one token per input
 term combination, sorted in the same consistent way as the
 index-time filter, e.g.:
 
    "samsung android GPS"
      -> "samsung", "android", "GPS",
         "android samsung", "GPS samsung", "android GPS",
         "android GPS samsung"
 
 Steve
 
  -----Original Message-----
  From: Varun Gupta [mailto:varun.vgu...@gmail.com]
  Sent: Tuesday, October 26, 2010 9:08 AM
  To: solr-user@lucene.apache.org
  Subject: How do I this in Solr?
  
  Hi,
  
  I have a lot of small documents (each containing 1 to 15 words) indexed in
  Solr. For the search query, I want the search results to contain only those
  documents that satisfy this criterion: "All of the words of the search
  result document are present in the search query"
  
  For example:
  If I have the following documents indexed: "nokia n95", "GPS", "android",
  "samsung", "samsung android", "nokia android", "mobile with GPS"
  
  If I search with the text "samsung android GPS", search results should
  only contain "samsung", "GPS", "android" and "samsung android".
  
  Is there a way to do this in Solr?
  
  --
  Thanks
  Varun Gupta
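
To make Steve's two-filter idea quoted above concrete, here is a small sketch of the same logic in plain Python rather than as Solr analysis filters. It is only an illustration under stated assumptions: the function names index_key and query_keys are made up, terms are split on whitespace and lowercased, and a real implementation would live in custom TokenFilters as Steve describes.

```python
from itertools import combinations


def index_key(text):
    """Index side: one token made of all distinct terms, lowercased and sorted."""
    return " ".join(sorted(set(text.lower().split())))


def query_keys(text):
    """Query side: one sorted token per non-empty combination of query terms."""
    terms = sorted(set(text.lower().split()))
    return {
        " ".join(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    }


documents = ["nokia n95", "GPS", "android", "samsung",
             "samsung android", "nokia android", "mobile with GPS"]
wanted = query_keys("samsung android GPS")
hits = [doc for doc in documents if index_key(doc) in wanted]
print(hits)
```

Note that a query with n distinct terms produces 2^n - 1 combination tokens, so this approach is only practical for short queries.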

Re: Modelling Access Control

Son, don't touch that stove . . . .,

OUCH! Hey Dad, I BURNED my hand on that stove, why didn't you tell me 
that?!?#! You know I need to know WHY, not just DON'T!

Dennis Gearon

 Very important: do not make a spelling or autosuggest index from a
 text field which some people can see and other people can't.

Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall


Um.. you could change your default operator to AND rather than OR.

That should do the trick.

Matt


Re: Highlighting for non-stored fields

2010-10-26 Thread Pradeep Singh

Another way you can do this is - after the search has completed, load the
field in your application, write separate code to reanalyze that
field/document, index it in RAM, and run it through highlighter classes. All
this as part of your web application outside of Solr. Considering the size
of your data it doesn't look advisable to store it because then you would be
almost doubling the size of your index (if you are looking to highlight on a
field then it's probably going to be full of content).

-Pradeep
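
A very rough sketch of the application-side approach, under stated assumptions: the text has already been loaded from wherever the application keeps it, and plain regular-expression matching stands in for re-analyzing the field with Lucene analyzers and highlighter classes. The function names term_offsets and highlight are made up for illustration. It returns character offsets first, which is what Phong asked about, and only then wraps the matches.

```python
import re


def term_offsets(text, terms):
    """Return (start, end) character offsets of whole-word, case-insensitive matches."""
    if not terms:
        return []
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [match.span() for match in pattern.finditer(text)]


def highlight(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) span in pre/post markers; offsets must be sorted."""
    pieces = []
    cursor = 0
    for start, end in offsets:
        pieces.append(text[cursor:start])
        pieces.append(pre + text[start:end] + post)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Solr stores nothing here; solr only indexes."
offsets = term_offsets(text, ["solr"])
print(offsets)
print(highlight(text, offsets))
```

The offsets alone are enough if the client holds the original text and can apply the markup itself.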


RE: How do I this in Solr?

Um, maybe I'm way off base, but when Varun said:

 If I search with the text "samsung android GPS",
 search results should only contain "samsung", "GPS",
 "android" and "samsung android".

I interpreted that to mean that hit documents should contain terms from the 
query, and nothing else.  Making all terms required doesn't do this.

Steve
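
The difference between the two readings can be shown with plain set logic. This is just an illustration in Python, not Solr behaviour: the function names are made up, and terms are split on whitespace and lowercased.

```python
def all_query_terms_required(doc, query):
    """AND semantics: every query term must appear in the document."""
    return set(query.lower().split()) <= set(doc.lower().split())


def doc_terms_within_query(doc, query):
    """Varun's requirement as read above: every document term appears in the query."""
    return set(doc.lower().split()) <= set(query.lower().split())


documents = ["nokia n95", "GPS", "android", "samsung",
             "samsung android", "nokia android", "mobile with GPS"]
query = "samsung android GPS"

and_hits = [doc for doc in documents if all_query_terms_required(doc, query)]
subset_hits = [doc for doc in documents if doc_terms_within_query(doc, query)]
print(and_hits)
print(subset_hits)
```

With AND semantics none of the sample documents match, because none contains all three query terms; the subset reading returns the four documents Varun listed.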


Re: ClassCastException Issue

2010-10-26 Thread Chris Hostetter


: [ERROR][http-4443-exec-3][util.plugin.AbstractPluginLoader] log():139
: java.lang.ClassCastException: org.apache.solr.schema.StrField cannot
: be cast to org.apache.solr.schema.FieldType

This almost certainly indicates a classloader issue. I suspect you have 
multiple Solr-related jars in various places, and the FieldType class 
instance found when StrField is loaded comes from a different 
(incompatible) jar.


-Hoss
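
One way to look for the situation Hoss describes is to list every directory that holds a jar with "solr" in its name. The sketch below is only a helper for that search, under stated assumptions: the function name solr_jar_locations and the example root directories are hypothetical, and matching on the file name is a heuristic, not a classloader inspection.

```python
import os


def solr_jar_locations(roots):
    """Map each directory under roots to the sorted Solr-looking jars it contains."""
    found = {}
    for root in roots:
        for directory, _dirs, files in os.walk(root):
            jars = sorted(
                name for name in files
                if name.endswith(".jar") and "solr" in name.lower()
            )
            if jars:
                found[directory] = jars
    return found


if __name__ == "__main__":
    # Hypothetical places to check: container lib dir, webapps, Solr home.
    locations = solr_jar_locations(["/opt/tomcat/lib", "/opt/tomcat/webapps", "/opt/solr"])
    for directory, jars in sorted(locations.items()):
        print(directory, jars)
```

More than one directory in the output is worth a closer look, in particular if the same classes could be loaded both by the container and by the webapp.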

RE: How do I this in Solr?

Good point. Since I might need such a query myself someday, how *IS* that done?


Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.



RE: How do I this in Solr?

Dennis,

Do you mean to say that you read my earlier post, and disagree that it would 
solve the problem?  Or have you simply not read it?

Steve


RE: How do I this in Solr?

If Solr is like Google, then once the documents matching all of the ANDed 
terms in the query ran out, documents containing only two of the terms, then 
only one of the terms, and then terms close to them would start showing up.

Is this correct?

If so, it wouldn't match his requirements.

Dennis Gearon



--- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu wrote:

 From: Steven A Rowe sar...@syr.edu
 Subject: RE: How do I this in Solr?
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Tuesday, October 26, 2010, 12:10 PM

 Dennis,

 Do you mean to say that you read my earlier post, and disagree that it
 would solve the problem?  Or have you simply not read it?

 Steve

  -Original Message-
  From: Dennis Gearon [mailto:gear...@sbcglobal.net]
  Sent: Tuesday, October 26, 2010 3:00 PM
  To: solr-user@lucene.apache.org
  Subject: RE: How do I this in Solr?

  Good point. Since I might need such a query myself someday, how *IS*
  that done?

  Dennis Gearon

  --- On Tue, 10/26/10, Steven A Rowe sar...@syr.edu wrote:

   From: Steven A Rowe sar...@syr.edu
   Subject: RE: How do I this in Solr?
   To: solr-user@lucene.apache.org solr-user@lucene.apache.org
   Date: Tuesday, October 26, 2010, 11:46 AM

   Um, maybe I'm way off base, but when Varun said:

    If I search with the text "samsung android GPS", search results
    should only contain "samsung", "GPS", "android" and "samsung android".

   I interpreted that to mean that hit documents should contain terms
   from the query, and nothing else.  Making all terms required doesn't
   do this.

   Steve

    -Original Message-
    From: Matthew Hall [mailto:mh...@informatics.jax.org]
    Sent: Tuesday, October 26, 2010 2:30 PM
    To: solr-user@lucene.apache.org
    Subject: Re: How do I this in Solr?

    Um.. you could change your default clause to AND rather than or.

    That should do the trick.

    Matt

    On 10/26/2010 2:26 PM, Dennis Gearon wrote:
     Overkill?

     Dennis Gearon

RE: How do I this in Solr?

Plus, if he wants terms that contain ONLY those words, and no others, an ANDed 
query would not do that, right? ANDed queries return results that must have ALL 
the terms listed, and could have lots of other words, right?


Dennis Gearon


How does DIH multithreading work?

2010-10-26 Thread markwaddle


I understand that the thread count is specified on root entities only. Does
it spawn multiple threads per root entity? Or multiple threads per
descendant entity? Can someone give an example of how you would make a
database query in an entity with 4 threads that would select 1 row per
thread?

Thanks,
Mark

RE: How do I this in Solr?

Hi Dennis,

You wrote:
 If Solr is like Google, once documents matching only the ANDed items
 in the query ran out, then those that had only two of the terms, then
 only 1 of the terms, and then those close to it would start showing up.
[...]
 Plus, if he wants terms that contain ONLY those words, and no others, an
 ANDed query would not do that, right? ANDed queries return results that
 must have ALL the terms listed, and could have lots of other words, right?

This is *exactly* what I just said: ANDed queries (i.e., requiring all query 
terms) will not satisfy Varun's requirements.

Your participation in this thread looks an awful lot like flame-baiting: Someone 
else asks a question, I answer with a possible solution, you give a one-word 
"overkill" response, I say why it's not overkill.  You then ask if anybody 
knows the answer to the original question, and then parrot my response to your 
"overkill" statement.  Really

Get your shit together or shut up.  Please.

Steve


RE: How do I this in Solr?

I'm the LAST person anyone will ever need to worry about flame baiting. You did 
notice that I retracted what I said and supported your point of view?

Sorry if my cryptic comment sounded critical. I was wrong, you were right :-)
Dennis Gearon


RE: How do I this in Solr?

Dennis,

I wasn't trying to force your admission of my rectitude - I was just getting 
frustrated that the conversation was moving in spiral fashion, and was worried 
that you might have intentionally engineered that.

I'm glad to hear that you weren't flame baiting.

Steve



Re: Highlighting for non-stored fields

2010-10-26 Thread Phong Dais

Thanks for the insight.
This is definitely a feasible solution because I only need to highlight when
the user opens the document.
I guess the easiest way I can do this is to reuse the Solr code (with some
modification) in my own application.
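
The application-side approach discussed in this thread (load the text after the search, re-analyze it, and highlight by offsets) might look roughly like the sketch below. It is plain Python and does not use Lucene's highlighter classes; the tokenizer regex, the function names `match_offsets` and `highlight`, and the `<em>` markers are assumptions chosen for illustration.

```python
import re

# Very simple stand-in for an analyzer: runs of word characters are tokens.
TOKEN = re.compile(r"\w+")


def match_offsets(text, query_terms):
    """Return (start, end) character offsets of tokens matching a query term."""
    wanted = {t.lower() for t in query_terms}
    return [m.span() for m in TOKEN.finditer(text) if m.group().lower() in wanted]


def highlight(text, offsets, pre="<em>", post="</em>"):
    """Wrap each span in markers, right to left so earlier offsets stay valid."""
    for start, end in sorted(offsets, reverse=True):
        text = text[:start] + pre + text[start:end] + post + text[end:]
    return text


text = "Solr can highlight matches; solr needs the text to do so."
spans = match_offsets(text, ["solr", "text"])
print(spans)
print(highlight(text, spans))
```

Returning only `spans` corresponds to the "offsets instead of matched text" idea; the client can then apply them to its own copy of the document.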

On Tue, Oct 26, 2010 at 2:35 PM, Pradeep Singh pksing...@gmail.com wrote:

 Another way you can do this is - after the search has completed, load the
 field in your application, write separate code to reanalyze that
 field/document, index it in RAM, and run it through highlighter classes.
 All this as part of your web application outside of Solr. Considering the
 size of your data it doesn't look advisable to store it because then you
 would be almost doubling the size of your index (if you are looking to
 highlight on a field then it's probably going to be full of content).

 -Pradeep

 On Tue, Oct 26, 2010 at 8:32 AM, Phong Dais phong.gd...@gmail.com wrote:

  Hi,

  I understand that I need to store the fields in order to use highlighting
  out of the box.
  I'm looking for a way to highlight using term offsets instead of the
  actual text, since the text is not stored.  What I am asking is: is it
  possible to modify the response (thru a custom implementation) to contain
  highlighted offsets instead of the actual matched text?  Should I be
  writing my own DefaultHighlighter?  Or overriding some of its
  functionality?  Can this be done this way or am I way off?

  BTW, I'm using solr-1.4.

  Thanks,
  P.

  On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo israele...@gmail.com wrote:

   Check out this link

   http://wiki.apache.org/solr/FieldOptionsByUseCase

   You need to store the field if you want to use the highlighting feature.

   If you need to retrieve and display the highlighted snippets then the
   field definitely needs to be stored.

   To use term offsets, it will be a good idea to enable the following
   attributes for that field: termVectors, termPositions, termOffsets

   The only issue here is that your storage costs will increase because of
   these extra features.

   Nevertheless, you definitely need to store the field if you need to
   retrieve it for highlighting purposes.

   On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais phong.gd...@gmail.com wrote:

    Hi,

    I've been looking thru the mailing archive for the past week and I
    haven't found any useful info regarding this issue.

    My requirement is to index a few terabytes worth of data to be
    searched.  Due to the size of the data, I would like to index without
    storing, but I would like to use the highlighting feature.  Is this
    even possible?  What are my options?

    I've read about termOffsets and payloads that could possibly be used to
    do this, but I have no idea how this could be done.

    Any pointers greatly appreciated.  Someone please point me in the right
    direction.

    I don't mind having to write some code or digging thru existing code to
    accomplish this task.

    Thanks,
    P.

   --
   °O°
   Good Enough is not good enough.
   To give anything less than your best is to sacrifice the gift.
   Quality First. Measure Twice. Cut Once.
   http://www.israelekpo.com/

Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall

Indeed, I'd missed the second part of his requirements; my "AND" solution 
is sadly insufficient for this task.


The combinatorial part of your solution worries me a bit though, Steven, 
because the documents on the larger side of his corpus would 
likely slow down query performance a bit while the filter calculates all 
of the possibilities for a given document.


I'm wondering if a slightly hybrid approach would be valid:

Have a filter that calculates the total number of terms for a given 
document.  And then add a clause into your query at runtime that would 
match what the filter would come up with:


So:

text:Nokia AND text:Mobile AND text:GPS AND termCount: 3

Something like that anyhow.

Matt


Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall

Bah.. nope this would miss documents that only match a subset of the 
given terms.


I'm going to have to go with Steven's approach as the right choice here.

Matt
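
A small simulation may make the gap concrete. It is plain Python rather than Solr queries; the document list comes from the original question, while the helper names `and_plus_term_count` and `wanted` are hypothetical. The first function mimics "all query terms required AND termCount equals the number of query terms"; the second states the actual requirement, that every word of the document appears in the query.

```python
docs = ["nokia n95", "GPS", "android", "samsung",
        "samsung android", "nokia android", "mobile with GPS"]


def terms(text):
    """Lowercased set of whitespace-separated terms (an assumption of this sketch)."""
    return set(text.lower().split())


def and_plus_term_count(query):
    """Hybrid idea: every query term is required and termCount == len(query terms)."""
    q = terms(query)
    return [d for d in docs if q <= terms(d) and len(terms(d)) == len(q)]


def wanted(query):
    """Actual requirement: all words of the document are present in the query."""
    q = terms(query)
    return [d for d in docs if terms(d) <= q]


print(and_plus_term_count("samsung android GPS"))
print(wanted("samsung android GPS"))
```

The hybrid query finds nothing for "samsung android GPS" because no document contains all three terms, whereas the requirement calls for every document made up of a subset of them.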

On 10/26/2010 3:44 PM, Matthew Hall wrote:
Indeed, I'd missed the second part of his requirements, my and 
solution is sadly insufficient to this task.


The combinatorial part of you solution worries me a bit though Steven, 
because his documents that are on the larger side of his corpus would 
likely slow down query performance a bit while the filter calculates 
all of the possibilities for a given document.


I'm wondering if a slightly hybrid approach would be valid:

Have a filter that calculates the total number of terms for a given 
document.  And then add a clause into your query at runtime that would 
match what the filter would come up with:


So:

text:Nokia AND text:Mobile AND text:GPS AND termCount: 3

Something like that anyhow.

Matt

On 10/26/2010 3:35 PM, Dennis Gearon wrote:
I'm the LAST person anyone will ever need to worry about flame 
baiting. You did notice that I retracted what I said and supported 
your point of view?


Sorry if my cryptic comment sounded critical. I was wrong, you were 
right :-)

Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is 
usually a better idea to learn from others’ mistakes, so you do not 
have to make them yourself. from 
'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


EARTH has a Right To Life,
   otherwise we all die.


--- On Tue, 10/26/10, Steven A Rowe <sar...@syr.edu> wrote:

From: Steven A Rowe <sar...@syr.edu>
Subject: RE: How do I this in Solr?
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Date: Tuesday, October 26, 2010, 12:27 PM
Hi Dennis,

You wrote:

If Solr is like Google, once documents matching only the ANDed items
in the query ran out, then those that had only two of the terms, then
only 1 of the terms, and then those close to it would start showing up.
[...]

Plus, if he wants terms that contain ONLY those words, and no others, an
ANDed query would not do that, right? ANDed queries return results that
must have ALL the terms listed, and could have lots of other words, right?

This is *exactly* what I just said: ANDed queries (i.e.,
requiring all query terms) will not satisfy Varun's
requirements.

Your participation in this thread looks an awful lot like
flame-baiting: someone else asks a question, I answer with a
possible solution, you give a one-word "overkill" response,
I say why it's not overkill.  You then ask if anybody
knows the answer to the original question, and then parrot
my response to your overkill statement.  Really

Get your shit together or shut up.  Please.

Steve


-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Tuesday, October 26, 2010 3:14 PM
To: solr-user@lucene.apache.org
Subject: RE: How do I this in Solr?



Dennis Gearon



--- On Tue, 10/26/10, Steven A Rowe <sar...@syr.edu> wrote:

From: Steven A Rowe <sar...@syr.edu>
Subject: RE: How do I this in Solr?
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Date: Tuesday, October 26, 2010, 12:10 PM
Dennis,

Do you mean to say that you read my earlier post, and
disagree that it would solve the problem?  Or have you
simply not read it?

Steve


-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Tuesday, October 26, 2010 3:00 PM
To: solr-user@lucene.apache.org
Subject: RE: How do I this in Solr?

Good point. Since I might need such a query myself someday, how *IS* that
done?


Dennis Gearon



--- On Tue, 10/26/10, Steven A Rowe <sar...@syr.edu> wrote:

From: Steven A Rowe <sar...@syr.edu>
Subject: RE: How do I this in Solr?
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Date: Tuesday, October 26, 2010, 11:46 AM

Um, maybe I'm way off base, but when Varun said:

If I search with the text samsung andriod GPS, search results should only
contain samsung, GPS, andriod and samsung andriod.

I interpreted that to mean that hit documents should contain terms from
the query, and nothing else.  Making all terms required doesn't do this.

Steve
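
To make that interpretation concrete, here is a minimal sketch in plain Python (not Solr code) of the containment check being described: a document qualifies only if every one of its terms also appears in the query. Whitespace tokenization and lowercasing are simplifying assumptions, and contains_only_query_terms is a hypothetical helper name.

```python
def contains_only_query_terms(document, query):
    """Return True if the document has terms and all of them occur in the query."""
    doc_terms = set(document.lower().split())
    query_terms = set(query.lower().split())
    # An empty document is rejected; otherwise test for a subset relation.
    return bool(doc_terms) and doc_terms <= query_terms


query = "samsung andriod GPS"  # spelled as in the quoted query
for doc in ["samsung", "GPS", "samsung andriod", "samsung andriod GPS mobile"]:
    print(doc, "->", contains_only_query_terms(doc, query))
```

A query that simply requires all terms would reject the first three documents and accept the last one, which is the opposite of what the requirement asks for.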


-Original Message-
From: Matthew Hall [mailto:mh...@informatics.jax.org]
Sent: Tuesday, October 26, 2010 2:30 PM

To:

Jars required in classpath to run embedded solr server?

2010-10-26 Thread Tharindu Mathew

Hi everyone,

Do we need all lucene jars in the class path for this? Seems that the
solr-solrj and solr-core jars are not enough
(http://wiki.apache.org/solr/Solrj). It is asking for lucene jars in
the classpath. Could I know what jars are required to run this?

Thanks in advance.

-- 
Regards,

Tharindu

Re: Strange search

2010-10-26 Thread ramzesua


I tried making some changes, but it didn't help.
In http://localhost:8983/search/admin/schema.jsp I see, for example, the term
main with a frequency of 7. But if I search for it, I don't
get any results. If I use a wildcard, I get only 4 docs in the response.
And if I search for the term html (frequency 5), I don't get any results
even with a wildcard. Where is the problem, and how can I solve it?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1774059.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: How do I this in Solr?