Re: Looking for Developers

2010-10-26 Thread Pradeep Singh
This is the second time he has sent this shit. Kill his subscription. Is it
possible?

On Tue, Oct 26, 2010 at 10:38 PM, Yuchen Wang  wrote:

> UNSUBSCRIBE
>
> On Tue, Oct 26, 2010 at 10:15 PM, Igor Chudov  wrote:
>
> > UNSUBSCRIBE
> >
> > On Wed, Oct 27, 2010 at 12:14 AM, ST ST  wrote:
> > > Looking for Developers Experienced in Solr/Lucene And/OR FAST Search
> > Engines
> > > from India (Pune)
> > >
> > > We are looking for off-shore India Based Developers who are proficient
> in
> > > Solr/Lucene and/or FAST search engine .
> > > Developers in the cities of Pune/Bombay in India are preferred.
> > Development
> > > is for projects based in US for a reputed firm.
> > >
> > > If you are proficient in Solr/Lucene/FAST and have 5 years minimum
> > industry
> > > experience with atleast 3 years in Search Development,
> > > please send me your resume.
> > >
> > > Thanks
> > >
> >
>


Re: how well does multicore scale?

2010-10-26 Thread Lance Norskog
Creating a unique id for a schema is one of those design tasks:

http://wiki.apache.org/solr/UniqueKey

A marvelously lucid and well-written page, if I do say so. And I do.
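
For the per-user case asked about below, here is a minimal sketch of a composite unique key. The "!" separator and the helper names make_key/split_key are assumptions for illustration only, not anything Solr requires; the point is that an unambiguous delimiter lets you rebuild the same key later for deleteById.

```python
# Hypothetical helpers: build and parse a per-user unique key.
# The "!" separator is an assumption; pick any character that
# cannot occur in your user ids.
SEP = "!"

def make_key(user_id, doc_path):
    """Prefix the document path with the owning user's id."""
    if SEP in user_id:
        raise ValueError("user id must not contain the separator")
    return f"{user_id}{SEP}{doc_path}"

def split_key(key):
    """Recover (user_id, doc_path); splits on the first separator only."""
    user_id, doc_path = key.split(SEP, 1)
    return user_id, doc_path

# Two users adding the same document get two distinct keys.
key_a = make_key("userid25", "/system/bar.pdf")
key_b = make_key("userid26", "/system/bar.pdf")
```

With keys built this way, one user's copy no longer replaces the other's, and the same make_key call gives the id to delete.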

On Tue, Oct 26, 2010 at 10:16 PM, Tharindu Mathew  wrote:
> Really great to know you were able to fire up about 100 cores. But,
> when it scales up to around 1000 or even more. I wonder how it would
> perform.
>
> I have a question regarding ids i.e. the unique key. Since there is a
> potential use case that two users might add the same document, how
> would we set the id. I was thinking of appending the user id to the an
> id I would use ex: "/system/bar.pdfuserid25". Otherwise, solr would
> replace the document of one user, which is not what we want.
>
> This is also applicable to deleteById. Is there a better way to do this?
>
> On Tue, Oct 26, 2010 at 7:45 PM, Jonathan Rochkind  wrote:
>> mike anderson wrote:
>>>
>>> I'm really curious if there is a clever solution to the obvious problem
>>> with: "So your better off using a single index and with a user id and use
>>> a query filter with the user id when fetching data.", i.e.. when you have
>>> hundreds of thousands of user IDs tagged on each article. That just
>>> doesn't
>>> sound like it scales very well..
>>>
>>
>> Actually, I think that design would scale pretty fine, I don't think there's
>> an 'obvious' problem. You store your userIDs in a multi-valued field (or as
>> multiple terms in a single value, ends up being similar). You fq on there
>> with the current userID.   There's one way to find out of course, but that
>> doesn't seem a patently ridiculous scenario or anything, that's the kind of
>> thing Solr is generally good at, it's what it's built for.   The problem
>> might actually be in the time it takes to add such a document to the index;
>> but not in query time.
>>
>> Doesn't mean it's the best solution for your problem though, I can't say.
>>
>> My impression is that Solr in general isn't really designed to support the
>> kind of multi-tenancy use case people are talking about lately.  So trying
>> to make it work anyway... if multi-cores work for you, then great, but be
>> aware they weren't really designed for that (having thousands of cores) and
>> may not. If a single index can work for you instead, great, but as you've
>> discovered it's not necessarily obvious how to set up the schema to do what
>> you need -- really this applies to Solr in general, unlike an rdbms where
>> you just third-form-normalize everything and figure it'll work for almost
>> any use case that comes up,  in Solr you generally need to custom fit the
>> schema for your particular use cases, sometimes being kind of clever to
>> figure out the optimal way to do that.
>>
>> This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr
>> index takes more intellectual work than setting up an rdbms. The trade off
>> is you get speed, and flexible ways to set up relevancy (that still perform
>> well). Took a couple decades for rdbms to get as brainless to use as they
>> are, maybe in a couple more we'll have figured out ways to make indexing
>> engines like solr equally brainless, but not yet -- but it's still pretty
>> damn easy for what it is, the lucene/Solr folks have done a remarkable job.
>>
>
>
>
> --
> Regards,
>
> Tharindu
>



-- 
Lance Norskog
goks...@gmail.com


Re: Looking for Developers

2010-10-26 Thread Yuchen Wang
UNSUBSCRIBE

On Tue, Oct 26, 2010 at 10:15 PM, Igor Chudov  wrote:

> UNSUBSCRIBE
>
> On Wed, Oct 27, 2010 at 12:14 AM, ST ST  wrote:
> > Looking for Developers Experienced in Solr/Lucene And/OR FAST Search
> Engines
> > from India (Pune)
> >
> > We are looking for off-shore India Based Developers who are proficient in
> > Solr/Lucene and/or FAST search engine .
> > Developers in the cities of Pune/Bombay in India are preferred.
> Development
> > is for projects based in US for a reputed firm.
> >
> > If you are proficient in Solr/Lucene/FAST and have 5 years minimum
> industry
> > experience with atleast 3 years in Search Development,
> > please send me your resume.
> >
> > Thanks
> >
>


Re: Looking for Developers

2010-10-26 Thread Igor Chudov
UNSUBSCRIBE

On Wed, Oct 27, 2010 at 12:14 AM, ST ST  wrote:
> Looking for Developers Experienced in Solr/Lucene And/OR FAST Search Engines
> from India (Pune)
>
> We are looking for off-shore India Based Developers who are proficient in
> Solr/Lucene and/or FAST search engine .
> Developers in the cities of Pune/Bombay in India are preferred. Development
> is for projects based in US for a reputed firm.
>
> If you are proficient in Solr/Lucene/FAST and have 5 years minimum industry
> experience with atleast 3 years in Search Development,
> please send me your resume.
>
> Thanks
>


Re: how well does multicore scale?

2010-10-26 Thread Tharindu Mathew
Really great to know you were able to fire up about 100 cores. But
when it scales up to around 1000 or even more, I wonder how it would
perform.

I have a question regarding ids, i.e. the unique key. Since there is a
potential use case where two users might add the same document, how
would we set the id? I was thinking of appending the user id to the
id I would use, e.g. "/system/bar.pdfuserid25". Otherwise, Solr would
replace the document of one user, which is not what we want.

This is also applicable to deleteById. Is there a better way to do this?

On Tue, Oct 26, 2010 at 7:45 PM, Jonathan Rochkind  wrote:
> mike anderson wrote:
>>
>> I'm really curious if there is a clever solution to the obvious problem
>> with: "So your better off using a single index and with a user id and use
>> a query filter with the user id when fetching data.", i.e.. when you have
>> hundreds of thousands of user IDs tagged on each article. That just
>> doesn't
>> sound like it scales very well..
>>
>
> Actually, I think that design would scale pretty fine, I don't think there's
> an 'obvious' problem. You store your userIDs in a multi-valued field (or as
> multiple terms in a single value, ends up being similar). You fq on there
> with the current userID.   There's one way to find out of course, but that
> doesn't seem a patently ridiculous scenario or anything, that's the kind of
> thing Solr is generally good at, it's what it's built for.   The problem
> might actually be in the time it takes to add such a document to the index;
> but not in query time.
>
> Doesn't mean it's the best solution for your problem though, I can't say.
>
> My impression is that Solr in general isn't really designed to support the
> kind of multi-tenancy use case people are talking about lately.  So trying
> to make it work anyway... if multi-cores work for you, then great, but be
> aware they weren't really designed for that (having thousands of cores) and
> may not. If a single index can work for you instead, great, but as you've
> discovered it's not necessarily obvious how to set up the schema to do what
> you need -- really this applies to Solr in general, unlike an rdbms where
> you just third-form-normalize everything and figure it'll work for almost
> any use case that comes up,  in Solr you generally need to custom fit the
> schema for your particular use cases, sometimes being kind of clever to
> figure out the optimal way to do that.
>
> This is, I'd argue/agree, indeed kind of a disadvantage, setting up a Solr
> index takes more intellectual work than setting up an rdbms. The trade off
> is you get speed, and flexible ways to set up relevancy (that still perform
> well). Took a couple decades for rdbms to get as brainless to use as they
> are, maybe in a couple more we'll have figured out ways to make indexing
> engines like solr equally brainless, but not yet -- but it's still pretty
> damn easy for what it is, the lucene/Solr folks have done a remarkable job.
>



-- 
Regards,

Tharindu


Re: Solr sorting problem

2010-10-26 Thread Ron Mayer
Erick Erickson wrote:
> In general, the behavior when sorting is not predictable when
> sorting on a tokenized field, which "text" is. What would
> it mean to sort on a field with "erick" "Moazzam" as tokens
> in a single document? Should it be in the "e"s or the "m"s?

Might it be possible or reasonable to have it show up under
both "e" and "m"?  Or if not, just at the first one it finds?

I've recently been asked a similar question where we wanted
to sort documents by a victim's age.  I have a victim_age
field, but since there can be multiple victims in an incident
it wasn't a unique field.   As a workaround, I added a
"victim_age_min" field; but it would have been easier if
I didn't need to do that.
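
For anyone wanting the same workaround, a minimal sketch of deriving the single-valued sort field before the document is sent to Solr. The function name add_sort_field and the sample document are made up for illustration; only the victim_age and victim_age_min field names come from my schema.

```python
def add_sort_field(doc, source="victim_age", target="victim_age_min"):
    """Return a copy of doc with a single-valued minimum added for sorting."""
    out = dict(doc)
    ages = doc.get(source) or []
    if ages:
        # Sorting needs exactly one value per document, so keep the smallest.
        out[target] = min(ages)
    return out

# Made-up incident with three victims.
incident = {"id": "incident-1", "victim_age": [34, 7, 52]}
prepared = add_sort_field(incident)
```

Documents with no victims simply get no victim_age_min value, so where they land in the sort order depends on how the field type treats missing values.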

> That said, you probably want to watch out for case
> 
> Best
> Erick
> 
> On Fri, Oct 22, 2010 at 10:02 AM, Moazzam Khan  wrote:
> 
>> For anyone who faced the same problem, changing the field to string
>> from text worked!
>>
>> -Moazzam
>>
>> On Fri, Oct 22, 2010 at 8:50 AM, Moazzam Khan  wrote:
>>> The field type of the first name and last name is text. Could that be
>>> why it's not sorting properly? I just changed it to string and started
>>> a full-import. Hopefully that will work.
>>>
>>> Thanks,
>>> Moazzam
>>>
>>> On Thu, Oct 21, 2010 at 7:42 PM, Jayendra Patil
>>>  wrote:
>>>> need additional information .
>>>> Sorting is easy in Solr just by passing the sort parameter
>>>>
>>>> However, when it comes to text sorting it depends on how you analyse
>>>> and tokenize your fields
>>>> Sorting does not work on fields with multiple tokens.
>>>>
>>>> http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F
>>>>
>>>> On Thu, Oct 21, 2010 at 7:24 PM, Moazzam Khan wrote:
>>>>> Hey guys,
>>>>>
>>>>> I have a list of people indexed in Solr. I am trying to sort by their
>>>>> first names but I keep getting results that are not alphabetically
>>>>> sorted (I see the names starting with W before the names starting with
>>>>> A). I have a feeling that the results are first being sorted by
>>>>> relevancy then sorted by first name.
>>>>>
>>>>> Is there a way I can get the results to be sorted alphabetically?
>>>>>
>>>>> Thanks,
>>>>> Moazzam
>>>>>
> 



Re: How do I this in Solr?

2010-10-26 Thread Varun Gupta
Thanks everybody for the inputs.

Looks like Steven's solution is the closest one but will lead to performance
issues when the query string has many terms.

I will try to implement the two filters suggested by Steven and see how the
performance matches up.

--
Thanks
Varun Gupta


On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹) wrote:

> I think you have to write a "yet exact match" handler yourself (I mean yet
> cause it's not quite exact match we normally know). Steve's answer is quite
> near your request. You can do further work based on his solution.
>
> At the last step, I'd suggest you strip all blanks from the query string and
> from each query result, respectively, and only return those results that have
> the same string length as the query string.
>
> For example, given:
> *query string = "Samsung with GPS"
> *query results:
> result 1 = "Samsung has lots of mobile with GPS"
> result 2 = "with GPS Samsung"
> result 3 = "GPS mobile with vendors, such as Sony, Samsung"
>
> they become:
> *query string = "SamsungwithGPS" (length = 14)
> *query results:
> result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
> result 2 = "withGPSSamsung" (length = 14)
> result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)
>
> so result 2 matches your request.
>
> In this way, you can avoid a load of work on case sensitivity and word
> reordering. Furthermore, you can refine this, e.g. by removing other
> whitespace characters, etc.
>
> Scott @ Taiwan
>
>
> - Original Message - From: "Varun Gupta" 
>
> To: 
> Sent: Tuesday, October 26, 2010 9:07 PM
>
> Subject: How do I this in Solr?
>
>
>  Hi,
>>
>> I have lot of small documents (each containing 1 to 15 words) indexed in
>> Solr. For the search query, I want the search results to contain only
>> those
>> documents that satisfy this criteria "All of the words of the search
>> result
>> document are present in the search query"
>>
>> For example:
>> If I have the following documents indexed: "nokia n95", "GPS", "android",
>> "samsung", "samsung andriod", "nokia andriod", "mobile with GPS"
>>
>> If I search with the text "samsung andriod GPS", search results should
>> only
>> conain "samsung", "GPS", "andriod" and "samsung andriod".
>>
>> Is there a way to do this in Solr.
>>
>> --
>> Thanks
>> Varun Gupta
>>
>>
>
>
> 
>
>
>
>
>


how to index raw data

2010-10-26 Thread jayant

Hi, I wanted to use a few fields from the database, but cannot use the DIH
because JDBC access to the database is not allowed. We can only go through a
wrapper. As such, I would like to know how I can index the data obtained
through the db wrapper using SolrJ. I would have two fields to index: id
and a text field containing the data.
Thanks.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-index-raw-data-tp1778033p1778033.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: FieldCollapsing and Stats or Sum ?!

2010-10-26 Thread Lance Norskog
Do you want one number, or the sum for each group? For one number, the
stats component is fine.

For one number per group, grouping does not (yet) support the stats
component. This is the old SQL "Group By" command, right?
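
Until then, the several-requests workaround from the question can be sketched as one stats request per group value. This only builds the query strings; the helper name stats_params is made up, and q=*:* with rows=0 is an assumption about what the base query should be. The stats=true and stats.field parameters are the stats component's own.

```python
from urllib.parse import urlencode

def stats_params(group_field, group_value, sum_field):
    """Query parameters for one follow-up request restricted to one group."""
    return urlencode([
        ("q", "*:*"),                            # assumption: match everything
        ("rows", "0"),                           # only the stats are needed
        ("fq", f"{group_field}:{group_value}"),  # restrict to a single group
        ("stats", "true"),
        ("stats.field", sum_field),
    ])

# One query string per currency; append each to your core's /select URL.
queries = {c: stats_params("currency_id", c, "amount") for c in ("EUR", "CHF")}
```

The sum for each currency is then read from the stats section of each response, at the cost of one round trip per group.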

On Tue, Oct 26, 2010 at 6:42 AM, stockiii  wrote:
>
> Hello.
>
> We want to group with field collapsing, and we want a sum for each of these groups.
>
> For example:
> group by currency_id: EUR, CHF, ...
> and for each group, the correct sum of the field "amount" over its documents.
>
> Is this possible in one request, or is it necessary to do this in several
> requests?
> Maybe first grouping and then using the StatsComponent to get the sum of the
> group by sending a new request with the filter? But then I don't need
> grouping!?
>
> thx =)
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/FieldCollapsing-and-Stats-or-Sum-tp1773842p1773842.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com


Re: How do I this in Solr?

2010-10-26 Thread 朱炎詹
I think you have to write a "yet exact match" handler yourself (I mean yet 
cause it's not quite exact match we normally know). Steve's answer is quite 
near your request. You can do further work based on his solution.


At the last step, I'd suggest you strip all blanks from the query string and 
from each query result, respectively, and only return those results that have 
the same string length as the query string.


For example, given:
*query string = "Samsung with GPS"
*query results:
result 1 = "Samsung has lots of mobile with GPS"
result 2 = "with GPS Samsung"
result 3 = "GPS mobile with vendors, such as Sony, Samsung"

they become:
*query string = "SamsungwithGPS" (length = 14)
*query results:
result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
result 2 = "withGPSSamsung" (length = 14)
result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)

so result 2 matches your request.

In this way, you can avoid a load of work on case sensitivity and word 
reordering. Furthermore, you can refine this, e.g. by removing other 
whitespace characters, etc.


Scott @ Taiwan


- Original Message - 
From: "Varun Gupta" 

To: 
Sent: Tuesday, October 26, 2010 9:07 PM
Subject: How do I this in Solr?



Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in
Solr. For the search query, I want the search results to contain only those
documents that satisfy this criterion: "All of the words of the search result
document are present in the search query"

For example:
If I have the following documents indexed: "nokia n95", "GPS", "android",
"samsung", "samsung andriod", "nokia andriod", "mobile with GPS"

If I search with the text "samsung andriod GPS", search results should only
contain "samsung", "GPS", "andriod" and "samsung andriod".

Is there a way to do this in Solr.

--
Thanks
Varun Gupta












Re: Multiple Word Facets

2010-10-26 Thread Ahmet Arslan
Facets are generated from indexed terms.

Depending on your need/use-case: 

You can use an additional, separate string field (which is not tokenized) for 
facets and populate it via copyField. Search on the tokenized field, facet on 
the non-tokenized field.

Or

You can add solr.ShingleFilterFactory to your index analyzer to form multiple 
word terms.
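
To see why the field type matters, here is a small sketch in plain Python (not Solr code) that counts terms the way a whitespace-tokenized field would versus an untokenized string field. The sample values are made up, apart from the phrase taken from the question.

```python
from collections import Counter

# Made-up field values for three documents.
values = [
    "Eastern Federal Lands Highway Division",
    "Eastern Federal Lands Highway Division",
    "Federal Highway Administration",
]

# Tokenized field: each whitespace-separated word becomes its own term,
# so facet counts come back per word.
tokenized_facets = Counter(word for v in values for word in v.split())

# Untokenized (string) field: the whole value is one term,
# so facet counts come back per phrase.
string_facets = Counter(values)
```

Here tokenized_facets counts "Federal" three times, while string_facets keeps "Eastern Federal Lands Highway Division" together with a count of two, which is the multi-word facet being asked for.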

--- On Wed, 10/27/10, Adam Estrada  wrote:

> From: Adam Estrada 
> Subject: Multiple Word Facets
> To: solr-user@lucene.apache.org
> Date: Wednesday, October 27, 2010, 4:43 AM
> All,
> I am a new to Solr faceting and stuck on how to get
> multiple-word
> facets returned from a standard Solr query. See below for
> what is
> currently being returned.
> 
> 
> 
> 
> 
> 89
> 87
> 87
> 87
> 84
> 60
> 32
> 22
> 19
> 15
> 15
> 14
> 12
> 11
> 10
> 9
> 7
> 7
> 7
> 6
> 6
> 6
> 6
> ...etc...
> 
> There are many terms in there that are 2 or 3 word phrases.
> For
> example, Eastern Federal Lands Highway Division all gets
> broken down
> in to the individual words that make up the total group of
> words. I've
> seen quite a few websites that do what it is I am trying to
> do here so
> any suggestions at this point would be great. See my schema
> below
> (copied from the example schema).
> 
>      class="solr.TextField" positionIncrementGap="100">
>       
>           class="solr.WhitespaceTokenizerFactory"/>
>      class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="false"/>
>          class="solr.StopFilterFactory"
>                
> ignoreCase="true"
>                
> words="stopwords.txt"
>                
> enablePositionIncrements="true"
>                
> />
>      class="solr.WordDelimiterFilterFactory"
> generateWordParts="1"
> generateNumberParts="1" catenateWords="0"
> catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
>          class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       
> 
> Similar for type="query". Please advise on how to group or
> cluster
> document terms so that they can be used as facets.
> 
> Many thanks in advance,
> Adam Estrada
> 





Re: Multiple Word Facets

2010-10-26 Thread Pradeep Singh
Use this field type -










On Tue, Oct 26, 2010 at 6:43 PM, Adam Estrada wrote:

> All,
> I am a new to Solr faceting and stuck on how to get multiple-word
> facets returned from a standard Solr query. See below for what is
> currently being returned.
>
> 
> 
> 
> 
> 89
> 87
> 87
> 87
> 84
> 60
> 32
> 22
> 19
> 15
> 15
> 14
> 12
> 11
> 10
> 9
> 7
> 7
> 7
> 6
> 6
> 6
> 6
> ...etc...
>
> There are many terms in there that are 2 or 3 word phrases. For
> example, Eastern Federal Lands Highway Division all gets broken down
> in to the individual words that make up the total group of words. I've
> seen quite a few websites that do what it is I am trying to do here so
> any suggestions at this point would be great. See my schema below
> (copied from the example schema).
>
> positionIncrementGap="100">
>  
> 
> ignoreCase="true" expand="false"/>
>ignoreCase="true"
>words="stopwords.txt"
>enablePositionIncrements="true"
>/>
> generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
>
>  
>
> Similar for type="query". Please advise on how to group or cluster
> document terms so that they can be used as facets.
>
> Many thanks in advance,
> Adam Estrada
>


Re: snapshot-4.0 and maven

2010-10-26 Thread Tommy Chheng
You can use maven-assembly-plugin's jar-with-dependencies to build a single 
jar with all its dependencies:


http://stackoverflow.com/questions/574594/how-can-i-create-an-executable-jar-with-dependencies-using-maven

@tommychheng

On 10/19/10 6:53 AM, Matt Mitchell wrote:

Hey thanks Tommy. To be more specific, I'm trying to use SolrJ in a
clojure project. When I try to use SolrJ using what you showed me, I
get errors saying lucene classes can't be found etc.. Is there a way
to build everything SolrJ (snapshot-4.0) needs into one jar?

Matt

On Mon, Oct 18, 2010 at 11:01 PM, Tommy Chheng  wrote:

Once you have built the solr 4.0 jar, you can use mvn's install command like
this:

mvn install:install-file -DgroupId=org.apache -DartifactId=solr
-Dpackaging=jar -Dversion=4.0-SNAPSHOT -Dfile=solr-4.0-SNAPSHOT.jar
-DgeneratePom=true

@tommychheng

On 10/18/10 7:28 PM, Matt Mitchell wrote:

I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is
this possible to do? If so, could someone give me a tip or two on
getting started?

Thanks,
Matt



Multiple Word Facets

2010-10-26 Thread Adam Estrada
All,
I am new to Solr faceting and stuck on how to get multiple-word
facets returned from a standard Solr query. See below for what is
currently being returned.





89
87
87
87
84
60
32
22
19
15
15
14
12
11
10
9
7
7
7
6
6
6
6
...etc...

There are many terms in there that are 2 or 3 word phrases. For
example, Eastern Federal Lands Highway Division all gets broken down
in to the individual words that make up the total group of words. I've
seen quite a few websites that do what it is I am trying to do here so
any suggestions at this point would be great. See my schema below
(copied from the example schema).


  
 




  

Similar for type="query". Please advise on how to group or cluster
document terms so that they can be used as facets.

Many thanks in advance,
Adam Estrada


Re: ClassCastException Issue

2010-10-26 Thread Ken Stanley
On Mon, Oct 25, 2010 at 2:45 AM, Alex Matviychuk  wrote:

> Getting this when deploying to tomcat:
>
> [INFO][http-4443-exec-3][solr.schema.IndexSchema] readSchema():394
> Reading Solr Schema
> [INFO][http-4443-exec-3][solr.schema.IndexSchema] readSchema():408
> Schema name=tsadmin
> [ERROR][http-4443-exec-3][util.plugin.AbstractPluginLoader] log():139
> java.lang.ClassCastException: org.apache.solr.schema.StrField cannot
> be cast to org.apache.solr.schema.FieldType
>at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:419)
>at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:447)
>at
> org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
>at
> org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:456)
>at org.apache.solr.schema.IndexSchema.(IndexSchema.java:95)
>at org.apache.solr.core.SolrCore.(SolrCore.java:520)
>at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
>
>
> solr schema:
>
> 
> 
>
> sortMissingLast="true" omitNorms="true"/>
>...
>
>
>   
>   ...
>
> 
>
>
> Any ideas?
>
> Thanks,
> Alex Matviychuk
>


Alex,

I've run into this issue myself, and it was because I tried to create a
fieldType called string (like you). Rename "string" to something else and
the exception should go away.

- Ken


RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Hi Matt,

I think your concern about performance is spot-on, though.

The combinatorial explosion would be at query time, not at index time - my 
solution has a single token indexed per document. My suggested query-time 
filter would generate the following number of output terms, where C(n,k) is the 
combination of n things taken k at a time, n is the number of input query 
terms, and k is the number of concatenated input query terms forming one output 
query term:

C(n,1)+C(n,2)...+C(n,n-1)+C(n,n)

For small queries this would not be a problem:

1 input query term -> 1 output query term
2 input query terms -> 3 output query terms
3 input query terms -> 7 output query terms
4 input query terms -> 15 output query terms

But for larger queries, it could be fairly expensive:

10 input query terms -> 1,023 output query terms
...
15 input query terms -> 32,767 output query terms

This is exactly (2^n - 1) output query terms, where n is the number of input 
terms.

32k query terms might be too slow to be functional.
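
To make the growth concrete, here is a rough sketch of what such a query-time expansion could look like. It is not the actual filter: the function name expand_query is made up, and lowercasing, sorting and joining with a space are assumptions about how the single indexed token per document would be normalised. It also assumes the input terms are distinct.

```python
from itertools import combinations

def expand_query(terms, sep=" "):
    """All non-empty combinations of the query terms, each joined into one term."""
    # Assumption: the indexed token is built from lowercased, sorted terms,
    # so the query side normalises the same way.
    ordered = sorted(t.lower() for t in terms)
    return [
        sep.join(combo)
        for k in range(1, len(ordered) + 1)   # k = 1 .. n terms per combination
        for combo in combinations(ordered, k)
    ]

expanded = expand_query(["samsung", "andriod", "GPS"])
```

Three input terms give seven output terms, and len(expand_query(...)) for ten distinct terms is 1,023, matching the table above.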

Steve

> -Original Message-
> From: Matthew Hall [mailto:mh...@informatics.jax.org]
> Sent: Tuesday, October 26, 2010 3:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How do I this in Solr?
> 
> Bah.. nope this would miss documents that only match a subset of the
> given terms.
> 
> I'm going to have to go with Steven's approach as the right choice here.
> 
> Matt
> 
> On 10/26/2010 3:44 PM, Matthew Hall wrote:
> > Indeed, I'd missed the second part of his requirements; my AND
> > solution is sadly insufficient for this task.
> >
> > The combinatorial part of your solution worries me a bit though, Steven,
> > because his documents that are on the larger side of his corpus would
> > likely slow down query performance a bit while the filter calculates
> > all of the possibilities for a given document.
> >
> > I'm wondering if a slightly hybrid approach would be valid:
> >
> > Have a filter that calculates the total number of terms for a given
> > document.  And then add a clause into your query at runtime that would
> > match what the filter would come up with:
> >
> > So:
> >
> > text:"Nokia" AND text:"Mobile" AND text:"GPS" AND termCount: 3
> >
> > Something like that anyhow.
> >
> > Matt
> >
> > On 10/26/2010 3:35 PM, Dennis Gearon wrote:
> >> I'm the LAST person anyone will ever need to worry about flame
> >> baiting. You did notice that I retracted what I said and supported
> >> your point of view?
> >>
> >> Sorry if my cryptic comment sounded critical. I was wrong, you were
> >> right :-)
> >> Dennis Gearon
> >>
> >> Signature Warning
> >> 
> >> It is always a good idea to learn from your own mistakes. It is
> >> usually a better idea to learn from others’ mistakes, so you do not
> >> have to make them yourself. from
> >> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> >>
> >> EARTH has a Right To Life,
> >>otherwise we all die.
> >>
> >>
> >> --- On Tue, 10/26/10, Steven A Rowe  wrote:
> >>
> >>> From: Steven A Rowe
> >>> Subject: RE: How do I this in Solr?
> >>> To: "solr-user@lucene.apache.org"
> >>> Date: Tuesday, October 26, 2010, 12:27 PM
> >>> Hi Dennis,
> >>>
> >>> You wrote:
> >>>> If Solr is like Google, once documents matching only the ANDed items
> >>>> in the query ran out, then those that had only two of the terms, then
> >>>> only 1 of the terms, and then those close to it would start showing up.
> >>> [...]
> >>>> Plus, if he wants terms that contain ONLY those words, and no others, an
> >>>> ANDed query would not do that, right? ANDed queries return results that
> >>>> must have ALL the terms listed, and could have lots of other words, right?
> >>>
> >>> This is *exactly* what I just said: ANDed queries (i.e.,
> >>> requiring all query terms) will not satisfy Varun's
> >>> requirements.
> >>>
> >>> Your participation in this thread looks an awful lot like
> >>> flame-baiting: Someone else asks a question, I answer with a
> >>> possible solution, you give a one-word "overkill" response,
> >>> I say why it's not overkill.  You then ask if anybody
> >>> knows the answer to the original question, and then parrot
> >>> my response to your "overkill" statement.  Really
> >>>
> >>> Get your shit together or shut up.  Please.
> >>>
> >>> Steve
> >>>
> >>>> -Original Message-
> >>>> From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> >>>> Sent: Tuesday, October 26, 2010 3:14 PM
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: RE: How do I this in Solr?
> >>>>
> >>>> Dennis Gearon
> >>>>
> >>>> Signature Warning
> >>>>
> >>>> It is always a good idea to learn from your own mistakes. It is usually a
> >>>> better idea to learn from others’ mistakes, so you do not have to make
> >>>> them yourself. from
> >>>> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>  EART

Re: Strange search

2010-10-26 Thread ramzesua

I tried making some changes, but it didn't help:
In _http://localhost:8983/search/admin/schema.jsp  I have, for example, the term
"main" with frequency "7". But if I try to search for it, I don't
get any results. If I use a wildcard, I get only 4 docs in the response.
But if I try to search for the term "html" (frequency "5"), I don't get any
results even with a wildcard. Where is the problem and how can I solve it?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1774059.html
Sent from the Solr - User mailing list archive at Nabble.com.


Jars required in classpath to run embedded solr server?

2010-10-26 Thread Tharindu Mathew
Hi everyone,

Do we need all lucene jars in the class path for this? Seems that the
solr-solrj and solr-core jars are not enough
(http://wiki.apache.org/solr/Solrj). It is asking for lucene jars in
the classpath. Could I know what jars are required to run this?

Thanks in advance.

-- 
Regards,

Tharindu


Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall
Bah.. nope this would miss documents that only match a subset of the 
given terms.
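
A quick sketch of why it falls short, in plain Python rather than Solr. The two predicates and their names are made up for illustration; the documents and query are taken from Varun's example, lowercased.

```python
def and_with_term_count(doc_terms, query_terms):
    """The hybrid idea: every query term present AND termCount equals the query size."""
    return set(query_terms) <= set(doc_terms) and len(doc_terms) == len(query_terms)

def wanted(doc_terms, query_terms):
    """Varun's requirement: every word of the document appears in the query."""
    return set(doc_terms) <= set(query_terms)

query = ["samsung", "andriod", "gps"]
docs = [["samsung"], ["gps"], ["samsung", "andriod"], ["mobile", "with", "gps"]]

# Documents that should be returned but that the hybrid query would not match.
missed = [d for d in docs if wanted(d, query) and not and_with_term_count(d, query)]
```

All three documents that should come back end up in missed, because each contains only a subset of the query terms.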


I'm going to have to go with Steven's approach as the right choice here.

Matt

On 10/26/2010 3:44 PM, Matthew Hall wrote:
Indeed, I'd missed the second part of his requirements; my AND 
solution is sadly insufficient for this task.


The combinatorial part of your solution worries me a bit though, Steven, 
because his documents that are on the larger side of his corpus would 
likely slow down query performance a bit while the filter calculates 
all of the possibilities for a given document.


I'm wondering if a slightly hybrid approach would be valid:

Have a filter that calculates the total number of terms for a given 
document.  And then add a clause into your query at runtime that would 
match what the filter would come up with:


So:

text:"Nokia" AND text:"Mobile" AND text:"GPS" AND termCount: 3

Something like that anyhow.

Matt

On 10/26/2010 3:35 PM, Dennis Gearon wrote:
I'm the LAST person anyone will ever need to worry about flame 
baiting. You did notice that I retracted what I said and supported 
your point of view?


Sorry if my cryptic comment sounded critical. I was wrong, you were 
right :-)

Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is 
usually a better idea to learn from others’ mistakes, so you do not 
have to make them yourself. from 
'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
   otherwise we all die.


--- On Tue, 10/26/10, Steven A Rowe  wrote:


From: Steven A Rowe
Subject: RE: How do I this in Solr?
To: "solr-user@lucene.apache.org"
Date: Tuesday, October 26, 2010, 12:27 PM
Hi Dennis,

You wrote:

If Solr is like Google, once documents matching only the ANDed items
in the query ran out, then those that had only two of the terms, then
only 1 of the terms, and then those close to it would start showing up.
[...]
Plus, if he wants terms that contain ONLY those words, and no others, an
ANDed query would not do that, right? ANDed queries return results that
must have ALL the terms listed, and could have lots of other words, right?

This is *exactly* what I just said: ANDed queries (i.e.,
requiring all query terms) will not satisfy Varun's
requirements.

Your participation in this thread looks an awful lot like
flame-bating: Someone else asks a question, I answer with a
possible solution, you give a one-word "overkill" response,
I say why it's not overkill.  You then ask if anybody
knows the answer to the original question, and then parrot
my response to your "overkill" statement.  Really

Get your shit together or shut up.  Please.

Steve


-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Tuesday, October 26, 2010 3:14 PM
To: solr-user@lucene.apache.org
Subject: RE: How do I this in Solr?



Dennis Gearon

Signature Warning

It is always a good idea to learn from your own

mistakes. It is usually a

better idea to learn from others’ mistakes, so you

do not have to make

them yourself. from
'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
EARTH has a Right To Life,
otherwise we all die.


--- On Tue, 10/26/10, Steven A Rowe

wrote:

From: Steven A Rowe
Subject: RE: How do I this in Solr?
To: "solr-user@lucene.apache.org"



Date: Tuesday, October 26, 2010, 12:10 PM
Dennis,

Do you mean to say that you read my earlier post,

and

disagree that it would solve the problem?  Or

have you

simply not read it?

Steve


-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Tuesday, October 26, 2010 3:00 PM
To: solr-user@lucene.apache.org
Subject: RE: How do I this in Solr?

Good point. Since I might need such a query

myself

someday, how *IS* that

done?


Dennis Gearon

Signature Warning

It is always a good idea to learn from your

own

mistakes. It is usually a

better idea to learn from others’

mistakes, so you

do not have to make

them yourself. from
'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
EARTH has a Right To Life,
otherwise we all die.


--- On Tue, 10/26/10, Steven A Rowe

wrote:

From: Steven A Rowe
Subject: RE: How do I this in Solr?
To: "solr-user@lucene.apache.org"



Date: Tuesday, October 26, 2010, 11:46

AM

Um, maybe I'm way off base, but when
Varun said:


If I search with the text "samsung

andriod

GPS",

search results should only conain

"samsung",

"GPS",

"andriod" and "samsung andriod".

I interpreted that to mean that hit

documents

should

contain terms from the query, and

nothing else.

Making

all terms required doesn't do this.

Steve


-Original Message-
From: Matthew Hall [mailto:mh...@informatics.jax.org]
Sent: Tuesday, October 26, 2010

2:30 PM

To: solr-user@lucene.apache.org
Subject: Re: How do I this in

Solr?

Um.. you could change your default

clause to

AND

rather than or.

That shoul

Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall
Indeed, I'd missed the second part of his requirements; my AND solution 
is sadly insufficient for this task.

The combinatorial part of your solution worries me a bit though, Steven, 
because the documents on the larger side of his corpus would 
likely slow down query performance a bit while the filter calculates all 
of the possibilities for a given document.


I'm wondering if a slightly hybrid approach would be valid:

Have a filter that calculates the total number of terms for a given 
document.  And then add a clause into your query at runtime that would 
match what the filter would come up with:


So:

text:"Nokia" AND text:"Mobile" AND text:"GPS" AND termCount: 3

Something like that anyhow.

Matt
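
One possible reading of the hybrid idea above, offered as a hedged sketch and not as anything Solr provides out of the box. As written, the single clause only matches documents that contain all three query terms and nothing else. To also return documents made of a subset of the query terms, the clause would have to be generated once per term combination. The field names `text` and `termCount`, the lowercasing, and the choice to count distinct terms are all assumptions made for illustration.

```python
from itertools import combinations


def term_count(text):
    """Index-time side: number of distinct (lowercased) terms in a document.

    Assumption: the hypothetical termCount field stores this value.
    """
    return len(set(text.lower().split()))


def build_query(query, field="text", count_field="termCount"):
    """Query-time side: one clause per non-empty combination of query terms.

    Each clause requires every term in the combination AND a term count
    equal to the combination size, so no other words can be present.
    """
    terms = sorted(set(query.lower().split()))
    clauses = []
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            required = " AND ".join(f'{field}:"{t}"' for t in combo)
            clauses.append(f"({required} AND {count_field}:{size})")
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(term_count("mobile with GPS"))
    print(build_query("Nokia GPS"))
```

For n query terms this generates 2^n - 1 clauses, so the combinatorial cost does not disappear; it moves into the query string.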

On 10/26/2010 3:35 PM, Dennis Gearon wrote:

> I'm the LAST person anyone will ever need to worry about flame baiting.
> You did notice that I retracted what I said and supported your point of
> view?
>
> Sorry if my cryptic comment sounded critical. I was wrong, you were
> right :-)
>
> Dennis Gearon
>
> Signature Warning
>
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make
> them yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
>    otherwise we all die.



Re: Highlighting for non-stored fields

2010-10-26 Thread Phong Dais
Thanks for the insight.
This is definitely a feasible solution because I only need to highlight when
the user opens the document.
I guess the easiest way to do this is to reuse the Solr code (with some
modification) in my own application.
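
As a rough illustration of highlighting outside Solr when the field is not stored, here is a minimal sketch, assuming the application can load the original text itself once the user opens the document. It re-tokenizes the text, returns character offsets for tokens that match the query terms, and wraps them in markers. The regex tokenizer is only a stand-in for the real analysis chain, and the function names are hypothetical; this is not the Lucene Highlighter.

```python
import re


def highlight_offsets(text, query_terms):
    """Return (start, end) character offsets of tokens matching a query term.

    Assumption: a simple \\w+ tokenizer with lowercasing approximates the
    analyzer used at index time.
    """
    wanted = {t.lower() for t in query_terms}
    return [(m.start(), m.end())
            for m in re.finditer(r"\w+", text)
            if m.group().lower() in wanted]


def apply_highlight(text, offsets, pre="<em>", post="</em>"):
    """Wrap each offset range in pre/post markers; offsets must be sorted."""
    out, last = [], 0
    for start, end in offsets:
        out.append(text[last:start])
        out.append(pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)


text = "Solr can highlight terms; Solr needs offsets."
offsets = highlight_offsets(text, ["solr", "offsets"])
highlighted = apply_highlight(text, offsets)

if __name__ == "__main__":
    print(offsets)
    print(highlighted)
```

Returning offsets instead of marked-up text keeps the response small; the client decides how to render them. A real version would have to use the same analyzer as the index, otherwise stemmed or otherwise normalized terms would not line up.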

On Tue, Oct 26, 2010 at 2:35 PM, Pradeep Singh  wrote:

> Another way you can do this is - after the search has completed, load the
> field in your application, write separate code to reanalyze that
> field/document, index it in RAM, and run it through highlighter classes.
> All this as part of your web application outside of Solr. Considering the
> size of your data it doesn't look advisable to store it because then you
> would be almost doubling the size of your index (if you are looking to
> highlight on a field then it's probably going to be full of content).
>
> -Pradeep
>
> On Tue, Oct 26, 2010 at 8:32 AM, Phong Dais  wrote:
>
> > Hi,
> >
> > I understand that I need to store the fields in order to use highlighting
> > "out of the box".
> > I'm looking for a way to highlight using term offsets instead of the
> > actual text, since the text is not stored.  What I am asking is: is it
> > possible to modify the response (through a custom implementation) to
> > contain highlighted offsets instead of the actual matched text?  Should I
> > be writing my own DefaultHighlighter?  Or overriding some of its
> > functionality?  Can this be done this way or am I way off?
> >
> > BTW, I'm using solr-1.4.
> >
> > Thanks,
> > P.
> >
> > On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo wrote:
> >
> > > Check out this link
> > >
> > > http://wiki.apache.org/solr/FieldOptionsByUseCase
> > >
> > > You need to store the field if you want to use the highlighting
> > > feature.
> > >
> > > If you need to retrieve and display the highlighted snippets then the
> > > field definitely needs to be stored.
> > >
> > > To use term offsets, it will be a good idea to enable the following
> > > attributes for that field: termVectors, termPositions, termOffsets.
> > >
> > > The only issue here is that your storage costs will increase because of
> > > these extra features.
> > >
> > > Nevertheless, you definitely need to store the field if you need to
> > > retrieve it for highlighting purposes.
> > >
> > > On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais wrote:
> > >
> > > > Hi,
> > > >
> > > > I've been looking through the mailing list archive for the past week
> > > > and I haven't found any useful info regarding this issue.
> > > >
> > > > My requirement is to index a few terabytes worth of data to be
> > > > searched.  Due to the size of the data, I would like to index without
> > > > storing, but I would like to use the highlighting feature.  Is this
> > > > even possible?  What are my options?
> > > >
> > > > I've read about termOffsets and payloads that could possibly be used
> > > > to do this, but I have no idea how this could be done.
> > > >
> > > > Any pointers greatly appreciated.  Someone please point me in the
> > > > right direction.
> > > >
> > > > I don't mind having to write some code or digging through existing
> > > > code to accomplish this task.
> > > >
> > > > Thanks,
> > > > P.
> > > >
> > >
> > >
> > >
> > > --
> > > °O°
> > > "Good Enough" is not good enough.
> > > To give anything less than your best is to sacrifice the gift.
> > > Quality First. Measure Twice. Cut Once.
> > > http://www.israelekpo.com/
> > >
> >
>


RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Dennis,

I wasn't trying to force your admission of my rectitude - I was just getting 
frustrated that the conversation was moving in spiral fashion, and was worried 
that you might have intentionally engineered that.

I'm glad to hear that you weren't flame baiting.

Steve


> -Original Message-
> From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> Sent: Tuesday, October 26, 2010 3:35 PM
> To: solr-user@lucene.apache.org
> Subject: RE: How do I this in Solr?
> 
> I'm the LAST person anyone will ever need to worry about flame baiting.
> You did notice that I retracted what I said and supported your point of
> view?
> 
> Sorry if my cryptic comment sounded critical. I was wrong, you were right
> :-)
> Dennis Gearon
> 
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make
> them yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> EARTH has a Right To Life,
>   otherwise we all die.
> 
> 
> --- On Tue, 10/26/10, Steven A Rowe  wrote:
> 
> > From: Steven A Rowe 
> > Subject: RE: How do I this in Solr?
> > To: "solr-user@lucene.apache.org" 
> > Date: Tuesday, October 26, 2010, 12:27 PM
> > Hi Dennis,
> >
> > You wrote:
> > > If Solr is like Google, once documents matching only
> > the ANDed items
> > > in the query ran out, then those that had only two of
> > the terms, then
> > > only 1 of the terms, and then those close to it would
> > start showing up.
> > [...]
> > > Plus, if he wants terms that contain ONLY those words,
> > and no others, an
> > > ANDed query would not do that, right? ANDed queries
> > return results that
> > > must have ALL the terms listed, and could have lots of
> > other words, right?
> >
> > This is *exactly* what I just said: ANDed queries (i.e.,
> > requiring all query terms) will not satisfy Varun's
> > requirements.
> >
> > Your participation in this thread looks an awful lot like
> > flame-bating: Someone else asks a question, I answer with a
> > possible solution, you give a one-word "overkill" response,
> > I say why it's not overkill.  You then ask if anybody
> > knows the answer to the original question, and then parrot
> > my response to your "overkill" statement.  Really
> >
> > Get your shit together or shut up.  Please.
> >
> > Steve
> >
> > > -Original Message-
> > > From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> > > Sent: Tuesday, October 26, 2010 3:14 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: How do I this in Solr?
> > >
> > >
> > >
> > > Dennis Gearon
> > >
> > > Signature Warning
> > > 
> > > It is always a good idea to learn from your own
> > mistakes. It is usually a
> > > better idea to learn from others’ mistakes, so you
> > do not have to make
> > > them yourself. from
> > > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> > >
> > > EARTH has a Right To Life,
> > >   otherwise we all die.
> > >
> > >
> > > --- On Tue, 10/26/10, Steven A Rowe 
> > wrote:
> > >
> > > > From: Steven A Rowe 
> > > > Subject: RE: How do I this in Solr?
> > > > To: "solr-user@lucene.apache.org"
> > 
> > > > Date: Tuesday, October 26, 2010, 12:10 PM
> > > > Dennis,
> > > >
> > > > Do you mean to say that you read my earlier post,
> > and
> > > > disagree that it would solve the problem?  Or
> > have you
> > > > simply not read it?
> > > >
> > > > Steve
> > > >
> > > > > -Original Message-
> > > > > From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> > > > > Sent: Tuesday, October 26, 2010 3:00 PM
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: RE: How do I this in Solr?
> > > > >
> > > > > Good point. Since I might need such a query
> > myself
> > > > someday, how *IS* that
> > > > > done?
> > > > >
> > > > >
> > > > > Dennis Gearon
> > > > >
> > > > > Signature Warning
> > > > > 
> > > > > It is always a good idea to learn from your
> > own
> > > > mistakes. It is usually a
> > > > > better idea to learn from others’
> > mistakes, so you
> > > > do not have to make
> > > > > them yourself. from
> > > > > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> > >
> > > > >
> > > > > EARTH has a Right To Life,
> > > > >   otherwise we all die.
> > > > >
> > > > >
> > > > > --- On Tue, 10/26/10, Steven A Rowe 
> > > > wrote:
> > > > >
> > > > > > From: Steven A Rowe 
> > > > > > Subject: RE: How do I this in Solr?
> > > > > > To: "solr-user@lucene.apache.org"
> > > > 
> > > > > > Date: Tuesday, October 26, 2010, 11:46
> > AM
> > > > > > Um, maybe I'm way off base, but when
> > > > > > Varun said:
> > > > > >
> > > > > > > If I search with the text "samsung
> > andriod
> > > > GPS",
> > > > > > > search results should only conain
> > "samsung",
> > > > "GPS",
> > > > > > > "andriod" and "samsung andriod".
> > > > > >
> > > > > > I interpreted that to mean that hit
> > documents
> > > > shou

RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
I'm the LAST person anyone will ever need to worry about flame baiting. You did 
notice that I retracted what I said and supported your point of view?

Sorry if my cryptic comment sounded critical. I was wrong, you were right :-)
Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Tue, 10/26/10, Steven A Rowe  wrote:

> From: Steven A Rowe 
> Subject: RE: How do I this in Solr?
> To: "solr-user@lucene.apache.org" 
> Date: Tuesday, October 26, 2010, 12:27 PM
> Hi Dennis,
> 
> You wrote:
> > If Solr is like Google, once documents matching only
> the ANDed items
> > in the query ran out, then those that had only two of
> the terms, then
> > only 1 of the terms, and then those close to it would
> start showing up.
> [...]
> > Plus, if he wants terms that contain ONLY those words,
> and no others, an
> > ANDed query would not do that, right? ANDed queries
> return results that
> > must have ALL the terms listed, and could have lots of
> other words, right?
> 
> This is *exactly* what I just said: ANDed queries (i.e.,
> requiring all query terms) will not satisfy Varun's
> requirements.
> 
> Your participation in this thread looks an awful lot like
> flame-bating: Someone else asks a question, I answer with a
> possible solution, you give a one-word "overkill" response,
> I say why it's not overkill.  You then ask if anybody
> knows the answer to the original question, and then parrot
> my response to your "overkill" statement.  Really
> 
> Get your shit together or shut up.  Please.
> 
> Steve
> 
> > -Original Message-
> > From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> > Sent: Tuesday, October 26, 2010 3:14 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: How do I this in Solr?
> > 
> > 
> > 
> > Dennis Gearon
> > 
> > Signature Warning
> > 
> > It is always a good idea to learn from your own
> mistakes. It is usually a
> > better idea to learn from others’ mistakes, so you
> do not have to make
> > them yourself. from
> > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

> > 
> > EARTH has a Right To Life,
> >   otherwise we all die.
> > 
> > 
> > --- On Tue, 10/26/10, Steven A Rowe 
> wrote:
> > 
> > > From: Steven A Rowe 
> > > Subject: RE: How do I this in Solr?
> > > To: "solr-user@lucene.apache.org"
> 
> > > Date: Tuesday, October 26, 2010, 12:10 PM
> > > Dennis,
> > >
> > > Do you mean to say that you read my earlier post,
> and
> > > disagree that it would solve the problem?  Or
> have you
> > > simply not read it?
> > >
> > > Steve
> > >
> > > > -Original Message-
> > > > From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> > > > Sent: Tuesday, October 26, 2010 3:00 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: RE: How do I this in Solr?
> > > >
> > > > Good point. Since I might need such a query
> myself
> > > someday, how *IS* that
> > > > done?
> > > >
> > > >
> > > > Dennis Gearon
> > > >
> > > > Signature Warning
> > > > 
> > > > It is always a good idea to learn from your
> own
> > > mistakes. It is usually a
> > > > better idea to learn from others’
> mistakes, so you
> > > do not have to make
> > > > them yourself. from
> > > > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

> > 
> > > >
> > > > EARTH has a Right To Life,
> > > >   otherwise we all die.
> > > >
> > > >
> > > > --- On Tue, 10/26/10, Steven A Rowe 
> > > wrote:
> > > >
> > > > > From: Steven A Rowe 
> > > > > Subject: RE: How do I this in Solr?
> > > > > To: "solr-user@lucene.apache.org"
> > > 
> > > > > Date: Tuesday, October 26, 2010, 11:46
> AM
> > > > > Um, maybe I'm way off base, but when
> > > > > Varun said:
> > > > >
> > > > > > If I search with the text "samsung
> andriod
> > > GPS",
> > > > > > search results should only conain
> "samsung",
> > > "GPS",
> > > > > > "andriod" and "samsung andriod".
> > > > >
> > > > > I interpreted that to mean that hit
> documents
> > > should
> > > > > contain terms from the query, and
> nothing else.
> > > Making
> > > > > all terms required doesn't do this.
> > > > >
> > > > > Steve
> > > > >
> > > > > > -Original Message-
> > > > > > From: Matthew Hall [mailto:mh...@informatics.jax.org]
> > > > > > Sent: Tuesday, October 26, 2010
> 2:30 PM
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Subject: Re: How do I this in
> Solr?
> > > > > >
> > > > > > Um.. you could change your default
> clause to
> > > AND
> > > > > rather than or.
> > > > > >
> > > > > > That should do the trick.
> > > > > >
> > > > > > Matt
> > > > > >
> > > > > > On 10/26/2010 2:26 PM, Dennis
> Gearon wrote:
> > > > > > > Overkill?
> > > > > > >
> > > > > > > Dennis Gearon
> > > > > > >> I can't think of

RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Hi Dennis,

You wrote:
> If Solr is like Google, once documents matching only the ANDed items
> in the query ran out, then those that had only two of the terms, then
> only 1 of the terms, and then those close to it would start showing up.
[...]
> Plus, if he wants terms that contain ONLY those words, and no others, an
> ANDed query would not do that, right? ANDed queries return results that
> must have ALL the terms listed, and could have lots of other words, right?

This is *exactly* what I just said: ANDed queries (i.e., requiring all query 
terms) will not satisfy Varun's requirements.

Your participation in this thread looks an awful lot like flame-baiting: Someone 
else asks a question, I answer with a possible solution, you give a one-word 
"overkill" response, I say why it's not overkill.  You then ask if anybody 
knows the answer to the original question, and then parrot my response to your 
"overkill" statement.  Really

Get your shit together or shut up.  Please.

Steve

> -Original Message-
> From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> Sent: Tuesday, October 26, 2010 3:14 PM
> To: solr-user@lucene.apache.org
> Subject: RE: How do I this in Solr?
> 
> 
> 
> Dennis Gearon
> 
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make
> them yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> EARTH has a Right To Life,
>   otherwise we all die.
> 
> 
> --- On Tue, 10/26/10, Steven A Rowe  wrote:
> 
> > From: Steven A Rowe 
> > Subject: RE: How do I this in Solr?
> > To: "solr-user@lucene.apache.org" 
> > Date: Tuesday, October 26, 2010, 12:10 PM
> > Dennis,
> >
> > Do you mean to say that you read my earlier post, and
> > disagree that it would solve the problem?  Or have you
> > simply not read it?
> >
> > Steve
> >
> > > -Original Message-
> > > From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> > > Sent: Tuesday, October 26, 2010 3:00 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: How do I this in Solr?
> > >
> > > Good point. Since I might need such a query myself
> > someday, how *IS* that
> > > done?
> > >
> > >
> > > Dennis Gearon
> > >
> > > Signature Warning
> > > 
> > > It is always a good idea to learn from your own
> > mistakes. It is usually a
> > > better idea to learn from others’ mistakes, so you
> > do not have to make
> > > them yourself. from
> > > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> > >
> > > EARTH has a Right To Life,
> > >   otherwise we all die.
> > >
> > >
> > > --- On Tue, 10/26/10, Steven A Rowe 
> > wrote:
> > >
> > > > From: Steven A Rowe 
> > > > Subject: RE: How do I this in Solr?
> > > > To: "solr-user@lucene.apache.org"
> > 
> > > > Date: Tuesday, October 26, 2010, 11:46 AM
> > > > Um, maybe I'm way off base, but when
> > > > Varun said:
> > > >
> > > > > If I search with the text "samsung andriod
> > GPS",
> > > > > search results should only conain "samsung",
> > "GPS",
> > > > > "andriod" and "samsung andriod".
> > > >
> > > > I interpreted that to mean that hit documents
> > should
> > > > contain terms from the query, and nothing else.
> > Making
> > > > all terms required doesn't do this.
> > > >
> > > > Steve
> > > >
> > > > > -Original Message-
> > > > > From: Matthew Hall [mailto:mh...@informatics.jax.org]
> > > > > Sent: Tuesday, October 26, 2010 2:30 PM
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: How do I this in Solr?
> > > > >
> > > > > Um.. you could change your default clause to
> > AND
> > > > rather than or.
> > > > >
> > > > > That should do the trick.
> > > > >
> > > > > Matt
> > > > >
> > > > > On 10/26/2010 2:26 PM, Dennis Gearon wrote:
> > > > > > Overkill?
> > > > > >
> > > > > > Dennis Gearon
> > > > > >> I can't think of a way to do it
> > without
> > > > writing new
> > > > > >> analysis filters.
> > > > > >>
> > > > > >> But I think you could do what you
> > want with
> > > > two filters
> > > > > >> (this is untested):
> > > > > >>
> > > > > >> 1. An index-time filter that
> > outputs a single
> > > > token
> > > > > >> consisting of all of the input
> > tokens, sorted
> > > > in a
> > > > > >> consistent way, e.g.:
> > > > > >>
> > > > > >>     "mobile with GPS"
> > > > ->  "GPS mobile
> > > > > >> with"
> > > > > >>     "samsung android"
> > > > ->  "android
> > > > > >> samsung"
> > > > > >>
> > > > > >> 2. A query-time filter that outputs
> > one token
> > > > per input
> > > > > >> term combination, sorted in the
> > same
> > > > consistent way as the
> > > > > >> index-time filter, e.g.:
> > > > > >>
> > > > > >>     "samsung andriod
> > > > GPS"
> > > > > >>       ->
> > > > > >> "samsung","android","GPS",
> > > > > >>          "android
> > > > > >> samsung","GPS samsung","android
> > GPS"
> > > > > >>          "android
> > > > GPS
> > > > > >> samsung

How does DIH multithreading work?

2010-10-26 Thread markwaddle

I understand that the thread count is specified on root entities only. Does
it spawn multiple threads per root entity? Or multiple threads per
descendant entity? Can someone give an example of how you would make a
database query in an entity with 4 threads that would select 1 row per
thread?

Thanks,
Mark
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-does-DIH-multithreading-work-tp1776111p1776111.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
Plus, if he wants terms that contain ONLY those words, and no others, an ANDed 
query would not do that, right? ANDed queries return results that must have ALL 
the terms listed, and could have lots of other words, right?


Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Tue, 10/26/10, Steven A Rowe  wrote:

> From: Steven A Rowe 
> Subject: RE: How do I this in Solr?
> To: "solr-user@lucene.apache.org" 
> Date: Tuesday, October 26, 2010, 12:10 PM
> Dennis,
> 
> Do you mean to say that you read my earlier post, and
> disagree that it would solve the problem?  Or have you
> simply not read it?
> 
> Steve
> 
> > -Original Message-
> > From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> > Sent: Tuesday, October 26, 2010 3:00 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: How do I this in Solr?
> > 
> > Good point. Since I might need such a query myself
> someday, how *IS* that
> > done?
> > 
> > 
> > Dennis Gearon
> > 
> > Signature Warning
> > 
> > It is always a good idea to learn from your own
> mistakes. It is usually a
> > better idea to learn from others’ mistakes, so you
> do not have to make
> > them yourself. from
> > 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

> > 
> > EARTH has a Right To Life,
> >   otherwise we all die.
> > 
> > 
> > --- On Tue, 10/26/10, Steven A Rowe 
> wrote:
> > 
> > > From: Steven A Rowe 
> > > Subject: RE: How do I this in Solr?
> > > To: "solr-user@lucene.apache.org"
> 
> > > Date: Tuesday, October 26, 2010, 11:46 AM
> > > Um, maybe I'm way off base, but when
> > > Varun said:
> > >
> > > > If I search with the text "samsung andriod
> GPS",
> > > > search results should only conain "samsung",
> "GPS",
> > > > "andriod" and "samsung andriod".
> > >
> > > I interpreted that to mean that hit documents
> should
> > > contain terms from the query, and nothing else. 
> Making
> > > all terms required doesn't do this.
> > >
> > > Steve
> > >
> > > > -Original Message-
> > > > From: Matthew Hall [mailto:mh...@informatics.jax.org]
> > > > Sent: Tuesday, October 26, 2010 2:30 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: How do I this in Solr?
> > > >
> > > > Um.. you could change your default clause to
> AND
> > > rather than or.
> > > >
> > > > That should do the trick.
> > > >
> > > > Matt
> > > >
> > > > On 10/26/2010 2:26 PM, Dennis Gearon wrote:
> > > > > Overkill?
> > > > >
> > > > > Dennis Gearon
> > > > >> I can't think of a way to do it
> without
> > > writing new
> > > > >> analysis filters.
> > > > >>
> > > > >> But I think you could do what you
> want with
> > > two filters
> > > > >> (this is untested):
> > > > >>
> > > > >> 1. An index-time filter that
> outputs a single
> > > token
> > > > >> consisting of all of the input
> tokens, sorted
> > > in a
> > > > >> consistent way, e.g.:
> > > > >>
> > > > >>     "mobile with GPS"
> > > ->  "GPS mobile
> > > > >> with"
> > > > >>     "samsung android"
> > > ->  "android
> > > > >> samsung"
> > > > >>
> > > > >> 2. A query-time filter that outputs
> one token
> > > per input
> > > > >> term combination, sorted in the
> same
> > > consistent way as the
> > > > >> index-time filter, e.g.:
> > > > >>
> > > > >>     "samsung andriod
> > > GPS"
> > > > >>       ->
> > > > >> "samsung","android","GPS",
> > > > >>          "android
> > > > >> samsung","GPS samsung","android
> GPS"
> > > > >>          "android
> > > GPS
> > > > >> samsung"
> > > > >>
> > > > >> Steve
> > > > >>
> > > > >>> -Original Message-
> > > > >>> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> > > > >>> Sent: Tuesday, October 26, 2010
> 9:08 AM
> > > > >>> To: solr-user@lucene.apache.org
> > > > >>> Subject: How do I this in
> Solr?
> > > > >>>
> > > > >>> Hi,
> > > > >>>
> > > > >>> I have lot of small documents
> (each
> > > containing 1 to 15
> > > > >> words) indexed in
> > > > >>> Solr. For the search query, I
> want the
> > > search results
> > > > >> to contain only
> > > > >>> those
> > > > >>> documents that satisfy this
> criteria "All
> > > of the words
> > > > >> of the search
> > > > >>> result
> > > > >>> document are present in the
> search
> > > query"
> > > > >>>
> > > > >>> For example:
> > > > >>> If I have the following
> documents
> > > indexed: "nokia
> > > > >> n95", "GPS", "android",
> > > > >>> "samsung", "samsung andriod",
> "nokia
> > > andriod", "mobile
> > > > >> with GPS"
> > > > >>> If I search with the text
> "samsung
> > > andriod GPS",
> > > > >> search results should
> > > > >>> only
> > > > >>> conain "samsung", "GPS",
> "andriod" and
> > > "samsung
> > > > >> andriod".
> > > > >>> Is there a way to do this in
> Solr.
> > > > >>>
> > > > >>

RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
If Solr is like Google, then once the documents matching all of the ANDed terms in 
the query ran out, those matching only two of the terms, then only one of the 
terms, and then near matches would start showing up.

Is this correct?

If so, it wouldn't match his requirements.

Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Tue, 10/26/10, Steven A Rowe  wrote:

> From: Steven A Rowe 
> Subject: RE: How do I this in Solr?
> To: "solr-user@lucene.apache.org" 
> Date: Tuesday, October 26, 2010, 12:10 PM
> Dennis,
> 
> Do you mean to say that you read my earlier post, and
> disagree that it would solve the problem?  Or have you
> simply not read it?
> 
> Steve
> 

RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Dennis,

Do you mean to say that you read my earlier post, and disagree that it would 
solve the problem?  Or have you simply not read it?

Steve

> -Original Message-
> From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> Sent: Tuesday, October 26, 2010 3:00 PM
> To: solr-user@lucene.apache.org
> Subject: RE: How do I this in Solr?
> 
> Good point. Since I might need such a query myself someday, how *IS* that
> done?
> 
> 
> Dennis Gearon
> 
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make
> them yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> EARTH has a Right To Life,
>   otherwise we all die.
> 
> 
> --- On Tue, 10/26/10, Steven A Rowe  wrote:
> 
> > From: Steven A Rowe 
> > Subject: RE: How do I this in Solr?
> > To: "solr-user@lucene.apache.org" 
> > Date: Tuesday, October 26, 2010, 11:46 AM
> > Um, maybe I'm way off base, but when
> > Varun said:
> >
> > > If I search with the text "samsung andriod GPS",
> > > search results should only conain "samsung", "GPS",
> > > "andriod" and "samsung andriod".
> >
> > I interpreted that to mean that hit documents should
> > contain terms from the query, and nothing else.  Making
> > all terms required doesn't do this.
> >
> > Steve
> >
> > > -Original Message-
> > > From: Matthew Hall [mailto:mh...@informatics.jax.org]
> > > Sent: Tuesday, October 26, 2010 2:30 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: How do I this in Solr?
> > >
> > > Um.. you could change your default clause to AND
> > rather than or.
> > >
> > > That should do the trick.
> > >
> > > Matt
> > >
> > > On 10/26/2010 2:26 PM, Dennis Gearon wrote:
> > > > Overkill?
> > > >
> > > > Dennis Gearon
> > > >> I can't think of a way to do it without
> > writing new
> > > >> analysis filters.
> > > >>
> > > >> But I think you could do what you want with
> > two filters
> > > >> (this is untested):
> > > >>
> > > >> 1. An index-time filter that outputs a single
> > token
> > > >> consisting of all of the input tokens, sorted
> > in a
> > > >> consistent way, e.g.:
> > > >>
> > > >>     "mobile with GPS"
> > ->  "GPS mobile
> > > >> with"
> > > >>     "samsung android"
> > ->  "android
> > > >> samsung"
> > > >>
> > > >> 2. A query-time filter that outputs one token
> > per input
> > > >> term combination, sorted in the same
> > consistent way as the
> > > >> index-time filter, e.g.:
> > > >>
> > > >>     "samsung andriod
> > GPS"
> > > >>       ->
> > > >> "samsung","android","GPS",
> > > >>          "android
> > > >> samsung","GPS samsung","android GPS"
> > > >>          "android
> > GPS
> > > >> samsung"
> > > >>
> > > >> Steve
> > > >>
> > > >>> -Original Message-
> > > >>> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> > > >>> Sent: Tuesday, October 26, 2010 9:08 AM
> > > >>> To: solr-user@lucene.apache.org
> > > >>> Subject: How do I this in Solr?
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> I have lot of small documents (each
> > containing 1 to 15
> > > >> words) indexed in
> > > >>> Solr. For the search query, I want the
> > search results
> > > >> to contain only
> > > >>> those
> > > >>> documents that satisfy this criteria "All
> > of the words
> > > >> of the search
> > > >>> result
> > > >>> document are present in the search
> > query"
> > > >>>
> > > >>> For example:
> > > >>> If I have the following documents
> > indexed: "nokia
> > > >> n95", "GPS", "android",
> > > >>> "samsung", "samsung andriod", "nokia
> > andriod", "mobile
> > > >> with GPS"
> > > >>> If I search with the text "samsung
> > andriod GPS",
> > > >> search results should
> > > >>> only
> > > >>> conain "samsung", "GPS", "andriod" and
> > "samsung
> > > >> andriod".
> > > >>> Is there a way to do this in Solr.
> > > >>>
> > > >>> --
> > > >>> Thanks
> > > >>> Varun Gupta
> >
> >


RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
Good point. Since I might need such a query myself someday, how *IS* that done?


Dennis Gearon



--- On Tue, 10/26/10, Steven A Rowe  wrote:

> From: Steven A Rowe 
> Subject: RE: How do I this in Solr?
> To: "solr-user@lucene.apache.org" 
> Date: Tuesday, October 26, 2010, 11:46 AM
> Um, maybe I'm way off base, but when
> Varun said:
> 
> > If I search with the text "samsung andriod GPS",
> > search results should only conain "samsung", "GPS",
> > "andriod" and "samsung andriod".
> 
> I interpreted that to mean that hit documents should
> contain terms from the query, and nothing else.  Making
> all terms required doesn't do this.
> 
> Steve
> 
> > -Original Message-
> > From: Matthew Hall [mailto:mh...@informatics.jax.org]
> > Sent: Tuesday, October 26, 2010 2:30 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: How do I this in Solr?
> > 
> > Um.. you could change your default clause to AND
> rather than or.
> > 
> > That should do the trick.
> > 
> > Matt
> > 
> > On 10/26/2010 2:26 PM, Dennis Gearon wrote:
> > > Overkill?
> > >
> > > Dennis Gearon
> > >> I can't think of a way to do it without
> writing new
> > >> analysis filters.
> > >>
> > >> But I think you could do what you want with
> two filters
> > >> (this is untested):
> > >>
> > >> 1. An index-time filter that outputs a single
> token
> > >> consisting of all of the input tokens, sorted
> in a
> > >> consistent way, e.g.:
> > >>
> > >>     "mobile with GPS"
> ->  "GPS mobile
> > >> with"
> > >>     "samsung android"
> ->  "android
> > >> samsung"
> > >>
> > >> 2. A query-time filter that outputs one token
> per input
> > >> term combination, sorted in the same
> consistent way as the
> > >> index-time filter, e.g.:
> > >>
> > >>     "samsung andriod
> GPS"
> > >>       ->
> > >> "samsung","android","GPS",
> > >>          "android
> > >> samsung","GPS samsung","android GPS"
> > >>          "android
> GPS
> > >> samsung"
> > >>
> > >> Steve
> > >>
> > >>> -Original Message-
> > >>> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> > >>> Sent: Tuesday, October 26, 2010 9:08 AM
> > >>> To: solr-user@lucene.apache.org
> > >>> Subject: How do I this in Solr?
> > >>>
> > >>> Hi,
> > >>>
> > >>> I have lot of small documents (each
> containing 1 to 15
> > >> words) indexed in
> > >>> Solr. For the search query, I want the
> search results
> > >> to contain only
> > >>> those
> > >>> documents that satisfy this criteria "All
> of the words
> > >> of the search
> > >>> result
> > >>> document are present in the search
> query"
> > >>>
> > >>> For example:
> > >>> If I have the following documents
> indexed: "nokia
> > >> n95", "GPS", "android",
> > >>> "samsung", "samsung andriod", "nokia
> andriod", "mobile
> > >> with GPS"
> > >>> If I search with the text "samsung
> andriod GPS",
> > >> search results should
> > >>> only
> > >>> conain "samsung", "GPS", "andriod" and
> "samsung
> > >> andriod".
> > >>> Is there a way to do this in Solr.
> > >>>
> > >>> --
> > >>> Thanks
> > >>> Varun Gupta
> 
>


Re: ClassCastException Issue

2010-10-26 Thread Chris Hostetter

: [ERROR][http-4443-exec-3][util.plugin.AbstractPluginLoader] log():139
: java.lang.ClassCastException: org.apache.solr.schema.StrField cannot
: be cast to org.apache.solr.schema.FieldType

This almost certainly indicates a classloader issue - I suspect you have 
multiple Solr-related jars in various places, and the FieldType class 
instance found when StrField is loaded comes from a different 
(incompatible) jar.


-Hoss
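
One way to check for the duplicate jars suspected above is to scan the servlet container's directories for every jar that carries the class in question. The following is a minimal Python sketch under stated assumptions: the function name jars_containing is hypothetical, the root directory is whatever your own install uses, and only plain .jar files are opened (jars nested inside an unexpanded .war are not inspected).

```python
import os
import zipfile


def jars_containing(root, class_name):
    """Return the sorted paths of all .jar files under root that contain class_name."""
    # A class org.example.Foo is stored in a jar as org/example/Foo.class.
    entry = class_name.replace(".", "/") + ".class"
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as archive:
                    if entry in archive.namelist():
                        found.append(path)
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable or corrupt archives rather than abort the scan.
                continue
    return sorted(found)
```

If a call such as jars_containing(root, "org.apache.solr.schema.FieldType") returns more than one path, those jars are the candidates for the conflict described above.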


RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Um, maybe I'm way off base, but when Varun said:

> If I search with the text "samsung andriod GPS",
> search results should only conain "samsung", "GPS",
> "andriod" and "samsung andriod".

I interpreted that to mean that hit documents should contain terms from the 
query, and nothing else.  Making all terms required doesn't do this.

Steve
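
The two-filter idea quoted below can be sketched outside Solr to show why it yields "documents whose words are all in the query". This is plain Python, not Lucene analysis filters; the function names are hypothetical, and lowercasing plus whitespace splitting are assumptions standing in for whatever tokenizer the field really uses.

```python
from itertools import combinations


def index_key(text):
    """Index time: collapse a document into one token made of its sorted terms."""
    return " ".join(sorted(set(text.lower().split())))


def query_keys(query):
    """Query time: build one key per non-empty combination of the query terms."""
    terms = sorted(set(query.lower().split()))
    # combinations() preserves input order, so every key is sorted the same
    # way as the keys produced by index_key().
    return {
        " ".join(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    }


def subset_matches(docs, query):
    """Return the documents whose terms are all present in the query."""
    keys = query_keys(query)
    return [doc for doc in docs if index_key(doc) in keys]


docs = ["nokia n95", "GPS", "android", "samsung",
        "samsung android", "nokia android", "mobile with GPS"]
hits = subset_matches(docs, "samsung android GPS")
# hits == ["GPS", "android", "samsung", "samsung android"]
```

A query with n distinct terms produces 2^n - 1 keys, so this approach only suits short queries.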

> -Original Message-
> From: Matthew Hall [mailto:mh...@informatics.jax.org]
> Sent: Tuesday, October 26, 2010 2:30 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How do I this in Solr?
> 
> Um.. you could change your default clause to AND rather than or.
> 
> That should do the trick.
> 
> Matt
> 
> On 10/26/2010 2:26 PM, Dennis Gearon wrote:
> > Overkill?
> >
> > Dennis Gearon
> >> I can't think of a way to do it without writing new
> >> analysis filters.
> >>
> >> But I think you could do what you want with two filters
> >> (this is untested):
> >>
> >> 1. An index-time filter that outputs a single token
> >> consisting of all of the input tokens, sorted in a
> >> consistent way, e.g.:
> >>
> >> "mobile with GPS" ->  "GPS mobile
> >> with"
> >> "samsung android" ->  "android
> >> samsung"
> >>
> >> 2. A query-time filter that outputs one token per input
> >> term combination, sorted in the same consistent way as the
> >> index-time filter, e.g.:
> >>
> >> "samsung andriod GPS"
> >>   ->
> >> "samsung","android","GPS",
> >>  "android
> >> samsung","GPS samsung","android GPS"
> >>  "android GPS
> >> samsung"
> >>
> >> Steve
> >>
> >>> -Original Message-
> >>> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> >>> Sent: Tuesday, October 26, 2010 9:08 AM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: How do I this in Solr?
> >>>
> >>> Hi,
> >>>
> >>> I have lot of small documents (each containing 1 to 15
> >> words) indexed in
> >>> Solr. For the search query, I want the search results
> >> to contain only
> >>> those
> >>> documents that satisfy this criteria "All of the words
> >> of the search
> >>> result
> >>> document are present in the search query"
> >>>
> >>> For example:
> >>> If I have the following documents indexed: "nokia
> >> n95", "GPS", "android",
> >>> "samsung", "samsung andriod", "nokia andriod", "mobile
> >> with GPS"
> >>> If I search with the text "samsung andriod GPS",
> >> search results should
> >>> only
> >>> conain "samsung", "GPS", "andriod" and "samsung
> >> andriod".
> >>> Is there a way to do this in Solr.
> >>>
> >>> --
> >>> Thanks
> >>> Varun Gupta



Re: Highlighting for non-stored fields

2010-10-26 Thread Pradeep Singh
Another way you can do this is - after the search has completed, load the
field in your application, write separate code to reanalyze that
field/document, index it in RAM, and run it through highlighter classes. All
this as part of your web application outside of Solr. Considering the size
of your data it doesn't look advisable to store it because then you would be
almost doubling the size of your index (if you are looking to highlight on a
field then it's probably going to be full of content).

-Pradeep
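
The application-side approach above can be illustrated with a small Python sketch that computes highlight offsets from text loaded outside Solr. It is a stand-in only: the regex tokenizer and the function names are assumptions, and a real implementation would re-analyze the text with the same analyzer the field uses and run it through the Lucene highlighter classes.

```python
import re


def highlight_offsets(text, query_terms):
    """Return (start, end) character offsets of every token matching a query term."""
    wanted = {term.lower() for term in query_terms}
    return [
        (match.start(), match.end())
        for match in re.finditer(r"\w+", text)
        if match.group().lower() in wanted
    ]


def apply_highlight(text, offsets, pre="<em>", post="</em>"):
    """Wrap each offset range in pre/post markers; offsets must be sorted and disjoint."""
    pieces = []
    last = 0
    for start, end in offsets:
        pieces.append(text[last:start])
        pieces.append(pre + text[start:end] + post)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


text = "Solr highlights Solr terms"
offsets = highlight_offsets(text, ["solr"])
marked = apply_highlight(text, offsets)
```

Returning the offsets alone, without calling apply_highlight, corresponds to the "highlighted offsets instead of the actual matched text" response asked about in the quoted message.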

On Tue, Oct 26, 2010 at 8:32 AM, Phong Dais  wrote:

> Hi,
>
> I understand that I need to store the fields in order to use highlighting
> "out of the box".
> I'm looking for a way to highlighting using term offsets instead of the
> actual text since the text is not stored.  What am asking is is it possible
> to modify the response (thru custom implementation) to contain highlighted
> offsets instead of the actual matched text.  Should I be writing my own
> DefaultHighlighter?  Or overiding some of its functionality?  Can this be
> done this way or am I way off?
>
> BTW, I'm using solr-1.4.
>
> Thanks,
> P.
>
> On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo  wrote:
>
> > Check out this link
> >
> > http://wiki.apache.org/solr/FieldOptionsByUseCase
> >
> > You need to store the field if you want to use the highlighting feature.
> >
> > If you need to retrieve and display the highlighted snippets then the
> > fields
> > definitely needs to be stored.
> >
> > To use term offsets, it will be a good idea to enable the following
> > attributes for that field  termVectors termPositions termOffsets
> >
> > The only issue here is that your storage costs will increase because of
> > these extra features.
> >
> > Nevertheless, you definitely need to store the field if you need to
> > retrieve
> > it for highlighting purposes.
> >
> > On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais 
> wrote:
> >
> > > Hi,
> > >
> > > I've been looking thru the mailing archive for the past week and I
> > haven't
> > > found any useful info regarding this issue.
> > >
> > > My requirement is to index a few terabytes worth of data to be
> searched.
> > > Due to the size of the data, I would like to index without storing but
> I
> > > would like to use the highlighting feature.  Is this even possible?
>  What
> > > are my options?
> > >
> > > I've read about termOffsets, payload that could possibly be used to do
> > this
> > > but I have no idea how this could be done.
> > >
> > > Any pointers greatly appreciated.  Someone please point me in the right
> > > direction.
> > >
> > >  I don't mind having to write some code or digging thru existing code
> to
> > > accomplish this task.
> > >
> > > Thanks,
> > > P.
> > >
> >
> >
> >
> > --
> > °O°
> > "Good Enough" is not good enough.
> > To give anything less than your best is to sacrifice the gift.
> > Quality First. Measure Twice. Cut Once.
> > http://www.israelekpo.com/
> >
>


Re: How do I this in Solr?

2010-10-26 Thread Matthew Hall

Um.. you could change your default operator to AND rather than OR.

That should do the trick.

Matt

On 10/26/2010 2:26 PM, Dennis Gearon wrote:

Overkill?

Dennis Gearon

I can't think of a way to do it without writing new
analysis filters.

But I think you could do what you want with two filters
(this is untested):

1. An index-time filter that outputs a single token
consisting of all of the input tokens, sorted in a
consistent way, e.g.:

"mobile with GPS" ->  "GPS mobile
with"
"samsung android" ->  "android
samsung"

2. A query-time filter that outputs one token per input
term combination, sorted in the same consistent way as the
index-time filter, e.g.:

"samsung andriod GPS"
  ->
"samsung","android","GPS",

 "android
samsung","GPS samsung","android GPS"
 "android GPS
samsung"

Steve


-Original Message-
From: Varun Gupta [mailto:varun.vgu...@gmail.com]
Sent: Tuesday, October 26, 2010 9:08 AM
To: solr-user@lucene.apache.org
Subject: How do I this in Solr?

Hi,

I have lot of small documents (each containing 1 to 15

words) indexed in

Solr. For the search query, I want the search results

to contain only

those
documents that satisfy this criteria "All of the words

of the search

result
document are present in the search query"

For example:
If I have the following documents indexed: "nokia

n95", "GPS", "android",

"samsung", "samsung andriod", "nokia andriod", "mobile

with GPS"

If I search with the text "samsung andriod GPS",

search results should

only
conain "samsung", "GPS", "andriod" and "samsung

andriod".

Is there a way to do this in Solr.

--
Thanks
Varun Gupta




Re: Modelling Access Control

2010-10-26 Thread Dennis Gearon
"Son, don't touch that stove . . . .",

"OUCH! Hey Dad, I BURNED my hand on that stove, why didn't you tell me 
that?!?#! You know I need to know WHY, not just DON'T!"

Dennis Gearon

> Very important: do not make a spelling or autosuggest index
> from a
> text field which some people can see and other people
> can't.
> 



RE: How do I this in Solr?

2010-10-26 Thread Dennis Gearon
Overkill?

Dennis Gearon
> 
> I can't think of a way to do it without writing new
> analysis filters.
> 
> But I think you could do what you want with two filters
> (this is untested):
> 
> 1. An index-time filter that outputs a single token
> consisting of all of the input tokens, sorted in a
> consistent way, e.g.:
> 
>    "mobile with GPS" -> "GPS mobile
> with"
>    "samsung android" -> "android
> samsung"
> 
> 2. A query-time filter that outputs one token per input
> term combination, sorted in the same consistent way as the
> index-time filter, e.g.:
> 
>    "samsung andriod GPS"
>  ->   
> "samsung","android","GPS",
>         "android
> samsung","GPS samsung","android GPS"
>         "android GPS
> samsung"
> 
> Steve
> 
> > -Original Message-
> > From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> > Sent: Tuesday, October 26, 2010 9:08 AM
> > To: solr-user@lucene.apache.org
> > Subject: How do I this in Solr?
> > 
> > Hi,
> > 
> > I have lot of small documents (each containing 1 to 15
> words) indexed in
> > Solr. For the search query, I want the search results
> to contain only
> > those
> > documents that satisfy this criteria "All of the words
> of the search
> > result
> > document are present in the search query"
> > 
> > For example:
> > If I have the following documents indexed: "nokia
> n95", "GPS", "android",
> > "samsung", "samsung andriod", "nokia andriod", "mobile
> with GPS"
> > 
> > If I search with the text "samsung andriod GPS",
> search results should
> > only
> > conain "samsung", "GPS", "andriod" and "samsung
> andriod".
> > 
> > Is there a way to do this in Solr.
> > 
> > --
> > Thanks
> > Varun Gupta
>


Re: Strange search

2010-10-26 Thread ramzesua

Can anyone tell me why my search is behaving so badly? It works really strangely.
Here are my basic configs in schema.xml:
main filters:

  





  
  





  



and fields:


   
   
   
   
   
   
   
   
   

   
  

   
   
   
   

templateId

 text

 









Here is the schema for the field "typeCaption", from
_http://localhost:8983/search/admin/schema.jsp:
html      4
page      4
template  4
text      4
main      4
seo       3
meta      2
tags      1
keywords  1

If I search for "html", I get all the results, but if I search for "seo" or "text" I
don't get any results. I tried using wildcards, but that doesn't help. Can
anyone tell me where my problem is? Sorry for my poor English.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1773307.html
Sent from the Solr - User mailing list archive at Nabble.com.


After java replication: field not found exception on slaves

2010-10-26 Thread Peter Karich

Hi,

we had the following problem. We added a field to schema.xml and fed our 
master with the new data.
After that, querying on the master is fine. But when we replicated 
(Solr 1.4.0) to our slaves,
all slaves said they cannot find the new field (standard exception for 
missing fields),
even though I can see the new field in the XML response and I can 
see it in the replicated schema.xml file!?


Stranger still, after scp-ing the exact data folder to our master, 
all is fine (on the master).


Has anybody hit the same strange behaviour?

Regards,
Peter.


PS: In the end, on the slaves we ran:
rm -rf data/
./reload.sh
and then replicated again.


Inconsistent slave performance after optimize

2010-10-26 Thread Mason Hale
Hello esteemed Solr community --

I'm observing some inconsistent performance on our slave servers after
recently optimizing our master server.

Our configuration is as follows:

- all servers are hosted at Amazon EC2, running Ubuntu 8.04
- 1 master with heavy insert/update traffic, about 125K new documents
per day (m1.large, ~8GB RAM)
   - autocommit every 1 minute
- 3 slaves (m2.xlarge instance sizes, ~16GB RAM)
   - replicate every 5 minutes
   - we have configured autowarming queries for these machines
   - autowarmCount = 0
- Total index size is ~7M documents

We were seeing increasing, but gradual performance degradation across all
nodes.
So we decided to try optimizing our index to improve performance.

In preparation for the optimize we disabled replication polling on all
slaves. We also turned off all
workers that were writing to the index. Then we ran optimize on the master.

The optimize took 45-60 minutes to complete, and the total size went from
68GB down to 23GB.

We then enabled replication on each slave one at a time.

The first slave we re-enabled took about 15 minutes to copy the new files.
Once the files were copied
the performance of the slave plummeted. Average response time went from 0.75 sec
to 45 seconds.
Over the past 18 hours the average response time has gradually gone down to
around 1.2 seconds now.

Before re-enabling replication on the second slave, we first removed it from
our load-balanced pool of available search servers.
This server's average query performance also degraded quickly, and then
(unlike the first slave we replicated) did not improve.
It stayed at around 30 secs per query. On the theory that this is a
cache-warming issue, we added this server
back to the pool in hopes that additional traffic would warm the cache. But
what we saw was a quick spike of much worse
performance (50 sec / query on average) followed by a slow/gradual decline
in average response times.
As of now (10 hours after the initial replication) this server is still
reporting an average response time of ~2 seconds.
This is much worse than before the optimize and is a counter-intuitive
result. We expected an index 1/3 the size would be faster, not slower.

On the theory that the index files needed to be loaded into the file system
cache, I used the 'dd' command to copy
the contents of the data/index directory to /dev/null, but that did not
result in any noticeable performance improvement.

At this point, things were not going as expected. We did not expect the
replication after an optimize to result in such horrid
performance. So we decided to let the last slave continue to serve stale
results while we waited 4 hours for the
other two slaves to approach some acceptable performance level.

After the 4-hour break, we removed the 3rd and last slave server from our
load-balancing pool, then re-enabled replication.
This time we saw a tiny blip. The average performance went up to 1 second
briefly then went back to the (normal for us)
0.25 to 0.5 second range. We then added this server back to the
load-balancing pool and observed no degradation in performance.

While we were happy to avoid a repeat of the poor performance we saw on the
previous slaves, we are at a loss to explain
why this slave did not also have such poor performance.

At this point we're scratching our heads trying to understand:
   (a) Why the performance of the first two slaves was so terrible after the
optimize. We think it's cache-warming related, but we're not sure.
More than 10 hours seems like a long time to wait for the cache to warm up.
   (b) Why the performance of the third slave was barely impacted. It should
have hit the same cold-cache issues as the other servers, if that is indeed
the root cause.
   (c) Why performance of the first 2 slaves is still much worse after the
optimize than it was before the optimize,
  where the performance of the 3rd slave is pretty much unchanged. We
expected the optimize to *improve* performance.

All 3 slave servers are identically configured, and the procedure for
re-enabling replication was identical for the 2nd and 3rd
slaves, with the exception of a 4-hour wait period.

We have confirmed that the 3rd slave did replicate, the number of documents
and total index size matches the master and other slave servers.

I'm writing to fish for an explanation or ideas that might explain this
inconsistent performance. Obviously, we'd like to be able to reproduce the
performance of the 3rd slave, and avoid the poor performance of the first
two slaves the next time we decide it's time to optimize our index.

thanks in advance,

Mason
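
For reference, the 'dd' step described above (reading every file in data/index and discarding the bytes) can be sketched in Python as follows. The function name and chunk size are hypothetical, and the sketch makes no claim about the outcome: it only asks the operating system to read the files, and whether the pages stay in the file system cache depends on how much memory is free.

```python
import os


def read_through(directory, chunk_size=1 << 20):
    """Read every regular file in directory once and return the total bytes read."""
    total = 0
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                # The data is discarded; only the read itself matters here.
                total += len(chunk)
    return total
```

Comparing the returned byte count with the expected index size is a quick check that the whole directory was actually read.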


Re: Highlighting for non-stored fields

2010-10-26 Thread Phong Dais
Hi,

I understand that I need to store the fields in order to use highlighting
"out of the box".
I'm looking for a way to highlight using term offsets instead of the
actual text, since the text is not stored.  What I am asking is whether it is
possible to modify the response (thru a custom implementation) to contain highlighted
offsets instead of the actual matched text.  Should I be writing my own
DefaultHighlighter?  Or overriding some of its functionality?  Can this be
done this way or am I way off?

BTW, I'm using solr-1.4.

Thanks,
P.

On Tue, Oct 26, 2010 at 9:25 AM, Israel Ekpo  wrote:

> Check out this link
>
> http://wiki.apache.org/solr/FieldOptionsByUseCase
>
> You need to store the field if you want to use the highlighting feature.
>
> If you need to retrieve and display the highlighted snippets then the
> fields
> definitely needs to be stored.
>
> To use term offsets, it will be a good idea to enable the following
> attributes for that field  termVectors termPositions termOffsets
>
> The only issue here is that your storage costs will increase because of
> these extra features.
>
> Nevertheless, you definitely need to store the field if you need to
> retrieve
> it for highlighting purposes.
>
> On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais  wrote:
>
> > Hi,
> >
> > I've been looking thru the mailing archive for the past week and I
> haven't
> > found any useful info regarding this issue.
> >
> > My requirement is to index a few terabytes worth of data to be searched.
> > Due to the size of the data, I would like to index without storing but I
> > would like to use the highlighting feature.  Is this even possible?  What
> > are my options?
> >
> > I've read about termOffsets, payload that could possibly be used to do
> this
> > but I have no idea how this could be done.
> >
> > Any pointers greatly appreciated.  Someone please point me in the right
> > direction.
> >
> >  I don't mind having to write some code or digging thru existing code to
> > accomplish this task.
> >
> > Thanks,
> > P.
> >
>
>
>
> --
> °O°
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>


Re: Documents are deleted when Solr is restarted

2010-10-26 Thread Israel Ekpo
The Solr home is the -Dsolr.solr.home Java system property.

Also make sure that -Dsolr.data.dir is defined for your data directory, if it
is not already defined in the solrconfig.xml file.

On Tue, Oct 26, 2010 at 10:46 AM, Upayavira  wrote:

> You need to watch what you are setting your solr.home to. That is where
> your indexes are being written. Are they getting overwritten/lost
> somehow. Watch the files in that dir while doing a restart.
>
> That's a start at least.
>
> Upayavira
>
> On Tue, 26 Oct 2010 16:40 +0300, "Mackram Raydan" 
> wrote:
> > Hey everyone,
> >
> > I apologize if this question is rudimentary but it is getting to me and
> > I did not find anything reasonable about it online.
> >
> > So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the
> > SolrTomcat wiki page to setup. The system works exactly the way I want
> > it (proper search, highlighting, etc...). The problem however is when I
> > restart my Tomcat server all the data in Solr (ie the index) is simply
> > lost. The admin shows me the number of docs is 0 when it was before in
> > the thousands.
> >
> > Can someone please help me understand why the above is happening and how
> > can I workaround it if possible?
> >
> > Big thanks for any help you can send my way.
> >
> > Regards,
> >
> > Mackram
> >
>



-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Documents are deleted when Solr is restarted

2010-10-26 Thread Upayavira
You need to watch what you are setting your solr.home to. That is where
your indexes are being written. Are they getting overwritten/lost
somehow? Watch the files in that dir while doing a restart.

That's a start at least.

Upayavira

On Tue, 26 Oct 2010 16:40 +0300, "Mackram Raydan" 
wrote:
> Hey everyone,
> 
> I apologize if this question is rudimentary but it is getting to me and 
> I did not find anything reasonable about it online.
> 
> So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the 
> SolrTomcat wiki page to setup. The system works exactly the way I want 
> it (proper search, highlighting, etc...). The problem however is when I 
> restart my Tomcat server all the data in Solr (ie the index) is simply 
> lost. The admin shows me the number of docs is 0 when it was before in 
> the thousands.
> 
> Can someone please help me understand why the above is happening and how 
> can I workaround it if possible?
> 
> Big thanks for any help you can send my way.
> 
> Regards,
> 
> Mackram
> 


Re: Query only a specfic field with a specific value using Dismax Handler

2010-10-26 Thread Swapnonil Mukherjee
Thanks Jonathan. FQ seems promising. I will give it a go.

Swapnonil Mukherjee




On 26-Oct-2010, at 7:29 PM, Jonathan Rochkind wrote:

> So, first of all, "exact" match is hard in Solr on tokenized fields.  
> Tokenized fields don't really do that.  So for exact match, you should 
> probably use a non-tokenized field (string or text with keywordtokenizer 
> (which should really be called the non-tokenizer)). If there's only one 
> token in your value anyway though, like a single number, it may not 
> matter and work fine.
> 
> Secondly, I'd recommend combining a dismax query for the user-entered 
> phrase (like 'dog') with standard lucene queries for those other 
> things.  There are (at least) two ways to do that. The first is just put 
> everything after the first AND in one or more 'fq' parameters instead of 
> trying to include them in 'q'.  The second is to use Solr's nested query 
> syntax, to specify sub-queries with different query parsers. Someone can 
> explain the second if you need it, but the easier to understand 'fq' 
> approach seems right to me for your case.
> 
> Swapnonil Mukherjee wrote:
>> Hi Everybody,
>> 
>> Let me give you a brief idea of our Solr document. We have about 6 text type 
>> fields, each containing IPTC data extracted from photos. Search is performed 
>> mostly on these 6 fields.
>> We also have a mutlivalue field named group_id that contains a list of all 
>> the  group_ids that have access to this photo.  In other words we are 
>> storing the metadata of the photo as well as the permissions applicable for 
>> this photo in the Solr document itself. This group_id field by the way is of 
>> long type.
>> 
>> Additionally we have certain boolean and constant type fields named 
>> visibleToEndUser (boolean) and entityType (a java enum between 0 to 5).
>> 
>> The first field defaultSearch is a copyField which contains a copy of all 
>> the values of 6 text type fields that I have mentioned.
>> 
>> The way we query presently using the default search handler is like this.
>> 
>> defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR 
>> group_id:2216624 OR group_id:2216990) AND (entityType:0) AND 
>> (visibleToEndUser:true)
>> 
>> We want to start using the dismax (if not dismax then edismax)  query 
>> handler but so far I have not been able to replicate the query mentioned 
>> above to the equivalent dismax form.
>> 
>> What I cannot figure out is?
>> 
>> 1. How do I apply exact match on the group_id, visibleToEndUser and the 
>> entityType fields? Or How how do I query a specific field with a specific 
>> value rather than searching across all fields with all values.
>> 2. How do I apply OR and AND conditions?
>> 
>> 
>> Swapnonil Mukherjee
>> 
>> 
>> 
>> 
>> 



Re: how well does multicore scale?

2010-10-26 Thread Jonathan Rochkind

mike anderson wrote:

I'm really curious if there is a clever solution to the obvious problem
with: "So your better off using a single index and with a user id and use
a query filter with the user id when fetching data.", i.e., when you have
hundreds of thousands of user IDs tagged on each article. That just doesn't
sound like it scales very well..
  
Actually, I think that design would scale pretty fine, I don't think 
there's an 'obvious' problem. You store your userIDs in a multi-valued 
field (or as multiple terms in a single value, ends up being similar). 
You fq on there with the current userID.   There's one way to find out 
of course, but that doesn't seem a patently ridiculous scenario or 
anything, that's the kind of thing Solr is generally good at, it's what 
it's built for.   The problem might actually be in the time it takes to 
add such a document to the index; but not in query time.


Doesn't mean it's the best solution for your problem though, I can't say.

My impression is that Solr in general isn't really designed to support 
the kind of multi-tenancy use case people are talking about lately.  So 
trying to make it work anyway... if multi-cores work for you, then 
great, but be aware they weren't really designed for that (having 
thousands of cores) and may not. If a single index can work for you 
instead, great, but as you've discovered it's not necessarily obvious 
how to set up the schema to do what you need -- really this applies to 
Solr in general, unlike an rdbms where you just third-form-normalize 
everything and figure it'll work for almost any use case that comes up,  
in Solr you generally need to custom fit the schema for your particular 
use cases, sometimes being kind of clever to figure out the optimal way 
to do that.


This is, I'd argue/agree, indeed kind of a disadvantage, setting up a 
Solr index takes more intellectual work than setting up an rdbms. The 
trade off is you get speed, and flexible ways to set up relevancy (that 
still perform well). Took a couple decades for rdbms to get as brainless 
to use as they are, maybe in a couple more we'll have figured out ways 
to make indexing engines like solr equally brainless, but not yet -- but 
it's still pretty damn easy for what it is, the lucene/Solr folks have 
done a remarkable job.


Re: How do I this in Solr?

2010-10-26 Thread Ken Stanley
On Tue, Oct 26, 2010 at 9:15 AM, Savvas-Andreas Moysidis <
savvas.andreas.moysi...@googlemail.com> wrote:

> If I get your question right, you probably want to use the AND binary
> operator as in "samsung AND andriod AND GPS" or "+samsung +andriod +GPS"
>
>
N.b. For these queries you can also pass the q.op parameter in the request
to temporarily change the default operator to AND; this has the same effect
without having to build the query; i.e., you can just pass
"http://host:port/solr/select?q=samsung+android+gps&q.op=AND"
as the query string (along with any other params you need).
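
A rough sketch of assembling that request, assuming Python 3; the base URL
below is a placeholder and not anything from this thread:

```python
from urllib.parse import urlencode

# Placeholder base URL (an assumption); substitute your own host, port, and path.
BASE_URL = "http://localhost:8983/solr/select"


def build_and_query(terms):
    """Return a select URL that asks Solr to AND the given terms via q.op."""
    params = {"q": " ".join(terms), "q.op": "AND"}
    return BASE_URL + "?" + urlencode(params)


url = build_and_query(["samsung", "android", "gps"])
print(url)
```

The operator value is written in upper case here, which is the documented
form (AND|OR).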


Re: how well does multicore scale?

2010-10-26 Thread mike anderson
So I fired up about 100 cores and used JMeter to fire off a few thousand
queries. It looks like the memory usage isn't much worse than running a
single shard. So that's good.

I'm really curious if there is a clever solution to the obvious problem
with: "So your better off using a single index and with a user id and use
a query filter with the user id when fetching data.", i.e., when you have
hundreds of thousands of user IDs tagged on each article. That just doesn't
sound like it scales very well..


Cheers,
Mike


On Fri, Oct 22, 2010 at 10:43 PM, Lance Norskog  wrote:

> http://wiki.apache.org/solr/CoreAdmin
>
> Since Solr 1.3
>
> On Fri, Oct 22, 2010 at 1:40 PM, mike anderson 
> wrote:
> > Thanks for the advice, everyone. I'll take a look at the API mentioned
> and
> > do some benchmarking over the weekend.
> >
> > -Mike
> >
> >
> > On Fri, Oct 22, 2010 at 8:50 AM, Mark Miller 
> wrote:
> >
> >> On 10/22/10 1:44 AM, Tharindu Mathew wrote:
> >> > Hi Mike,
> >> >
> >> > I've also considered using a separate cores in a multi tenant
> >> > application, ie a separate core for each tenant/domain. But the cores
> >> > do not suit that purpose.
> >> >
> >> > If you check out documentation no real API support exists for this so
> >> > it can be done dynamically through SolrJ. And all use cases I found,
> >> > only had users configuring it statically and then using it. That was
> >> > maybe 2 or 3 cores. Please correct me if I'm wrong Solr folks.
> >>
> >> You can dynamically manage cores with solrj. See
> >> org.apache.solr.client.solrj.request.CoreAdminRequest's static methods
> >> for a place to start.
> >>
> >> You probably want to turn solr.xml's persist option on so that your
> >> cores survive restarts.
> >>
> >> >
> >> > So your better off using a single index and with a user id and use a
> >> > query filter with the user id when fetching data.
> >>
> >> Many times this is probably the case - pro's and con's to each depending
> >> on what you are up to.
> >>
> >> - Mark
> >> lucidimagination.com
> >>
> >> >
> >> > On Fri, Oct 22, 2010 at 1:12 AM, Jonathan Rochkind 
> >> wrote:
> >> >> No, it does not seem reasonable.  Why do you think you need a
> seperate
> >> core
> >> >> for every user?
> >> >> mike anderson wrote:
> >> >>>
> >> >>> I'm exploring the possibility of using cores as a solution to
> "bookmark
> >> >>> folders" in my solr application. This would mean I'll need tens of
> >> >>> thousands
> >> >>> of cores... does this seem reasonable? I have plenty of CPUs
> available
> >> for
> >> >>> scaling, but I wonder about the memory overhead of adding cores
> (aside
> >> >>> from
> >> >>> needing to fit the new index in memory).
> >> >>>
> >> >>> Thoughts?
> >> >>>
> >> >>> -mike
> >> >>>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >>
> >>
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


Re: Query only a specfic field with a specific value using Dismax Handler

2010-10-26 Thread Jonathan Rochkind
So, first of all, "exact" match is hard in Solr on tokenized fields.  
Tokenized fields don't really do that.  So for exact match, you should 
probably use a non-tokenized field (string or text with keywordtokenizer 
(which should really be called the non-tokenizer)). If there's only one 
token in your value anyway though, like a single number, it may not 
matter and work fine.


Secondly, I'd recommend combining a dismax query for the user-entered 
phrase (like 'dog') with standard lucene queries for those other 
things.  There are (at least) two ways to do that. The first is just put 
everything after the first AND in one or more 'fq' parameters instead of 
trying to include them in 'q'.  The second is to use Solr's nested query 
syntax, to specify sub-queries with different query parsers. Someone can 
explain the second if you need it, but the easier to understand 'fq' 
approach seems right to me for your case.
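
To make the 'fq' approach concrete, here is a minimal sketch in Python 3 that
builds the request parameters for the query quoted below. The function name is
made up for illustration, and using defaultSearch as the only 'qf' field is an
assumption about how you would configure dismax:

```python
from urllib.parse import urlencode


def build_dismax_params(user_text, group_ids, entity_type, visible_to_end_user):
    """Put the user's text in q (dismax) and every restriction in its own fq."""
    group_clause = " OR ".join(f"group_id:{gid}" for gid in group_ids)
    return [
        ("defType", "dismax"),      # parse q with the dismax parser
        ("q", user_text),           # raw user input, no field syntax needed
        ("qf", "defaultSearch"),    # assumption: search the copyField only
        ("fq", group_clause),       # each fq is parsed as a standard lucene query
        ("fq", f"entityType:{entity_type}"),
        ("fq", "visibleToEndUser:" + ("true" if visible_to_end_user else "false")),
    ]


params = build_dismax_params("Dog", [2181347, 2181364, 2216624, 2216990], 0, True)
query_string = urlencode(params)
print(query_string)
```

Keeping the three restrictions in separate fq parameters should let Solr cache
each filter on its own, so the entityType and visibleToEndUser filters can be
reused across users with different group lists.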


Swapnonil Mukherjee wrote:

Hi Everybody,

Let me give you a brief idea of our Solr document. We have about 6 text type 
fields, each containing IPTC data extracted from photos. Search is performed 
mostly on these 6 fields.
We also have a multivalued field named group_id that contains a list of all the  
group_ids that have access to this photo.  In other words we are storing the 
metadata of the photo as well as the permissions applicable for this photo in 
the Solr document itself. This group_id field by the way is of long type.

Additionally we have certain boolean and constant type fields named 
visibleToEndUser (boolean) and entityType (a java enum between 0 to 5).

The first field defaultSearch is a copyField which contains a copy of all the 
values of 6 text type fields that I have mentioned.

The way we query presently using the default search handler is like this.

defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR 
group_id:2216624 OR group_id:2216990) AND (entityType:0) AND 
(visibleToEndUser:true)

We want to start using the dismax (if not dismax then edismax)  query handler 
but so far I have not been able to replicate the query mentioned above to the 
equivalent dismax form.

What I cannot figure out is:

1. How do I apply exact match on the group_id, visibleToEndUser and the 
entityType fields? Or, how do I query a specific field with a specific value 
rather than searching across all fields with all values?
2. How do I apply OR and AND conditions?


Swapnonil Mukherjee




  


Re: Solr ExtractingRequestHandler with Compressed files

2010-10-26 Thread Joey Hanzel
Hi Jayendra,

Thanks for the suggestion, I updated to Solr 1.4.1 and Solr Cell 1.4.1 and
tried sending a zip file that contained several html documents.
Unfortunately, that did not solve the problem.

Here's the curl command I used:
curl "
http://localhost:8983/solr/update/extract?literla.id=d...@uprefix=attr_&fmap.content=attri_content&commit=true";
-F "file=data.zip"

When I query for id:doc1, the attr_content lists each filename within the
zip archive. It also indexed the stream_size, stream_source and
content_type.  It does not appear to be opening up the individual files
within the zip.

Did you have to make any other configuration changes to your solrconfig.xml
or schema.xml to read the contents of the individual files?  Would it help
to pass the specific mime type on the curl line ?

On Mon, Oct 25, 2010 at 3:27 PM, Jayendra Patil <
jayendra.patil@gmail.com> wrote:

> There was this issue with the previous version of Solr, wherein only the
> file names from the zip used to get indexed.
> We had faced the same issue and ended up using the Solr trunk which has the
> Tika version upgraded and works fine.
>
> The Solr version 1.4.1 should also have the fix included. Try using it.
>
> Regards,
> Jayendra
>
> On Fri, Oct 22, 2010 at 6:02 PM, Joey Hanzel  >wrote:
>
> > Hi,
> >
> > Has anyone had success using ExtractingRequestHandler and Tika with any
> of
> > the compressed file formats (zip, tar, gz, etc) ?
> >
> > I am sending solr the archived.tar file using curl. curl "
> >
> >
> http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=body_texts&commit=true
> > "
> > -H 'Content-type:application/octet-stream' --data-binary
> > "@/home/archived.tar"
> > The result I get when I query the document is that the filenames inside
> the
> > archive are indexed as the "body_texts", but the content of those files
> is
> > not extracted or included.  This is not the behvior I expected. Ref:
> >
> >
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#article.tika.example
> > .
> > When I send 1 of the actual documents inside the archive using the same
> > curl
> > command the extracted content is then stored in the "body_texts" field.
>  Am
> > I missing a step for the compressed files?
> >
> > I have added all the extraction dependencies as indicated by mat in
> > http://outoftime.lighthouseapp.com/projects/20339/tickets/98-solr-cell and
> > am able to successfully extract data from MS Word, PDF, HTML documents.
> >
> > I'm using the following library versions.
> >  Solr 1.40,  Solr Cell 1.4.1, with Tika Core 0.4
> >
> > Given everything I have read this version of Tika should support
> extracting
> > data from all files within a compressed file.  Any help or suggestions
> > would
> > be appreciated.
> >
>


Solr - xmlhttprequest

2010-10-26 Thread Yavuz Selim YILMAZ
I have a Solr instance on my server, and I can make requests with Internet
Explorer. However, with other browsers I can't.

The error given:
*XMLHttpRequest cannot load http://. Origin http://... is not allowed by
Access-Control-Allow-Origin.*

I changed my Apache server conf file and added these lines:

Header set Access-Control-Allow-Origin "*"
Header set Access-Control-Allow-Methods POST,GET,OPTIONS
Header set Access-Control-Allow-Headers X-PINGOTHER
Header set Access-Control-Max-Age 1728000

to allow.

Still, the same error.

Any suggestion?
--

Yavuz Selim YILMAZ


Re: a bug of solr distributed search

2010-10-26 Thread Ron Mayer
Andrzej Bialecki wrote:
> On 2010-10-25 11:22, Toke Eskildsen wrote:
>> On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: 
>>> But itshows a problem of distrubted search without common idf.
>>> A doc will get different score in different shard.
>> Bingo.
>>
>> I really don't understand why this fundamental problem with sharding
>> isn't mentioned more often. Every time the advice "use sharding" is
>> given, it should be followed with a "but be aware that it will make
>> relevance ranking unreliable".
> 
> The reason is twofold, I think:


And a third potential reason - it's arguably a feature instead of a bug
for some applications.  Depending on how I organize my shards, "give me
the most relevant document from each shard for this search" seems like
it could be useful.

> * there is an exact solution to this problem, namely to make two
> distributed calls instead of one (first call to collect per-shard IDFs
> for given query terms, second call to submit a query rewritten with the
> global IDF-s). This solution is implemented in SOLR-1632, with some
> caching to reduce the cost for common queries. However, this means that
> now for every query you need to make two calls instead of one, which
> potentially doubles the time to return results (for simple common
> queries - for rare complex queries the time will be still dominated by
> the query runtime on shard servers).
> 
> * another reason is that in many many cases the difference between using
> exact global IDF and per-shard IDFs is not that significant. If shards
> are more or less homogenous (e.g. you assign documents to shards by
> hash(docId)) then term distributions will be also similar. So then the
> question is whether you can accept an N% variance in scores across
> shards, or whether you want to bear the cost of an additional
> distributed RPC for every query...
> 
> To summarize, I would qualify your statement with: "...if the
> composition of your shards is drastically different". Otherwise the cost
> of using global IDF is not worth it, IMHO.
> 
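
To put a number on the per-shard vs. global difference discussed above, here
is a small Python 3 sketch. The shard statistics are invented for
illustration, and the formula is the classic Lucene-style idf,
1 + ln(numDocs / (docFreq + 1)), as I understand the default similarity:

```python
import math


def idf(num_docs, doc_freq):
    """Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for one query term: shard -> (numDocs, docFreq).
shards = {"shard_a": (1000, 9), "shard_b": (1000, 99)}

# What each shard computes on its own, using only local statistics.
per_shard = {name: idf(n, df) for name, (n, df) in shards.items()}

# What a two-pass (global IDF) scheme would use on every shard.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_docs, total_df)

for name, value in per_shard.items():
    print(f"{name}: local idf={value:.3f}, global idf={global_idf:.3f}")
```

With these deliberately skewed shards the local values differ a lot from each
other; with shards filled by hash(docId) the docFreq values would be close and
so would the idf values.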



Documents are deleted when Solr is restarted

2010-10-26 Thread Mackram Raydan

Hey everyone,

I apologize if this question is rudimentary but it is getting to me and 
I did not find anything reasonable about it online.


So basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the 
SolrTomcat wiki page to set it up. The system works exactly the way I want 
it (proper search, highlighting, etc...). The problem however is when I 
restart my Tomcat server all the data in Solr (ie the index) is simply 
lost. The admin shows me the number of docs is 0 when it was before in 
the thousands.


Can someone please help me understand why the above is happening and how 
I can work around it if possible?


Big thanks for any help you can send my way.

Regards,

Mackram


Re: Highlighting for non-stored fields

2010-10-26 Thread Israel Ekpo
Check out this link

http://wiki.apache.org/solr/FieldOptionsByUseCase

You need to store the field if you want to use the highlighting feature.

If you need to retrieve and display the highlighted snippets then the field
definitely needs to be stored.

To use term offsets, it will be a good idea to enable the following
attributes for that field  termVectors termPositions termOffsets

The only issue here is that your storage costs will increase because of
these extra features.

Nevertheless, you definitely need to store the field if you need to retrieve
it for highlighting purposes.

On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais  wrote:

> Hi,
>
> I've been looking thru the mailing archive for the past week and I haven't
> found any useful info regarding this issue.
>
> My requirement is to index a few terabytes worth of data to be searched.
> Due to the size of the data, I would like to index without storing but I
> would like to use the highlighting feature.  Is this even possible?  What
> are my options?
>
> I've read about termOffsets, payload that could possibly be used to do this
> but I have no idea how this could be done.
>
> Any pointers greatly appreciated.  Someone please point me in the right
> direction.
>
>  I don't mind having to write some code or digging thru existing code to
> accomplish this task.
>
> Thanks,
> P.
>



-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


RE: How do I this in Solr?

2010-10-26 Thread Steven A Rowe
Hi Varun,

I can't think of a way to do it without writing new analysis filters.

But I think you could do what you want with two filters (this is untested):

1. An index-time filter that outputs a single token consisting of all of the 
input tokens, sorted in a consistent way, e.g.:

   "mobile with GPS" -> "GPS mobile with"
   "samsung android" -> "android samsung"

2. A query-time filter that outputs one token per input term combination, 
sorted in the same consistent way as the index-time filter, e.g.:

   "samsung android GPS"
 -> "samsung", "android", "GPS",
    "android samsung", "GPS samsung", "android GPS",
    "android GPS samsung"

Steve
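
The two filters described above can be sketched outside Solr in plain
Python 3, just to check the matching logic. This is not an analysis filter
implementation; the function names are made up, and lowercasing before
sorting is my own assumption to keep the sort order consistent:

```python
from itertools import combinations


def index_token(text):
    """Index-time side: one token made of all terms, sorted case-insensitively."""
    return " ".join(sorted(text.lower().split()))


def query_tokens(text):
    """Query-time side: one token per non-empty combination of query terms."""
    terms = sorted(set(text.lower().split()))
    return {
        " ".join(combo)
        for size in range(1, len(terms) + 1)
        for combo in combinations(terms, size)
    }


docs = ["nokia n95", "GPS", "android", "samsung",
        "samsung android", "nokia android", "mobile with GPS"]

wanted = query_tokens("samsung android GPS")
# A document matches only if its single sorted token is one of the query tokens.
matches = [doc for doc in docs if index_token(doc) in wanted]
print(matches)
```

Note that a query with n distinct terms produces 2^n - 1 tokens, so this only
stays practical for short queries.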

> -Original Message-
> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> Sent: Tuesday, October 26, 2010 9:08 AM
> To: solr-user@lucene.apache.org
> Subject: How do I this in Solr?
> 
> Hi,
> 
> I have lot of small documents (each containing 1 to 15 words) indexed in
> Solr. For the search query, I want the search results to contain only
> those
> documents that satisfy this criteria "All of the words of the search
> result
> document are present in the search query"
> 
> For example:
> If I have the following documents indexed: "nokia n95", "GPS", "android",
> "samsung", "samsung andriod", "nokia andriod", "mobile with GPS"
> 
> If I search with the text "samsung andriod GPS", search results should
> only
> conain "samsung", "GPS", "andriod" and "samsung andriod".
> 
> Is there a way to do this in Solr.
> 
> --
> Thanks
> Varun Gupta


Re: How do I this in Solr?

2010-10-26 Thread Savvas-Andreas Moysidis
If I get your question right, you probably want to use the AND binary
operator as in "samsung AND android AND GPS" or "+samsung +android +GPS"

On 26 October 2010 14:07, Varun Gupta  wrote:

> Hi,
>
> I have lot of small documents (each containing 1 to 15 words) indexed in
> Solr. For the search query, I want the search results to contain only those
> documents that satisfy this criteria "All of the words of the search result
> document are present in the search query"
>
> For example:
> If I have the following documents indexed: "nokia n95", "GPS", "android",
> "samsung", "samsung andriod", "nokia andriod", "mobile with GPS"
>
> If I search with the text "samsung andriod GPS", search results should only
> conain "samsung", "GPS", "andriod" and "samsung andriod".
>
> Is there a way to do this in Solr.
>
> --
> Thanks
> Varun Gupta
>


How do I this in Solr?

2010-10-26 Thread Varun Gupta
Hi,

I have a lot of small documents (each containing 1 to 15 words) indexed in
Solr. For the search query, I want the search results to contain only those
documents that satisfy this criterion: "All of the words of the search result
document are present in the search query"

For example:
If I have the following documents indexed: "nokia n95", "GPS", "android",
"samsung", "samsung android", "nokia android", "mobile with GPS"

If I search with the text "samsung android GPS", search results should only
contain "samsung", "GPS", "android" and "samsung android".

Is there a way to do this in Solr?

--
Thanks
Varun Gupta


Next Word - Any Suggestions?

2010-10-26 Thread Christopher Ball
I am about to implement a custom query that is sort of a mash-up of Facets,
Highlighting, and SpanQuery - but thought I'd see if anyone has done
anything similar. 

 

In simple terms, I need to facet on the next word given a target word.

 

For example, if my index only had the following 5 documents (comprised of a
sentence each):

 

Doc 1 - The quick brown fox jumped over the fence.

Doc 2 - The sly fox skipped over the fence.

Doc 3 - The fat fox skipped his afternoon class.

Doc 4 - A brown duck and red fox, crashed the party.

Doc 5 - Charles Brown! Fox! Crashed my damn car.

 

The query should give the frequency of the distinct terms after the word
"fox":

 

skipped - 2

crashed - 2 

jumped - 1

 

Longer term, I also want the opposite: frequency of the distinct terms before the word
"fox":

 

brown - 2

sly - 1

fat - 1 

red - 1
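
To pin down the expected counts, here is a brute-force sketch in Python 3 over
the five example sentences. It is not a Lucene/Solr implementation, only a
reference for what the custom query should return; the tokenization
(lowercase, letters only) is an assumption:

```python
import re
from collections import Counter

DOCS = [
    "The quick brown fox jumped over the fence.",
    "The sly fox skipped over the fence.",
    "The fat fox skipped his afternoon class.",
    "A brown duck and red fox, crashed the party.",
    "Charles Brown! Fox! Crashed my damn car.",
]


def neighbor_counts(docs, target, offset):
    """Count the terms found `offset` positions away from each hit on `target`."""
    counts = Counter()
    for text in docs:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, token in enumerate(tokens):
            j = i + offset
            if token == target and 0 <= j < len(tokens):
                counts[tokens[j]] += 1
    return counts


after_fox = neighbor_counts(DOCS, "fox", +1)
before_fox = neighbor_counts(DOCS, "fox", -1)
print(after_fox.most_common())
print(before_fox.most_common())
```

Inside Lucene the same position arithmetic would run over term positions
(from term vectors or span matches) rather than re-tokenized text.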

 

My guess is that either the FastVectorHighlighter or SpanQuery would be a
reasonable starting point. I was hoping to take advantage of Vectors as I am
storing termVectors, termPositions, and termOffsets for the field in
question.

 

Grateful for any thoughts . . . reference implementations . . . words of
encouragement . . . free beer - whatever you can offer.

 

Gracias,

 

Christopher

 



RE: How to index on basis of a condition?

2010-10-26 Thread Ephraim Ofir
Try:
select IF(sub_cat_id=2002, DATE_FORMAT(ad_post_date,
'%Y-%m-%dT00:00:00Z/DAY'), null) as 'ad_sort_field' from
tcuser.ad_details where 

Ephraim Ofir

-----Original Message-----
From: Pawan Darira [mailto:pawan.dar...@gmail.com] 
Sent: Tuesday, October 26, 2010 1:29 PM
To: solr-user@lucene.apache.org
Subject: Re: How to index on basis of a condition?

My Sql is

select IF(sub_cat_id=2002, ad_post_date, null) as 'ad_sort_field' from
tcuser.ad_details where 

+---------------+
| ad_sort_field |
+---------------+
| 2010-05-30    |
| 2010-05-02    |
| 2010-10-07    |
| NULL          |
| 2010-10-15    |
| NULL          |
+---------------+

Thanks
Pawan


On Tue, Oct 26, 2010 at 4:36 PM, Gora Mohanty 
wrote:

> On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira 
> wrote:
> > I am using mysql database, and, field type is "date"
> [...]
>
> Could you show us the exact SELECT statement, and some example
> values returned by running the SELECT directly at a mysql console?
>
> Regards,
> Gora
>



-- 
Thanks,
Pawan Darira


Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Swapnonil Mukherjee
Hi Everybody,

Thanks Ephraim and Peter. I think I got my answer.

Swapnonil Mukherjee




On 26-Oct-2010, at 4:23 PM, Ephraim Ofir wrote:

> Note that usually when you change the schema.xml you have not only to
> restart solr, but also rebuild the index, so the issue of how to reload
> the file seems like a small problem...
> 
> Ephraim Ofir
> 
> -Original Message-
> From: Peter Karich [mailto:peat...@yahoo.de] 
> Sent: Tuesday, October 26, 2010 12:29 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Does Solr reload schema.xml dynamically?
> 
>  Hi,
> 
> See this:
> http://wiki.apache.org/solr/CoreAdmin#RELOAD
> 
> Solr will also load the new configuration (without restart the webapp) 
> on the slaves when using replication:
> http://wiki.apache.org/solr/SolrReplication
> 
> Regards,
> Peter.
> 
>> Hi Everybody,
>> 
>> If I change my schema.xml to, do I have to restart Solr. Is there some
> way, I can apply the changes to schema.xml without restarting Solr?
>> 
>> Swapnonil Mukherjee
>> 
>> 
>> 
>> 
> 
> 
> -- 
> http://jetwick.com twitter search prototype
> 



Query only a specfic field with a specific value using Dismax Handler

2010-10-26 Thread Swapnonil Mukherjee
Hi Everybody,

Let me give you a brief idea of our Solr document. We have about 6 text type 
fields, each containing IPTC data extracted from photos. Search is performed 
mostly on these 6 fields.
We also have a multivalued field named group_id that contains a list of all the  
group_ids that have access to this photo.  In other words we are storing the 
metadata of the photo as well as the permissions applicable for this photo in 
the Solr document itself. This group_id field by the way is of long type.

Additionally we have certain boolean and constant type fields named 
visibleToEndUser (boolean) and entityType (a java enum between 0 to 5).

The first field defaultSearch is a copyField which contains a copy of all the 
values of 6 text type fields that I have mentioned.

The way we query presently using the default search handler is like this.

defaultSearch:(Dog) AND (group_id:2181347 OR group_id:2181364 OR 
group_id:2216624 OR group_id:2216990) AND (entityType:0) AND 
(visibleToEndUser:true)

We want to start using the dismax (if not dismax then edismax)  query handler 
but so far I have not been able to replicate the query mentioned above to the 
equivalent dismax form.

What I cannot figure out is:

1. How do I apply exact match on the group_id, visibleToEndUser and the 
entityType fields? Or, how do I query a specific field with a specific value 
rather than searching across all fields with all values?
2. How do I apply OR and AND conditions?


Swapnonil Mukherjee





Re: How to index on basis of a condition?

2010-10-26 Thread Pawan Darira
My Sql is

select IF(sub_cat_id=2002, ad_post_date, null) as 'ad_sort_field' from
tcuser.ad_details where 

+---------------+
| ad_sort_field |
+---------------+
| 2010-05-30    |
| 2010-05-02    |
| 2010-10-07    |
| NULL          |
| 2010-10-15    |
| NULL          |
+---------------+

Thanks
Pawan


On Tue, Oct 26, 2010 at 4:36 PM, Gora Mohanty  wrote:

> On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira 
> wrote:
> > I am using mysql database, and, field type is "date"
> [...]
>
> Could you show us the exact SELECT statement, and some example
> values returned by running the SELECT directly at a mysql console?
>
> Regards,
> Gora
>



-- 
Thanks,
Pawan Darira


Re: How to index on basis of a condition?

2010-10-26 Thread Gora Mohanty
On Tue, Oct 26, 2010 at 3:56 PM, Pawan Darira  wrote:
> I am using mysql database, and, field type is "date"
[...]

Could you show us the exact SELECT statement, and some example
values returned by running the SELECT directly at a mysql console?

Regards,
Gora


RE: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Ephraim Ofir
Note that usually when you change the schema.xml you have not only to
restart solr, but also rebuild the index, so the issue of how to reload
the file seems like a small problem...

Ephraim Ofir

-----Original Message-----
From: Peter Karich [mailto:peat...@yahoo.de] 
Sent: Tuesday, October 26, 2010 12:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Does Solr reload schema.xml dynamically?

  Hi,

See this:
http://wiki.apache.org/solr/CoreAdmin#RELOAD

Solr will also load the new configuration (without restarting the webapp) 
on the slaves when using replication:
http://wiki.apache.org/solr/SolrReplication

Regards,
Peter.

> Hi Everybody,
>
> If I change my schema.xml to, do I have to restart Solr. Is there some
way, I can apply the changes to schema.xml without restarting Solr?
>
> Swapnonil Mukherjee
>
>
>
>


-- 
http://jetwick.com twitter search prototype



Highlighting for non-stored fields

2010-10-26 Thread Phong Dais
Hi,

I've been looking thru the mailing archive for the past week and I haven't
found any useful info regarding this issue.

My requirement is to index a few terabytes worth of data to be searched.
Due to the size of the data, I would like to index without storing but I
would like to use the highlighting feature.  Is this even possible?  What
are my options?

I've read about termOffsets, payload that could possibly be used to do this
but I have no idea how this could be done.

Any pointers greatly appreciated.  Someone please point me in the right
direction.

 I don't mind having to write some code or digging thru existing code to
accomplish this task.

Thanks,
P.


RE: How to index on basis of a condition?

2010-10-26 Thread Ephraim Ofir
This is probably just a date format problem, nothing to do with the IF()
statement.  Try applying this on your date:
DATE_FORMAT(yourDate, '%Y-%m-%dT00:00:00Z')
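
For reference, this is the string shape that format produces, sketched in
Python 3 (the helper name is made up; NULL dates are passed through as None,
mirroring the IF() in the query):

```python
from datetime import date


def to_solr_midnight(d):
    """Format a date as midnight UTC in the ISO-8601 form Solr date fields expect."""
    if d is None:
        return None  # mirrors the NULL branch of the SELECT IF(...)
    return d.strftime("%Y-%m-%dT00:00:00Z")


print(to_solr_midnight(date(2010, 5, 30)))
print(to_solr_midnight(None))
```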

Ephraim Ofir

-----Original Message-----
From: Pawan Darira [mailto:pawan.dar...@gmail.com] 
Sent: Tuesday, October 26, 2010 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: How to index on basis of a condition?

I am using mysql database, and, field type is "date"

On Tue, Oct 26, 2010 at 2:56 PM, Gora Mohanty 
wrote:

> On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira 
> wrote:
> > Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The
> > query result is correct. But when I look at my index, the value stored
> > is an unusual bunch of characters, e.g. "*...@6628ad5a"*
> [...]
>
> Which database are you indexing from? The field type is probably
> a blob in the database. Check that, and look into the ClobTransformer:
> http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
>
> Regards,
> Gora
>



-- 
Thanks,
Pawan Darira


Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Peter Karich

 Hi,

See this:
http://wiki.apache.org/solr/CoreAdmin#RELOAD

Solr will also load the new configuration (without restarting the webapp)
on the slaves when using replication:

http://wiki.apache.org/solr/SolrReplication

Regards,
Peter.


Hi Everybody,

If I change my schema.xml, do I have to restart Solr? Is there some way I
can apply the changes to schema.xml without restarting Solr?

Swapnonil Mukherjee







--
http://jetwick.com twitter search prototype



Re: How to index on basis of a condition?

2010-10-26 Thread Pawan Darira
I am using mysql database, and, field type is "date"

On Tue, Oct 26, 2010 at 2:56 PM, Gora Mohanty  wrote:

> On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira 
> wrote:
> > Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The
> > query result is correct. But when I look at my index, the value stored
> > is an unusual bunch of characters, e.g. "*...@6628ad5a"*
> [...]
>
> Which database are you indexing from? The field type is probably
> a blob in the database. Check that, and look into the ClobTransformer:
> http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
>
> Regards,
> Gora
>



-- 
Thanks,
Pawan Darira


Re: command line to check if Solr is up running

2010-10-26 Thread Peter Karich

 Hi Xin,

from the wiki:
http://wiki.apache.org/solr/SolrConfigXml

The URL of the "ping" query is /admin/ping

You can also check (via wget) the number of documents. It might look
like a rusty hack but it works for me:


wget -T 1 -q "http://localhost:8080/solr/select?q=*:*" -O - | tr '/>' '\n' | grep numFound | tr '"' ' ' | awk '{print $5}'
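
If a scripting language is available, the same check can be done by parsing the XML rather than cutting it up with tr and awk. A minimal sketch, assuming the default XML response format; the SAMPLE string is a trimmed, hypothetical response body, and fetching the real one (for instance with urllib.request.urlopen) is left out so the snippet runs without a server:

```python
import xml.etree.ElementTree as ET


def num_found(response_xml):
    """Return numFound from a Solr XML select response, or None if absent."""
    root = ET.fromstring(response_xml)
    # The <result> element is a direct child of <response>.
    result = root.find("result")
    if result is None or "numFound" not in result.attrib:
        return None
    return int(result.attrib["numFound"])


# Hypothetical, trimmed-down response body used only for illustration.
SAMPLE = '<response><result name="response" numFound="42" start="0"/></response>'

print(num_found(SAMPLE))
```

A None result (or an exception while fetching) can then be treated as "Solr is not answering properly".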


Regards,
Peter.


As we know, we can use a browser to check whether Solr is running by going to
http://$hostName:$portNumber/$masterName/admin, say http://localhost:8080/solr1/admin. My question
is: is there any way to check it from the command line? I used "curl
http://localhost:8080" to check my Tomcat, and it worked fine. However, I get no response if I try
"curl http://localhost:8080/solr1/admin" (even when my Solr is running). Does anyone know
any command line alternatives?

Thanks,
Xin



--
http://jetwick.com twitter search prototype



Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread David Stuart
If you are using Solr Multicore http://wiki.apache.org/solr/CoreAdmin you can 
issue a Reload command 
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
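
For anyone scripting this, a small Python sketch that builds the same RELOAD URL; the helper name core_reload_url is made up, and the base URL and core name are just the ones from the example above:

```python
from urllib.parse import urlencode


def core_reload_url(solr_base, core):
    """Build the CoreAdmin RELOAD URL for one core."""
    # urlencode takes care of escaping unusual characters in the core name.
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{solr_base.rstrip('/')}/admin/cores?{query}"


print(core_reload_url("http://localhost:8983/solr", "core0"))
```

The resulting URL can be requested with curl, wget or urllib.request.urlopen after schema.xml has been edited.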

On 26 Oct 2010, at 11:09, Swapnonil Mukherjee wrote:

> Hi Everybody,
> 
> If I change my schema.xml, do I have to restart Solr? Is there some way I
> can apply the changes to schema.xml without restarting Solr?
> 
> Swapnonil Mukherjee
> 
> 
> 



Does Solr reload schema.xml dynamically?

2010-10-26 Thread Swapnonil Mukherjee
Hi Everybody,

If I change my schema.xml, do I have to restart Solr? Is there some way I
can apply the changes to schema.xml without restarting Solr?

Swapnonil Mukherjee





Re: How to index on basis of a condition?

2010-10-26 Thread Gora Mohanty
On Tue, Oct 26, 2010 at 2:37 PM, Pawan Darira  wrote:
> Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The
> query result is correct. But when I look at my index, the value stored is
> an unusual bunch of characters, e.g. "*...@6628ad5a"*
[...]

Which database are you indexing from? The field type is probably
a blob in the database. Check that, and look into the ClobTransformer:
http://wiki.apache.org/solr/DataImportHandler#ClobTransformer

Regards,
Gora


Re: How to index on basis of a condition?

2010-10-26 Thread Pawan Darira
Thanks Mr. Ephraim Ofir. I used the SELECT IF() for my requirement. The
query result is correct. But when I look at my index, the value stored is
an unusual bunch of characters, e.g. "*...@6628ad5a"*

Please suggest as to what went wrong.

- Pawan


On Mon, Oct 25, 2010 at 6:44 PM, Ephraim Ofir  wrote:

> Assuming you're talking about data that comes from a DB, I find it easiest
> to do this kind of logic on the DB's side (MySQL example):
> SELECT IF(someField = someValue, desiredValue, NULL) AS desiredName from
> someTable
>
> If that's not possible, you can use RegexTransformer(
> http://wiki.apache.org/solr/DataImportHandler#RegexTransformer) or (worst
> case and worst performance) ScriptTransformer(
> http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer) and
> actually write a JS script to do your logic.
>
> Ephraim Ofir
>
> -Original Message-
> From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com]
> Sent: Monday, October 25, 2010 10:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to index on basis of a condition?
>
> Do you want to use a field's content to decide whether the document should
> be indexed or not?
> You could write an UpdateProcessor for that, simply aborting the chain for
> the docs that don't pass your test.
>
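> @Override
> public void processAdd(AddUpdateCommand cmd) throws IOException {
>     SolrInputDocument doc = cmd.getSolrInputDocument();
>     String value = (String) doc.getFieldValue("myfield");
>     String condition = "foobar";
>     // Compare contents with equals(); == only compares object references.
>     // Calling equals() on the constant also tolerates a null field value.
>     if (condition.equals(value)) {
>         // Only documents that pass the test continue down the chain.
>         super.processAdd(cmd);
>     }
> }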
>
> But if what you meant was to skip only that field if it does not match
> condition, you could use doc.removeField(name) instead. Now you can feed
> your content using whatever method you like.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 25. okt. 2010, at 08.38, Pawan Darira wrote:
>
> > Hi
> >
> > I want to index a particular field on one if() condition. Can i do it
> > through DIH?
> >
> > Please suggest.
> >
> > --
> > Thanks,
> > Pawan Darira
>
>


-- 
Thanks,
Pawan Darira


Re: Need help for solr searching case insensative item

2010-10-26 Thread Jan Høydahl / Cominvent
Hi,

You need to share relevant parts of your schema for us to be able to see what's 
going on.

Try using fieldType="text". Basically, you need a fieldType whose analyzer
includes the lowercase filter (solr.LowerCaseFilterFactory).
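
To illustrate why the lowercase filter matters, here is a toy Python sketch (not Solr code; analyze and matches are made-up names) in which the same normalisation is applied to indexed text and to the query. Without lowercasing, "ecommons" and "eCommons" are simply different terms:

```python
def analyze(text, lowercase=True):
    """Very rough stand-in for an analyzer: whitespace split, optional lowercasing."""
    tokens = text.split()
    return [t.lower() for t in tokens] if lowercase else tokens


def matches(indexed_text, query, lowercase=True):
    """True if every analyzed query term occurs among the analyzed indexed terms."""
    indexed = set(analyze(indexed_text, lowercase))
    return all(term in indexed for term in analyze(query, lowercase))


print(matches("eCommons repository", "ecommons"))
print(matches("eCommons repository", "ecommons", lowercase=False))
```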

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 25. okt. 2010, at 21.09, wu liu wrote:

> Hi all,
> 
> I just noticed a weird thing happened to my solr search result.
> if I do a search for "ecommons", it cannot get the result for "eCommons", 
> instead,
> if i do a search for "eCommons", i can only get all the match for "eCommons", 
> but not "ecommons".
> 
> I cannot figure it out why?
> 
> please help me
> 
> Thanks very much in advance



Re: Need help for solr searching case insensative item

2010-10-26 Thread yandong yao
Sounds like a WordDelimiterFilter config issue; please refer to
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
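
As a rough picture of what splitting on case changes does to these two spellings, here is a Python approximation. It is only a regex sketch with a made-up function name, not the filter's actual algorithm, and the real behaviour depends on how the filter is configured in the schema:

```python
import re


def split_on_case_change(token):
    """Approximate a split at lower-to-upper transitions and at digit runs."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+", token)


print(split_on_case_change("eCommons"))
print(split_on_case_change("ecommons"))
```

If index-time and query-time analysis produce different sub-tokens like this, the two spellings may not match each other, which is why the analysis.jsp output is worth checking.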

Also it will help if you could provide:
1) Tokenizers/Filters config in schema file
2) analysis.jsp output in admin page.

2010/10/26 wu liu 

> Hi all,
>
> I just noticed a weird thing happened to my solr search result.
> if I do a search for "ecommons", it cannot get the result for "eCommons",
> instead,
> if i do a search for "eCommons", i can only get all the match for
> "eCommons", but not "ecommons".
>
> I cannot figure it out why?
>
> please help me
>
> Thanks very much in advance
>


Re: DIH wiht several Cores

2010-10-26 Thread stockiii

Okay. How did you solve this?
Did you write your own importer?

We already have our own importer, but only for one instance of Solr and one
index. We want to split this into several cores and indexes and want to use
DIH because we think its indexing is better than a PHP script...
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-wiht-several-Cores-tp1767883p1772223.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Modelling Access Control

2010-10-26 Thread Lance Norskog
The idea of ACL-based queries is: each document carries all of the
groups or roles that are allowed to see it. Each user search includes all of
the groups or roles the user has.

The roles are stored as multivalued string fields. Each ACL-based
query passes in "roles:A OR roles:B OR roles:C" and if any of A,B,C
are in the stored ACL field, you have a match.
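
A minimal Python sketch of building that clause on the client side. The function names are made up, the field name "roles" is taken from the example above, and quoting each role is an extra precaution I am assuming rather than something the scheme requires:

```python
from urllib.parse import urlencode


def acl_filter(user_roles, field="roles"):
    """Build 'roles:"A" OR roles:"B" ...' from the roles the user holds."""
    if not user_roles:
        # An empty filter would not restrict anything, so fail loudly instead.
        raise ValueError("user has no roles; refusing to build an empty filter")
    clauses = []
    for role in user_roles:
        # Escape backslashes and double quotes, then quote the whole value.
        escaped = role.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('{}:"{}"'.format(field, escaped))
    return " OR ".join(clauses)


def search_params(user_query, user_roles):
    """Keep the user's query in q and pass the ACL restriction as fq."""
    return urlencode([("q", user_query), ("fq", acl_filter(user_roles))])


print(acl_filter(["A", "B", "C"]))
```

Putting the ACL clause in fq rather than q keeps the user's own query untouched.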

This is called "early binding". "Late binding" is when you return
everything and the app calls LDAP and asks "can she see this? or
this?". This is slow and puts a monster load on the ACL server.

Very important: do not make a spelling or autosuggest index from a
text field which some people can see and other people can't.

On Tue, Oct 26, 2010 at 12:06 AM, Lance Norskog  wrote:
> Filter queries are a set of bits which is ANDed against query results
> at a very early stage of query processing. They are very useful.  Note
> that they are stored (I think) in parsed query order, so you have to
> pass in the same filter query string each time.
>
> On Mon, Oct 25, 2010 at 8:59 AM, Dennis Gearon  wrote:
>> Thanks for that insight, a lot.
>>
>> Dennis Gearon
>>
>> Signature Warning
>> 
>> It is always a good idea to learn from your own mistakes. It is usually a 
>> better idea to learn from others’ mistakes, so you do not have to make them 
>> yourself. from 
>> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>>
>> EARTH has a Right To Life,
>>  otherwise we all die.
>>
>>
>> --- On Mon, 10/25/10, Jonathan Rochkind  wrote:
>>
>>> From: Jonathan Rochkind 
>>> Subject: Re: Modelling Access Control
>>> To: "solr-user@lucene.apache.org" 
>>> Date: Monday, October 25, 2010, 8:19 AM
>>> Dennis Gearon wrote:
>>> > why use filter queries?
>>> >
>>> > Wouldn't reducing the set headed into the filters by
>>> putting it in the main query be faster? (A question to
>>> learn, since I do NOT know :-)
>>> >
>>> >
>>> No. At least as I understand it. In the best case, the
>>> filter query will be a lot faster, because filter queries
>>> are cached separately in the filter cache.  So if the
>>> existing filter query can be found in the cache, it'll be a
>>> lot faster. If it's not in the cache, the performance should
>>> be pretty much the same as if you had included it as an
>>> additional clause in the main q query.
>>>
>>> The reasons to put it in a fq filter are:
>>>
>>> 1) The caching behavior. You can have that certain part of
>>> the query be cached on its own, speeding up any subsequent
>>> queries that use that same fq.
>>>
>>> 2) Simplification of client code. You can leave your 'q'
>>> however you want it, using whatever kind of query parser you
>>> want too (dismax, whatever), and just add on the 'fq'
>>> without touching the 'q'.   This is a lot
>>> easier to do, and especially when you're using it for access
>>> control like this, a lot harder for a bug to creep in.
>>>
>>> Jonathan
>>>
>>>
>>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: Modelling Access Control

2010-10-26 Thread Lance Norskog
Filter queries are a set of bits which is ANDed against query results
at a very early stage of query processing. They are very useful.  Note
that they are stored (I think) in parsed query order, so you have to
pass in the same filter query string each time.
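
A toy Python model of the description above, not Solr internals: each filter is a bit set (a Python int here), it is ANDed with the query's bit set, and the cache is keyed by the exact filter string. The string key mirrors the "I think" above and is an assumption; all names and the four-document index are invented for illustration.

```python
# Toy index: document id -> roles allowed to see it (hypothetical data).
DOC_ROLES = [{"A"}, {"B"}, {"A", "C"}, {"D"}]


def evaluate(fq):
    """Toy parser: handles only 'roles:X OR roles:Y' style filter strings."""
    wanted = {clause.split(":", 1)[1] for clause in fq.split(" OR ")}
    bits = 0
    for doc_id, roles in enumerate(DOC_ROLES):
        if roles & wanted:
            bits |= 1 << doc_id  # set the bit for each matching document
    return bits


class FilterCache:
    """Caches one bit set per distinct filter string."""

    def __init__(self, evaluate_fn):
        self._evaluate = evaluate_fn
        self._cache = {}
        self.misses = 0

    def bits(self, fq):
        if fq not in self._cache:
            self.misses += 1
            self._cache[fq] = self._evaluate(fq)
        return self._cache[fq]


cache = FilterCache(evaluate)
query_bits = 0b1111  # pretend the main query matched all four documents
allowed = query_bits & cache.bits("roles:A OR roles:B")
cache.bits("roles:A OR roles:B")  # same string: served from the cache
cache.bits("roles:B OR roles:A")  # same meaning, different string: computed again
print(bin(allowed), cache.misses)
```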

On Mon, Oct 25, 2010 at 8:59 AM, Dennis Gearon  wrote:
> Thanks for that insight, a lot.
>
> Dennis Gearon
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a 
> better idea to learn from others’ mistakes, so you do not have to make them 
> yourself. from 
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
>  otherwise we all die.
>
>
> --- On Mon, 10/25/10, Jonathan Rochkind  wrote:
>
>> From: Jonathan Rochkind 
>> Subject: Re: Modelling Access Control
>> To: "solr-user@lucene.apache.org" 
>> Date: Monday, October 25, 2010, 8:19 AM
>> Dennis Gearon wrote:
>> > why use filter queries?
>> >
>> > Wouldn't reducing the set headed into the filters by
>> putting it in the main query be faster? (A question to
>> learn, since I do NOT know :-)
>> >
>> >
>> No. At least as I understand it. In the best case, the
>> filter query will be a lot faster, because filter queries
>> are cached separately in the filter cache.  So if the
>> existing filter query can be found in the cache, it'll be a
>> lot faster. If it's not in the cache, the performance should
>> be pretty much the same as if you had included it as an
>> additional clause in the main q query.
>>
>> The reasons to put it in a fq filter are:
>>
>> 1) The caching behavior. You can have that certain part of
>> the query be cached on its own, speeding up any subsequent
>> queries that use that same fq.
>>
>> 2) Simplification of client code. You can leave your 'q'
>> however you want it, using whatever kind of query parser you
>> want too (dismax, whatever), and just add on the 'fq'
>> without touching the 'q'.   This is a lot
>> easier to do, and especially when you're using it for access
>> control like this, a lot harder for a bug to creep in.
>>
>> Jonathan
>>
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com