Re: How many fields can SOLR handle?

2011-07-05 Thread roySolr
Hi,

I know I can add components to my requesthandler. In this situation the facets
depend on their category. So if a user chooses the category TV:

Inch:
32 inch(5)
34 inch(3)
40 inch(1)

Resolution:
Full HD(5)
HD ready(2)

When a user searches for the category Computer:

CPU:
Intel(12)
AMD(10)

GPU:
Ati(5)
Nvidia(2)

So I can't put it in my requesthandler as a default; every search can have
different facets. Do you understand what I mean?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-many-fields-can-SOLR-handle-tp3033910p3139833.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Exception when using result grouping and sorting by geodist() with Solr 3.3

2011-07-05 Thread Bill Bell
Did you add: fq={!geofilt} ??

On 7/3/11 11:14 AM, "Thomas Heigl"  wrote:

>Hello,
>
>I just tried up(down?)grading our current Solr 4.0 trunk setup to Solr
>3.3.0
>as result grouping was the only reason for us to stay with the trunk.
>Everything worked like a charm except for one of our queries, where we
>group
>results by the owning user and sort by distance.
>
>A simplified example for my query (that still fails) looks like this:
>
>q=*:*&group=true&group.field=user.uniqueId_s&group.main=true&group.format=grouped&sfield=user.location_p&pt=48.20927,16.3728&sort=geodist() asc
>
>
>The exception thrown is:
>
>Caused by: org.apache.solr.common.SolrException: Unweighted use of sort
>> geodist(latlon(user.location_p),48.20927,16.3728)
>> at
>> 
>>org.apache.solr.search.function.ValueSource$1.newComparator(ValueSource.j
>>ava:106)
>> at org.apache.lucene.search.SortField.getComparator(SortField.java:413)
>> at
>> 
>>org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.<init>(AbstractFirstPassGroupingCollector.java:81)
>> at
>> 
>>org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.<init>(T
>>ermFirstPassGroupingCollector.java:56)
>> at
>> 
>>org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Gro
>>uping.java:587)
>> at org.apache.solr.search.Grouping.execute(Grouping.java:256)
>> at
>> 
>>org.apache.solr.handler.component.QueryComponent.process(QueryComponent.j
>>ava:237)
>> at
>> 
>>org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchH
>>andler.java:194)
>> at
>> 
>>org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBa
>>se.java:129)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
>> at
>> 
>>org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(Embedded
>>SolrServer.java:140)
>> ... 39 more
>
>
>Any ideas how to fix this or work around this error for now? I'd really
>like
>to move from the trunk to the stable 3.3.0 release and this is the only
>problem currently keeping me from doing so.
>
>Cheers,
>
>Thomas




faceting on field with two values

2011-07-05 Thread elisabeth benoit
Hello,

I have two fields, TOWN and POSTALCODE, and I want to concatenate the two into
one field to do faceting.

My two fields are declared as follows:




The concatenated field is declared as follows:



and I do the copyField as follows:

   
   


When I do faceting on the TOWN_POSTALCODE field, I only get answers like


5
5
5
5
...

which means the faceting is done on the TOWN part or the POSTALCODE part of
TOWN_POSTALCODE.

But I would like to have answers like


5
5

Is this possible with Solr?
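(For reference, a set of declarations matching the description above would look roughly like this; the field types are assumed:

  <field name="TOWN" type="text" indexed="true" stored="true"/>
  <field name="POSTALCODE" type="text" indexed="true" stored="true"/>
  <field name="TOWN_POSTALCODE" type="string" indexed="true" stored="true" multiValued="true"/>
  <copyField source="TOWN" dest="TOWN_POSTALCODE"/>
  <copyField source="POSTALCODE" dest="TOWN_POSTALCODE"/>

Note that copyField adds each source as a separate value rather than concatenating them, which matches the per-part facet counts described above.)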

Thanks,
Elisabeth


Re: faceting on field with two values

2011-07-05 Thread Bill Bell
The easiest way is to concat() the fields in SQL, and pass it to indexing
as one field already merged together.
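For example (the table and column names are assumed):

  SELECT CONCAT(TOWN, ' ', POSTALCODE) AS TOWN_POSTALCODE FROM places;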

Thanks,

On 7/5/11 1:12 AM, "elisabeth benoit"  wrote:

>Hello,
>
>I have two fields TOWN and POSTALCODE and I want to concat those two in
>one
>field to do faceting
>
>My two fields  are declared as followed:
>
>
>
>
>The concat field is declared as followed:
>
>multiValued="true"/>
>
>and I do the copyfield as followed:
>
>   
>   
>
>
>When I do faceting on TOWN_POSTALCODE field, I only get answers like
>
>
>5
>5
>5
>5
>...
>
>Which means the faceting is down on the TOWN part or the POSTALCODE part
>of
>TOWN_POSTALCODE.
>
>But I would like to have answers like
>
>
>5
>5
>
>Is this possible with Solr?
>
>Thanks,
>Elisabeth




Re: How many fields can SOLR handle?

2011-07-05 Thread Bill Bell
This is taxonomy/index design...

One way is to have a series of fields by category:

TV - tv_size, resolution
Computer - cpu, gpu

Solr can have as many fields defined as you need; fields that a given
document does not use take up no space in the index.

So if a user picks "TV", you pass these to Solr:

q=*:*&facet=true&facet.field=tv_size&facet.field=resolution

If a user picks "Computer", you pass these to Solr:

q=*:*&facet=true&facet.field=cpu&facet.field=gpu

The other option is to facet on ALL of the fields, but this is not
recommended, since you would likely run into performance issues depending on
the number of fields.





On 7/5/11 1:00 AM, "roySolr"  wrote:

>Hi,
>
>I know i can add components to my requesthandler. In this situation facets
>are dependent of there category. So if a user choose for the category TV:
>
>Inch:
>32 inch(5)
>34 inch(3)
>40 inch(1)
>
>Resolution:
>Full HD(5)
>HD ready(2)
>
>When a user search for category Computer:
>
>CPU:
>Intel(12)
>AMD(10)
>
>GPU:
>Ati(5)
>Nvidia(2)
>
>So i can't put it in my requesthandler as default. Every search there can
>be
>other facets. Do you understand what i mean?
>
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/How-many-fields-can-SOLR-handle-tp30339
>10p3139833.html
>Sent from the Solr - User mailing list archive at Nabble.com.




Re: How many fields can SOLR handle?

2011-07-05 Thread roySolr
Thanks Bill,

That's exactly what I mean, but first I do a request to get the right
facet fields for a category.
So when a user searches for TV, I do a request to a DB to get tv_size and resolution.
The next step is to add these to my query like this: facet.field=tv_size&facet.field=resolution.

I thought maybe it was possible to add the facet fields to my query
automatically (based on category). I understand this isn't possible and I
first need to do a request to get the facet.fields.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-many-fields-can-SOLR-handle-tp3033910p3139921.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: faceting on field with two values

2011-07-05 Thread roySolr
Are you using the DIH?? You can use the transformer to concat the two fields
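A sketch of that approach with a TemplateTransformer (the entity name, query, and column names are assumed):

  <entity name="place" query="SELECT TOWN, POSTALCODE FROM places"
          transformer="TemplateTransformer">
    <field column="TOWN_POSTALCODE" template="${place.TOWN} ${place.POSTALCODE}"/>
  </entity>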

--
View this message in context: 
http://lucene.472066.n3.nabble.com/faceting-on-field-with-two-values-tp3139870p3139934.html
Sent from the Solr - User mailing list archive at Nabble.com.


Different Indexing formats for Older Lucene versions and Solr?

2011-07-05 Thread Sowmya V.B.
Hi All

A quick question about the index files of Lucene and Solr.

Until recently I had an older version of Lucene (with UIMA), and an index
built with it. I switched to Solr (3.3, with UIMA) and tried to use the same
index. While everything else seems fine, Solr does not seem to recognize the
index format: it still shows zero documents.

I noticed that in an older Solr application I used, the index directory
contained files like .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis plus
segments_2v and segments.gen files.
But my Lucene application's index contains only .cfx, .cfs and segments files.

Is there a way to use my old lucene index in the new Solr application?

Sowmya.

-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Different Indexing formats for Older Lucene versions and Solr?

2011-07-05 Thread Tommaso Teofili
Which Lucene version were you using?
Regards,
Tommaso

2011/7/5 Sowmya V.B. 

> Hi All
>
> A quick doubt on the index files of Lucene and Solr.
>
> I had an older version of lucene (with UIMA) till recently, and had an
> index
> built thus.
> I shifted to Solr (3.3, with UIMA)..and tried to use the same index. While
> everything else seems fine, the Solr does not seem to recognize the index
> format.
> It still shows zero documents.
>
> I noticed from an older Solr application I used, that index directory
> contained files like .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis and
> segments_2v, segments_gen files.
> But, my lucene application contains only .cfx, .cfs and segments files.
>
> Is there a way to use my old lucene index in the new Solr application?
>
> Sowmya.
>
> --
> Sowmya V.B.
> 
> Losing optimism is blasphemy!
> http://vbsowmya.wordpress.com
> 
>


Re: Different Indexing formats for Older Lucene versions and Solr?

2011-07-05 Thread Sowmya V.B.
I was using 2.4 or 2.5; it was a two-year-old Lucene version.

On Tue, Jul 5, 2011 at 10:07 AM, Tommaso Teofili
wrote:

> Which Lucene version were you using?
> Regards,
> Tommaso
>
> 2011/7/5 Sowmya V.B. 
>
> > Hi All
> >
> > A quick doubt on the index files of Lucene and Solr.
> >
> > I had an older version of lucene (with UIMA) till recently, and had an
> > index
> > built thus.
> > I shifted to Solr (3.3, with UIMA)..and tried to use the same index.
> While
> > everything else seems fine, the Solr does not seem to recognize the index
> > format.
> > It still shows zero documents.
> >
> > I noticed from an older Solr application I used, that index directory
> > contained files like .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis and
> > segments_2v, segments_gen files.
> > But, my lucene application contains only .cfx, .cfs and segments files.
> >
> > Is there a way to use my old lucene index in the new Solr application?
> >
> > Sowmya.
> >
> > --
> > Sowmya V.B.
> > 
> > Losing optimism is blasphemy!
> > http://vbsowmya.wordpress.com
> > 
> >
>



-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Exception when using result grouping and sorting by geodist() with Solr 3.3

2011-07-05 Thread Thomas Heigl
I'm pretty sure my original query contained a distance filter as well. Do I
absolutely need to filter by distance in order to sort my results by it?

I'll write another unit test including a distance filter as soon as I get a
chance.

Cheers,

Thomas

On Tue, Jul 5, 2011 at 9:04 AM, Bill Bell  wrote:

> Did you add: fq={!geofilt} ??
>
> On 7/3/11 11:14 AM, "Thomas Heigl"  wrote:
>
> >Hello,
> >
> >I just tried up(down?)grading our current Solr 4.0 trunk setup to Solr
> >3.3.0
> >as result grouping was the only reason for us to stay with the trunk.
> >Everything worked like a charm except for one of our queries, where we
> >group
> >results by the owning user and sort by distance.
> >
> >A simplified example for my query (that still fails) looks like this:
> >
> >q=*:*&group=true&group.field=user.uniqueId_s&group.main=true&group.format=grouped&sfield=user.location_p&pt=48.20927,16.3728&sort=geodist() asc
> >
> >
> >The exception thrown is:
> >
> >Caused by: org.apache.solr.common.SolrException: Unweighted use of sort
> >> geodist(latlon(user.location_p),48.20927,16.3728)
> >> at
> >>
> >>org.apache.solr.search.function.ValueSource$1.newComparator(ValueSource.j
> >>ava:106)
> >> at org.apache.lucene.search.SortField.getComparator(SortField.java:413)
> >> at
> >>
> >>org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.<init>(AbstractFirstPassGroupingCollector.java:81)
> >> at
> >>
> >>org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.<init>(T
> >>ermFirstPassGroupingCollector.java:56)
> >> at
> >>
> >>org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Gro
> >>uping.java:587)
> >> at org.apache.solr.search.Grouping.execute(Grouping.java:256)
> >> at
> >>
> >>org.apache.solr.handler.component.QueryComponent.process(QueryComponent.j
> >>ava:237)
> >> at
> >>
> >>org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchH
> >>andler.java:194)
> >> at
> >>
> >>org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBa
> >>se.java:129)
> >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
> >> at
> >>
> >>org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(Embedded
> >>SolrServer.java:140)
> >> ... 39 more
> >
> >
> >Any ideas how to fix this or work around this error for now? I'd really
> >like
> >to move from the trunk to the stable 3.3.0 release and this is the only
> >problem currently keeping me from doing so.
> >
> >Cheers,
> >
> >Thomas
>
>
>


Re: faceting on field with two values

2011-07-05 Thread elisabeth benoit
Hmmm... that sounds interesting and takes me somewhere else.

We are actually reindexing data every night, but the whole process is done by
Talend (reading and formatting data from a database), and this makes me
wonder if we should use Solr instead for this.

In this case (concatenating two fields) the change is quite heavy: we have to
change the Talend process, pollute the XML files we use to index data with
redundant fields, and then modify the Solr process.

So do you think the DIH (which I just discovered) would be appropriate for
the whole process (read a database, read fields from XML contained in some
of the database columns, add information from a CSV file)?

From what I just read about the DIH it seems so, but I'm still very confused
about this DIH thing.

thanks again,
Elisabeth

2011/7/5 roySolr 

> Are you using the DIH?? You can use the transformer to concat the two
> fields
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/faceting-on-field-with-two-values-tp3139870p3139934.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


OOM at solr master node while updating document

2011-07-05 Thread Chengyang
Is there a memory leak when I update the index at the master node?
Here is the stack trace.

o.a.solr.servlet.SolrDispatchFilter - java.lang.OutOfMemoryError: Java heap 
space
at 
org.apache.solr.handler.ReplicationHandler$FileStream.write(ReplicationHandler.java:1000)
at 
org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:887)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:322)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at 
org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:179)
at 
org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:84)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:157)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:262)
at 
org.apache.coyote.ajp.AjpAprProcessor.process(AjpAprProcessor.java:425)
at 
org.apache.coyote.ajp.AjpAprProtocol$AjpConnectionHandler.process(AjpAprProtocol.java:378)
at 
org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1508)
at java.lang.Thread.run(Thread.java:619)


searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
Hi,
Let's say I have 10^10 documents in an index, with the unique id being a
document id assigned to each of them from 1 to 10^10.
Now I want to search for a particular query string in a subset of these
documents, say document ids 100 to 1000.

The question here is: will SOLR be able to search just this set of documents
rather than the entire index? If yes, what should the query be to limit the
search to this subset?

Regards,
JAME VAALET
Software Developer
EXT :8108
Capital IQ



Re: searching a subset of SOLR index

2011-07-05 Thread Shashi Kant
Range query
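For example, assuming the document id is kept in an indexed field named id:

  q=your search terms&fq=id:[100 TO 1000]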


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet  wrote:
> Hi,
> Let say, I have got 10^10 documents in an index with unique id being document 
> id which is assigned to each of those from 1 to 10^10 .
> Now I want to search a particular query string in a subset of these documents 
> say ( document id 100 to 1000).
>
> The question here is.. will SOLR able to search just in this set of documents 
> rather than the entire index ? if yes what should be query to limit search 
> into this subset ?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


RE: searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
Thanks.
But does this range query just limit the universe logically, or does it have
a mechanism to limit it physically as well? Do we gain query time by using
the range query?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet  wrote:
> Hi,
> Let say, I have got 10^10 documents in an index with unique id being document 
> id which is assigned to each of those from 1 to 10^10 .
> Now I want to search a particular query string in a subset of these documents 
> say ( document id 100 to 1000).
>
> The question here is.. will SOLR able to search just in this set of documents 
> rather than the entire index ? if yes what should be query to limit search 
> into this subset ?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Marian Steinbach
On Tue, Jul 5, 2011 at 08:46, Romi  wrote:
> will merely adding fl=score make difference in search results, i mean will i
> get desired results now???

The fl parameter stands for "field list" and allows you to configure
in a request which result fields should be returned.

If you try to tweak the boosts in order to change your result order,
it's wise to add the calculated "score" to the output by setting
something like fl=score,*.

This reminds me of another important question: Are you sorting the
result by score? Because if not, your changes to the boosts/score
won't ever have an effect on the ordering.
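For example (score desc is also what Solr uses when no sort parameter is given):

  q=ring&fl=score,*&sort=score desc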

http://wiki.apache.org/solr/CommonQueryParameters

Marian


Re: Spellchecker in zero-hit search result

2011-07-05 Thread Marian Steinbach
On Mon, Jul 4, 2011 at 17:19, Juan Grande  wrote:
> Hi Marian,
>
> I guess that your problem isn't related to the number of results, but to the
> component's configuration. The configuration that you show is meant to set
> up an autocomplete component that will suggest terms from an incomplete user
> input (something similar to what google does while you're typing in the
> search box), see http://wiki.apache.org/solr/Suggester. That's why your
> suggestions to "place" are "places" and "placed", all sharing the "place"
> prefix. But when you search for "placw", the component doesn't return any
> suggestion, because in your index no term begins with "placw".
>
> You can learn how to correctly configure a spellchecker here:
> http://wiki.apache.org/solr/SpellCheckComponent. Also, I'd recommend to take
> a look at the example's solrconfig, because it provides an example
> spellchecker configuration.

Juan, thanks for the information!

I have read through that page for quite a while before doing my tests,
but it seems as if I had a different mental model. Then all reading
might not be worth it. I thought that the SpellCheckComponent would be
able to fetch index terms which are similar to the query term. The use
case for that, mainly (but not only) in case of a zero-hit search
would be to display the famous "Did you mean ..." hint.

So I'm going back to the docs and to the example. :)

Later,

Marian


Re: faceting on field with two values

2011-07-05 Thread Marian Steinbach
On Tue, Jul 5, 2011 at 10:21, elisabeth benoit
 wrote:
> ...
>
> so do you think the dih (which I just discovered) would be appropriate to do
> the whole process (read a database, read fields from xml contained in some
> of the database columns, add informations from csv file)???
>
> from what I just read about dih, it seems so, but I'm still very confused
> about this dih thing.

As far as I can tell, the DataImportHandler is very useful if you want
to get data (only) from a database directly to Solr, with only slight
manipulation, e.g. concatenations. For that, it's much more convenient
than the path via scripts to generate XML.

It sounds like you are doing more than that in your importers.


Re: OOM at solr master node while updating document

2011-07-05 Thread Marian Steinbach
2011/7/5 Chengyang :
> Is there any memory leak when I updating the index at the master node?
> Here is the stack trace.
>
> o.a.solr.servlet.SolrDispatchFilter - java.lang.OutOfMemoryError: Java heap 
> space

You don't need a memory leak to get an OOM error in Java. It might just
be that the amount of RAM allocated to the virtual machine is used up.

If you are running Solr as in the example, via Jetty on the command line, try

  java -server -jar start.jar

Or try the -Xmx parameter, e.g.

  java -Xmx1024M -jar start.jar

If you are using Tomcat or something else, you might want to look into
the docs on how to deal with the memory limit.
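With Tomcat, for instance, the heap size is usually raised through the environment before startup (a sketch; adjust the value to your machine):

  export CATALINA_OPTS="-Xmx1024M"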

Marian


Re: what is the diff between katta and solrcloud?

2011-07-05 Thread sinoantony
Why does Katta store its index on HDFS? Any advantages?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-is-the-diff-between-katta-and-solrcloud-tp2275554p3139983.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with subset of your index, 
especially if you reuse the same filter for many queries since there is a cache.
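For instance, assuming id is the unique-key field, these two queries share one cached filter, so the second pays almost nothing for the restriction:

  q=contract&fq=id:[100 TO 1000]
  q=merger&fq=id:[100 TO 1000]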

If your subsets are always the same subsets, maybe you could use shards. But 
we would need to know more about what you intend to do to point you to an 
adequate solution.

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 11:10
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically or does it have any 
mechanism to limit this physically as well .Do we leverage time factor by using 
the range query ?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet  wrote:
> Hi,
> Let say, I have got 10^10 documents in an index with unique id being document 
> id which is assigned to each of those from 1 to 10^10 .
> Now I want to search a particular query string in a subset of these documents 
> say ( document id 100 to 1000).
>
> The question here is.. will SOLR able to search just in this set of documents 
> rather than the entire index ? if yes what should be query to limit search 
> into this subset ?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


Re: Spellchecker in zero-hit search result

2011-07-05 Thread Marian Steinbach
On Tue, Jul 5, 2011 at 11:24, Marian Steinbach  wrote:
> On Mon, Jul 4, 2011 at 17:19, Juan Grande  wrote:
>> ...
>>
>> You can learn how to correctly configure a spellchecker here:
>> http://wiki.apache.org/solr/SpellCheckComponent. Also, I'd recommend to take
>> a look at the example's solrconfig, because it provides an example
>> spellchecker configuration.


I found the problem. My suggest component had the line

  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>

which I probably copied without knowing what I did. I removed it to
use the default IndexBasedSpellChecker. Now I get suggestions on
zero-hit searches as well.

Thanks again!

Marian


Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?

2011-07-05 Thread Gabriele Kahlout
nice...where?

I'm trying to figure out 2 things:
1) How to create an analyzer that corresponds to the one in the schema.xml.

 




  

2) I'd like to see the code that creates it by reading it from schema.xml.

On Tue, Jul 5, 2011 at 12:33 PM, Markus Jelsma
wrote:

> No. SolrJ only builds input docs from NutchDocument objects. Solr will do
> analysis. The integration is analogous to XML post of Solr documents.
>
> On Tuesday 05 July 2011 12:28:21 Gabriele Kahlout wrote:
> > Hello,
> >
> > I'm trying to understand better Nutch and Solr integration. My
> > understanding is that Documents are added to Solr index from SolrWriter's
> > write(NutchDocument doc) method. But does it make any use of the
> > WhitespaceTokenizerFactory?
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: full text searching in cloud for minor enterprises

2011-07-05 Thread Joe Scanlon
Look at searchblox

On Monday, July 4, 2011, Li Li  wrote:
> hi all,
>     I want to provide full text searching for some "small" websites.
> It seems cloud computing is popular now. And it will save costs
> because it doesn't require an engineer to maintain
> the machine.
>     For now, there are many services such as amazon s3, google app
> engine, ms azure etc. I am not familiar with cloud computing. Anyone
> give me a direction or some advice? thanks
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

-- 
Joe Scanlon

jscan...@element115.net

Mobile: 603 459 3242
Office:  312 445 0018


RE: searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
I have got two applications:

1. Website
	The website will enable any user to search the document repository;
the set it searches on is known as "website presentable".
2. Windows service
	The windows service will search all documents in the repository
for a fixed set of keywords and store the results in a database. This set
is the universal set of documents in the repository, including the
website-presentable ones.


The website is a high-priority app which should work smoothly without any
interference, whereas the windows service should run all day long,
continuously, to save results from incoming docs.
The problem here is that the website set is predefined, and I don't want the
windows service's requests to SOLR to slow down website requests.

Suppose I segregate the website-presentable docs into one core and the rest
into a different core: will that solve the problem?
I have also read about multiple ports for listening to requests from
different apps; can this be used?



Regards,
JAME VAALET


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with subset of your index, 
especially if you reuse the same filter for many queries since there is a cache.

If your subsets are always the same subsets, maybe your could use shards. But 
we would need to know more about what you intend to do, to point to an adequate 
solution.

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 11:10
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically or does it have any 
mechanism to limit this physically as well .Do we leverage time factor by using 
the range query ?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet  wrote:
> Hi,
> Let say, I have got 10^10 documents in an index with unique id being document 
> id which is assigned to each of those from 1 to 10^10 .
> Now I want to search a particular query string in a subset of these documents 
> say ( document id 100 to 1000).
>
> The question here is.. will SOLR able to search just in this set of documents 
> rather than the entire index ? if yes what should be query to limit search 
> into this subset ?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


Re: @field for child object

2011-07-05 Thread Mark Miller
Not yet - I've played around with support in this issue in the past though: 
https://issues.apache.org/jira/browse/SOLR-1945

On Jul 4, 2011, at 6:04 AM, Kiwi de coder wrote:

> hi,
> 
> I'm wondering, does the solrj @Field annotation support embedded child objects? E.g.
> 
> class A {
> 
>  @field
>  string somefield;
> 
> @embedded
>  B b;
> 
> }
> 
> regards,
> kiwi

- Mark Miller
lucidimagination.com










Re: Feed index with analyzer output

2011-07-05 Thread Lox
Ok, 

the very short question is:
Is there a way to submit the analyzer response so that Solr already knows
what to do with it? (That is, which fields are to be treated as
payloads, which are tokens, etc.)


Chris Hostetter-3 wrote:
> 
> can you explain a bit more about what you goal is here?  what info are you 
> planning on extracting?  what do you intend to change between the info you 
> get back in the first request and the info you want to send in the second 
> request?
> 

I plan to add some payloads to some terms between request#1 and request#2.


Chris Hostetter-3 wrote:
> 
> your analyziers and whatnot for request#1 would be exactly what you're use 
> to, but for request#2 you'd need to specify an analyzer that would let you 
> specify, in the field value, the details about the term and position, and 
> offsets, and payloads and what not ... the 
> DelimitedPayloadTokenFilterFactory / DelimitedPayloadTokenFilter can help 
> with some of that, but not all -- you'd either need your own custom 
> analyzer or custom FieldType or something depending on teh specific 
> changes you want to make.
> 
> Frankly though i really believe you are going about this backwards -- if 
> you want to manipulate the Tokenstream after analysis but before indexing, 
> then why not implement this custom logic thta you want in a TokenFilter 
> and use it in the last TokenFilterFactory you have for your analyzer?
> 
> 

Yeah, I thought about that. I really wanted to know whether there was an
already-implemented way to do it, to avoid reinventing the wheel.

It would be cool if I were able to send info to Solr formatted the way
I imagined in my last mail, so that a call to any Tokenizer or TokenFilter
wouldn't be necessary. It would have been like using an empty analyzer but
still retaining the various token information.
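For reference, the delimited-payload route mentioned above is configured roughly like this (the field type name and weights are only illustrative):

  <fieldtype name="payloads" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
    </analyzer>
  </fieldtype>

Field values then look like "important|2.0 boring|0.1", where the number after each | becomes the token's payload.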

Thank you!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Feed-index-with-analyzer-output-tp3131771p3140460.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
From what you tell us, I guess a separate index for the website docs would be
best. If you fear that requests from the windows service would cripple your
website's performance, why not have a totally separate index on another
server, and have your website documents indexed in both indexes?

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 13:14
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

I have got two applications 

1. website
The website will enable any user to search the document repository , 
and the set they search on is known as website presentable
2. windows service 
The windows service will search on all the documents in the repository 
for fixed set of key words and store the found result in database.this set  
 is universal set of documents in the doc repository including the website 
presentable.


Website is a high prioritized app which should work smoothly without any 
interference , where as windows service should run all day long continuously 
without break to save result from incoming docs.
The problem here is website set is predefined and I don't want the windows 
service request to SOLR to slow down website request.

Suppose am segregating the website presentable docs index into a particular 
core and rest of them into different core will it solve the problem ?
I have also read about multiple ports for listening request from different apps 
, can this be used. 



Regards,
JAME VAALET


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with subset of your index, 
especially if you reuse the same filter for many queries since there is a cache.

If your subsets are always the same subsets, maybe your could use shards. But 
we would need to know more about what you intend to do, to point to an adequate 
solution.

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 11:10
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically or does it have any 
mechanism to limit this physically as well .Do we leverage time factor by using 
the range query ?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet  wrote:
> Hi,
> Let say, I have got 10^10 documents in an index with unique id being document 
> id which is assigned to each of those from 1 to 10^10 .
> Now I want to search a particular query string in a subset of these documents 
> say ( document id 100 to 1000).
>
> The question here is.. will SOLR able to search just in this set of documents 
> rather than the entire index ? if yes what should be query to limit search 
> into this subset ?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Romi
I got the point that to boost search results I have to sort by score.
For the dismax request handler in solrconfig I use

  qf=text^0.5 name^1.0 description^1.5

because I want docs having the query string in the description field to come
up higher in the search results.
But what I am getting is a first doc in the search results that does not have
the query string in its description field.


-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/configure-dismax-requesthandlar-for-boost-a-field-tp3137239p3140501.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is solrj 3.3.0 ready for field collapsing?

2011-07-05 Thread Erick Erickson
Let's see the results of adding &debugQuery=on to your URL. Are you getting
any documents back at all? If not, then your query isn't getting any
documents to group.

You haven't told us much about what you're trying to do, you might want to
review: http://wiki.apache.org/solr/UsingMailingLists

Best
Erick
On Jul 4, 2011 11:55 AM, "Per Newgro"  wrote:


Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Ahmet Arslan
> I got the point that to boost search results I have to sort by score.
> For the dismax request handler in solrconfig I use
>
>   qf=text^0.5 name^1.0 description^1.5
>
> because I want docs having the query string in the description field
> to come up higher in the search results. But what I am getting is a
> first doc in the search results that does not have the query string
> in its description field.

Increase the boost factor of description until you get the desired ordering; let's say make it 100. 
Adding &debugQuery=on will show how the actual score is calculated.


Re: How to boost a querystring at query time

2011-07-05 Thread Ahmet Arslan
> When querystring is q=gold^2.0 ring(boost gold) and
> qt=standard i got the
> results for gold ring and when qt=dismax i got no result
> why so please
> explain

q=gold^2.0 ring(boost gold)&defType=dismax would return a document that 
contains exactly "gold^2.0 ring(boost gold)" in it.

dismax is designed to work with simple keyword queries. You can use only three 
special characters: + - "

The rest of the special characters (: [ ] ^ ~ etc.) don't work, i.e. they are escaped.

Please see http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/


Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?

2011-07-05 Thread Gabriele Kahlout
I suspect the following should do (1). I'm just not sure about file
references, as in stopInit.put("words", "stopwords.txt"). (2) should
clarify.

1)
class SchemaAnalyzer extends Analyzer {

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // StopFilter configured like the schema: stopwords.txt, case-insensitive
        HashMap<String, String> stopInit = new HashMap<String, String>();
        stopInit.put("words", "stopwords.txt");
        stopInit.put("ignoreCase", Boolean.TRUE.toString());
        StopFilterFactory stopFilterFactory = new StopFilterFactory();
        stopFilterFactory.init(stopInit);

        // WordDelimiterFilter with the same split/catenate flags as the schema
        final HashMap<String, String> wordDelimInit = new HashMap<String, String>();
        wordDelimInit.put("generateWordParts", "1");
        wordDelimInit.put("generateNumberParts", "1");
        wordDelimInit.put("catenateWords", "1");
        wordDelimInit.put("catenateNumbers", "1");
        wordDelimInit.put("catenateAll", "0");
        wordDelimInit.put("splitOnCaseChange", "1");
        WordDelimiterFilterFactory wordDelimiterFilterFactory = new WordDelimiterFilterFactory();
        wordDelimiterFilterFactory.init(wordDelimInit);

        // Porter stemming with the protected-words file
        HashMap<String, String> porterInit = new HashMap<String, String>();
        porterInit.put("protected", "protwords.txt");
        EnglishPorterFilterFactory englishPorterFilterFactory = new EnglishPorterFilterFactory();
        englishPorterFilterFactory.init(porterInit);

        // whitespace -> stop -> word delimiter -> lowercase -> porter -> remove duplicates
        return new RemoveDuplicatesTokenFilter(
            englishPorterFilterFactory.create(
                new LowerCaseFilter(
                    wordDelimiterFilterFactory.create(
                        stopFilterFactory.create(
                            new WhitespaceTokenizer(reader))))));
    }
}

On Tue, Jul 5, 2011 at 1:00 PM, Gabriele Kahlout
wrote:

> nice...where?
>
> I'm trying to figure out 2 things:
> 1) How to create an analyzer that corresponds to the one in the schema.xml.
>
>
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>  generateWordParts="1" generateNumberParts="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>
> 2) I'd like to see the code that creates it reading it from schema.xml .
>
>
> On Tue, Jul 5, 2011 at 12:33 PM, Markus Jelsma  > wrote:
>
>> No. SolrJ only builds input docs from NutchDocument objects. Solr will do
>> analysis. The integration is analogous to XML post of Solr documents.
>>
>> On Tuesday 05 July 2011 12:28:21 Gabriele Kahlout wrote:
>> > Hello,
>> >
>> > I'm trying to understand better Nutch and Solr integration. My
>> > understanding is that Documents are added to Solr index from
>> SolrWriter's
>> > write(NutchDocument doc) method. But does it make any use of the
>> > WhitespaceTokenizerFactory?
>>
>> --
>> Markus Jelsma - CTO - Openindex
>> http://www.linkedin.com/in/markus17
>> 050-8536620 / 06-50258350
>>
>
>
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x) < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>
>


-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?

2011-07-05 Thread Gabriele Kahlout
Not yet an answer to 2), but this is where and how Solr initializes the
Analyzer defined in schema.xml into a TokenizerChain:

//org.apache.solr.schema.IndexSchema

// Load the Tokenizer
// Although an analyzer only allows a single Tokenizer, we load a list to make sure
// the configuration is ok
final ArrayList<TokenizerFactory> tokenizers = new ArrayList<TokenizerFactory>(1);
AbstractPluginLoader<TokenizerFactory> tokenizerLoader =
  new AbstractPluginLoader<TokenizerFactory>( "[schema.xml] analyzer/tokenizer", false, false )
{
  @Override
  protected void init(TokenizerFactory plugin, Node node) throws Exception {
    if( !tokenizers.isEmpty() ) {
      throw new SolrException( SolrException.ErrorCode.SERVER_ERROR,
          "The schema defines multiple tokenizers for: "+node );
    }
    final Map<String,String> params = DOMUtil.toMapExcept(node.getAttributes(),"class");
    // copy the luceneMatchVersion from config, if not set
    if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
      params.put(LUCENE_MATCH_VERSION_PARAM, solrConfig.luceneMatchVersion.toString());
    plugin.init( params );
    tokenizers.add( plugin );
  }

  @Override
  protected TokenizerFactory register(String name, TokenizerFactory plugin) throws Exception {
    return null; // used for map registration
  }
};
tokenizerLoader.load( loader, (NodeList)xpath.evaluate("./tokenizer", node, XPathConstants.NODESET) );

// Make sure something was loaded
if( tokenizers.isEmpty() ) {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,"analyzer without class or tokenizer & filter list");
}

// Load the Filters
final ArrayList<TokenFilterFactory> filters = new ArrayList<TokenFilterFactory>();
AbstractPluginLoader<TokenFilterFactory> filterLoader =
  new AbstractPluginLoader<TokenFilterFactory>( "[schema.xml] analyzer/filter", false, false )
{
  @Override
  protected void init(TokenFilterFactory plugin, Node node) throws Exception {
    if( plugin != null ) {
      final Map<String,String> params = DOMUtil.toMapExcept(node.getAttributes(),"class");
      // copy the luceneMatchVersion from config, if not set
      if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
        params.put(LUCENE_MATCH_VERSION_PARAM, solrConfig.luceneMatchVersion.toString());
      plugin.init( params );
      filters.add( plugin );
    }
  }

  @Override
  protected TokenFilterFactory register(String name, TokenFilterFactory plugin) throws Exception {
    return null; // used for map registration
  }
};
filterLoader.load( loader, (NodeList)xpath.evaluate("./filter", node, XPathConstants.NODESET) );

return new TokenizerChain(charFilters.toArray(new CharFilterFactory[charFilters.size()]),
    tokenizers.get(0), filters.toArray(new TokenFilterFactory[filters.size()]));
};


On Tue, Jul 5, 2011 at 2:26 PM, Gabriele Kahlout
wrote:

> I suspect the following should do (1). I'm just not sure about file
> references as in  stopInit.put("words", "stopwords.txt") . (2) should
> clarify.
>
> 1)
> class SchemaAnalyzer extends Analyzer{
>
> @Override
> public TokenStream tokenStream(String fieldName, Reader reader) {
> HashMap stopInit = new
> HashMap();
> stopInit.put("words", "stopwords.txt");
> stopInit.put("ignoreCase", Boolean.TRUE.toString());
> StopFilterFactory stopFilterFactory = new StopFilterFactory();
> stopFilterFactory.init(stopInit);
>
> final HashMap wordDelimInit = new
> HashMap();
> wordDelimInit.put("generateWordParts", "1");
> wordDelimInit.put("generateNumberParts", "1");
> wordDelimInit.put("catenateWords", "1");
> wordDelimInit.put("catenateWords", "1");
> wordDelimInit.put("catenateNumbers", "1");
> wordDelimInit.put("catenateAll", "0");
> wordDelimInit.put("splitOnCaseChange", "1");
>
> WordDelimiterFilterFactory wordDelimiterFilterFactory = new
> WordDelimiterFilterFactory();
> wordDelimiterFilterFactory.init(wordDelimInit);
> HashMap porterInit = new HashMap String>();
> porterInit.put("protected", "protwords.txt");
> EnglishPorterFilterFactory englishPorterFilterFactory = new
> EnglishPorterFilterFactory();
> englishPorterFilterFactory.init(porterInit);
>
> return new
> RemoveDuplicatesTokenFilter(englishPorterFilterFactory.create(new
> LowerCaseFilter(wordDelimiterFilterFactory.create(stopFilterFactory.create(new
> WhitespaceTokenizer(reader));
> }
> }
>
> On Tue, Jul 5, 2011 at 1:00 PM, Gabriele Kahlout  > wrote:
>
>> nice...where?
>>
>> I'm trying to figure out 2 things:
>> 1) How to create an analyzer that corresponds to the one in the
>> schema.xml.
>>

RE: searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
But in case the website docs contribute around 50% of the entire doc set, why
recreate the indexes? Don't you think it's redundancy?
Can two web apps (Solr instances) share a single index file and search on it
without interfering with each other?


Regards,
JAME VAALET
Software Developer 
EXT :8108
Capital IQ


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 5:12 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

From what you tell us, I guess a separate index for the website docs would be
best. If you fear that requests from the windows service would cripple your
website's performance, why not have a totally separate index on another
server, and have your website documents indexed in both indexes?

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 13:14
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

I have got two applications 

1. website
The website will enable any user to search the document repository , 
and the set they search on is known as website presentable
2. windows service 
The windows service will search on all the documents in the repository 
for fixed set of key words and store the found result in database.this set  
 is universal set of documents in the doc repository including the website 
presentable.


Website is a high prioritized app which should work smoothly without any 
interference , where as windows service should run all day long continuously 
without break to save result from incoming docs.
The problem here is website set is predefined and I don't want the windows 
service request to SOLR to slow down website request.

Suppose am segregating the website presentable docs index into a particular 
core and rest of them into different core will it solve the problem ?
I have also read about multiple ports for listening request from different apps 
, can this be used. 



Regards,
JAME VAALET


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with subset of your index, 
especially if you reuse the same filter for many queries since there is a cache.

If your subsets are always the same subsets, maybe your could use shards. But 
we would need to know more about what you intend to do, to point to an adequate 
solution.

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 11:10
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically or does it have any 
mechanism to limit this physically as well .Do we leverage time factor by using 
the range query ?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet  wrote:
> Hi,
> Let say, I have got 10^10 documents in an index with unique id being document 
> id which is assigned to each of those from 1 to 10^10 .
> Now I want to search a particular query string in a subset of these documents 
> say ( document id 100 to 1000).
>
> The question here is.. will SOLR able to search just in this set of documents 
> rather than the entire index ? if yes what should be query to limit search 
> into this subset ?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


Re: How to boost a querystring at query time

2011-07-05 Thread Romi
Then what should I do to get the required result? I.e., if I want to boost
gold, which query type should I use?

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-boost-a-querystring-at-query-time-tp3139800p3140703.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reading data from Solr MoreLikeThis

2011-07-05 Thread Sheetal
Hi Juan,

Thank you very much. Your code worked pretty awesomely and was a real
help. Great start to the day... :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Reading-data-from-Solr-MoreLikeThis-tp3130184p3140715.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: searching a subset of SOLR index

2011-07-05 Thread Erik Hatcher
I wouldn't share the same index across two Solr webapps, as they could step on 
each other's toes.

In this scenario, I think having two Solr instances replicating from the same 
master is the way to go, to allow you to scale your load from each application 
separately.  
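A sketch of the slave-side configuration each application's solrconfig.xml would carry (the master host and poll interval are assumed):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>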

Erik



On Jul 5, 2011, at 09:04 , Jame Vaalet wrote:

> But incase the website docs contribute around 50 % of the entire docs , why 
> to recreate the indexes . don't you think its redundancy ?
> Can two web apps (solr instances ) share a single index file to search on it 
> without interfering each other 
> 
> 
> Regards,
> JAME VAALET
> Software Developer 
> EXT :8108
> Capital IQ
> 
> 
> -Original Message-
> From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
> Sent: Tuesday, July 05, 2011 5:12 PM
> To: solr-user@lucene.apache.org
> Subject: RE: searching a subset of SOLR index
> 
> From what you tell us, I guess a separate index for website docs would be the 
> best. If you fear that request from the window service would cripple your web 
> site performance, why not have a totally separated index on another server, 
> and have your website documents index in both indexes ?
> 
> Pierre
> 
> -Message d'origine-
> De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
> Envoyé : mardi 5 juillet 2011 13:14
> À : solr-user@lucene.apache.org
> Objet : RE: searching a subset of SOLR index
> 
> I have got two applications 
> 
> 1. website
>   The website will enable any user to search the document repository , 
> and the set they search on is known as website presentable
> 2. windows service 
>   The windows service will search on all the documents in the repository 
> for fixed set of key words and store the found result in database.this set
>is universal set of documents in the doc repository including the website 
> presentable.
> 
> 
> Website is a high prioritized app which should work smoothly without any 
> interference , where as windows service should run all day long continuously 
> without break to save result from incoming docs.
> The problem here is website set is predefined and I don't want the windows 
> service request to SOLR to slow down website request.
> 
> Suppose am segregating the website presentable docs index into a particular 
> core and rest of them into different core will it solve the problem ?
> I have also read about multiple ports for listening request from different 
> apps , can this be used. 
> 
> 
> 
> Regards,
> JAME VAALET
> 
> 
> -Original Message-
> From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
> Sent: Tuesday, July 05, 2011 3:52 PM
> To: solr-user@lucene.apache.org
> Subject: RE: searching a subset of SOLR index
> 
> The limit will always be logical if you have all documents in the same index. 
> But filters are very efficient when working with subset of your index, 
> especially if you reuse the same filter for many queries since there is a 
> cache.
> 
> If your subsets are always the same subsets, maybe your could use shards. But 
> we would need to know more about what you intend to do, to point to an 
> adequate solution.
> 
> Pierre
> 
> -Message d'origine-
> De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
> Envoyé : mardi 5 juillet 2011 11:10
> À : solr-user@lucene.apache.org
> Objet : RE: searching a subset of SOLR index
> 
> Thanks.
> But does this range query just limit the universe logically or does it have 
> any mechanism to limit this physically as well .Do we leverage time factor by 
> using the range query ?
> 
> Regards,
> JAME VAALET
> 
> 
> -Original Message-
> From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
> Kant
> Sent: Tuesday, July 05, 2011 2:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: searching a subset of SOLR index
> 
> Range query
> 
> 
> On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet  wrote:
>> Hi,
>> Let say, I have got 10^10 documents in an index with unique id being 
>> document id which is assigned to each of those from 1 to 10^10 .
>> Now I want to search a particular query string in a subset of these 
>> documents say ( document id 100 to 1000).
>> 
>> The question here is.. will SOLR able to search just in this set of 
>> documents rather than the entire index ? if yes what should be query to 
>> limit search into this subset ?
>> 
>> Regards,
>> JAME VAALET
>> Software Developer
>> EXT :8108
>> Capital IQ
>> 
>> 



Re: How to boost a querystring at query time

2011-07-05 Thread Ahmet Arslan
> than what should i do to get the
> required result. ie. if i want to boost gold
> than which querytype i should use.

If you want to boost the keyword 'gold', you can use the bq parameter.

defType=dismax&bq=someField:gold^100

See the other parameters : 
http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29


RE: searching a subset of SOLR index

2011-07-05 Thread Pierre GOSSE
It is redundancy. You have to balance the cost of redundancy against the 
performance cost of having your web index queried by your windows service. 
If your windows service is not too aggressive in its requests, go for shards.

Pierre

-Message d'origine-
De : Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Envoyé : mardi 5 juillet 2011 15:05
À : solr-user@lucene.apache.org
Objet : RE: searching a subset of SOLR index

But incase the website docs contribute around 50 % of the entire docs , why to 
recreate the indexes . don't you think its redundancy ?
Can two web apps (solr instances ) share a single index file to search on it 
without interfering each other 


Regards,
JAME VAALET
Software Developer 
EXT :8108
Capital IQ


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 5:12 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

From what you tell us, I guess a separate index for the website docs would be
best. If you fear that requests from the windows service would cripple your
website's performance, why not have a totally separate index on another
server, and have your website documents indexed in both indexes?

Pierre

-Original Message-
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 05, 2011 1:14 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

I have got two applications:

1. Website
The website will enable any user to search the document repository, 
and the set they search on is known as website presentable.
2. Windows service 
The windows service will search all the documents in the repository 
for a fixed set of keywords and store the found results in a database. This 
set is the universal set of documents in the doc repository, including the 
website-presentable ones.


The website is a high-priority app which should work smoothly without any 
interference, whereas the windows service should run all day long, 
continuously and without breaks, to save results from incoming docs.
The problem here is that the website set is predefined and I don't want the 
windows service requests to SOLR to slow down website requests.

Suppose I segregate the website-presentable docs' index into a particular 
core and the rest of them into a different core; will that solve the problem?
I have also read about multiple ports for listening to requests from different 
apps; can this be used?



Regards,
JAME VAALET


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index. 
But filters are very efficient when working with subset of your index, 
especially if you reuse the same filter for many queries since there is a cache.

If your subsets are always the same subsets, maybe you could use shards. But 
we would need to know more about what you intend to do, to point to an adequate 
solution.

Pierre

-Original Message-
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 05, 2011 11:10 AM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically, or does it have any 
mechanism to limit it physically as well? Do we leverage the time factor by using 
the range query?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet  wrote:
> Hi,
> Let say, I have got 10^10 documents in an index with unique id being document 
> id which is assigned to each of those from 1 to 10^10 .
> Now I want to search a particular query string in a subset of these documents 
> say ( document id 100 to 1000).
>
> The question here is.. will SOLR able to search just in this set of documents 
> rather than the entire index ? if yes what should be query to limit search 
> into this subset ?
>
> Regards,
> JAME VAALET
> Software Developer
> EXT :8108
> Capital IQ
>
>


Nightly builds

2011-07-05 Thread Benson Margulies
The solr download link does not point to or mention nightly builds.
Are they out there?


Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?

2011-07-05 Thread Gabriele Kahlout
the answer to 2) is new IndexSchema(solrConf, schema).getAnalyzer();


On Tue, Jul 5, 2011 at 2:48 PM, Gabriele Kahlout
wrote:

> Not yet an answer to 2) but this is where and how Solr initializes the
> Analyzer defined in the schema.xml into a TokenizerChain:
>
> //org.apache.solr.schema.IndexSchema
>  // Load the Tokenizer
> // Although an analyzer only allows a single Tokenizer, we load a list
> to make sure
> // the configuration is ok
> //
> 
> final ArrayList tokenizers = new
> ArrayList(1);
> AbstractPluginLoader tokenizerLoader =
>   new AbstractPluginLoader( "[schema.xml]
> analyzer/tokenizer", false, false )
> {
>   @Override
>   protected void init(TokenizerFactory plugin, Node node) throws
> Exception {
> if( !tokenizers.isEmpty() ) {
>   throw new SolrException( SolrException.ErrorCode.SERVER_ERROR,
>   "The schema defines multiple tokenizers for: "+node );
> }
> final Map params =
> DOMUtil.toMapExcept(node.getAttributes(),"class");
> // copy the luceneMatchVersion from config, if not set
> if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
>   params.put(LUCENE_MATCH_VERSION_PARAM,
> solrConfig.luceneMatchVersion.toString());
> plugin.init( params );
> tokenizers.add( plugin );
>   }
>
>   @Override
>   protected TokenizerFactory register(String name, TokenizerFactory
> plugin) throws Exception {
> return null; // used for map registration
>   }
> };
> tokenizerLoader.load( loader, (NodeList)xpath.evaluate("./tokenizer",
> node, XPathConstants.NODESET) );
>
> // Make sure something was loaded
> if( tokenizers.isEmpty() ) {
>   throw new
> SolrException(SolrException.ErrorCode.SERVER_ERROR,"analyzer without class
> or tokenizer & filter list");
> }
>
>
> // Load the Filters
> //
> 
> final ArrayList filters = new
> ArrayList();
> AbstractPluginLoader filterLoader =
>   new AbstractPluginLoader( "[schema.xml]
> analyzer/filter", false, false )
> {
>   @Override
>   protected void init(TokenFilterFactory plugin, Node node) throws
> Exception {
> if( plugin != null ) {
>   final Map params =
> DOMUtil.toMapExcept(node.getAttributes(),"class");
>   // copy the luceneMatchVersion from config, if not set
>   if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
> params.put(LUCENE_MATCH_VERSION_PARAM,
> solrConfig.luceneMatchVersion.toString());
>   plugin.init( params );
>   filters.add( plugin );
> }
>   }
>
>   @Override
>   protected TokenFilterFactory register(String name, TokenFilterFactory
> plugin) throws Exception {
> return null; // used for map registration
>   }
> };
> filterLoader.load( loader, (NodeList)xpath.evaluate("./filter", node,
> XPathConstants.NODESET) );
>
> return new TokenizerChain(charFilters.toArray(new
> CharFilterFactory[charFilters.size()]),
> tokenizers.get(0), filters.toArray(new
> TokenFilterFactory[filters.size()]));
>   };
>
>
>
> On Tue, Jul 5, 2011 at 2:26 PM, Gabriele Kahlout  > wrote:
>
>> I suspect the following should do (1). I'm just not sure about file
>> references as in  stopInit.put("words", "stopwords.txt") . (2) should
>> clarify.
>>
>> 1)
>> class SchemaAnalyzer extends Analyzer{
>>
>> @Override
>> public TokenStream tokenStream(String fieldName, Reader reader) {
>> HashMap stopInit = new
>> HashMap();
>> stopInit.put("words", "stopwords.txt");
>> stopInit.put("ignoreCase", Boolean.TRUE.toString());
>> StopFilterFactory stopFilterFactory = new StopFilterFactory();
>> stopFilterFactory.init(stopInit);
>>
>> final HashMap wordDelimInit = new
>> HashMap();
>> wordDelimInit.put("generateWordParts", "1");
>> wordDelimInit.put("generateNumberParts", "1");
>> wordDelimInit.put("catenateWords", "1");
>> wordDelimInit.put("catenateWords", "1");
>> wordDelimInit.put("catenateNumbers", "1");
>> wordDelimInit.put("catenateAll", "0");
>> wordDelimInit.put("splitOnCaseChange", "1");
>>
>> WordDelimiterFilterFactory wordDelimiterFilterFactory = new
>> WordDelimiterFilterFactory();
>> wordDelimiterFilterFactory.init(wordDelimInit);
>> HashMap<String, String> porterInit = new HashMap<String, String>();
>> porterInit.put("protected", "protwords.txt");
>> EnglishPorterFilterFactory englishPorterFilterFactory = new
>> EnglishPorterFilterFactory();
>> englishPorterFilterFactory.init(porterInit);
>>
>> return new
>> RemoveDuplicatesTokenFilter(englishPorterFilterFactory.create(new
>> LowerCaseF

Apache Nutch and Solr Integration

2011-07-05 Thread serenity keningston
Hello Friends,


I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2.
I did the steps explained in the following two URLs:

http://wiki.apache.org/nutch/RunningNutchAndSolr

http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html


I downloaded both the softwares, however, I am getting error (*solrUrl is
not set, indexing will be skipped..*) when I am trying to crawl using
Cygwin.

Can anyone please help me out to fix this issue ?
Else any other website suggesting for Apache Nutch and Solr integration
would be greatly helpful.



Thanks & Regards,
Serenity


Re: Nightly builds

2011-07-05 Thread Tom Gross

On 07/05/2011 04:08 PM, Benson Margulies wrote:

The solr download link does not point to or mention nightly builds.
Are they out there?


http://lmgtfy.com/?q=%2Bsolr+%2Bnightlybuilds&l=1

--
Auther of the book "Plone 3 Multimedia" - http://amzn.to/dtrp0C

Tom Gross
email.@toms-projekte.de
skype.tom_gross
web.http://toms-projekte.de
blog...http://blog.toms-projekte.de



Re: Nightly builds

2011-07-05 Thread Benson Margulies
The reason for the email is not that I can't find them, but because
the project, I claim, should be advertising them more prominently on
the web site than buried in a wiki.

Where I come from, an lmgtfy link is rather hostile.


Oh, and you might want to fix the spelling of  'Author' in your own signature.


On Tue, Jul 5, 2011 at 10:19 AM, Tom Gross  wrote:
> On 07/05/2011 04:08 PM, Benson Margulies wrote:
>>
>> The solr download link does not point to or mention nightly builds.
>> Are they out there?
>>
> http://lmgtfy.com/?q=%2Bsolr+%2Bnightlybuilds&l=1
>
> --
> Auther of the book "Plone 3 Multimedia" - http://amzn.to/dtrp0C
>
> Tom Gross
> email.@toms-projekte.de
> skype.tom_gross
> web.http://toms-projekte.de
> blog...http://blog.toms-projekte.de
>
>


Re: MergerFactor and MaxMergerDocs effecting num of segments created

2011-07-05 Thread Shawn Heisey

On 7/4/2011 12:51 AM, Romi wrote:

Shawn when i reindex data using full-import i got:
*_0.fdt 3310
_0.fdx  23
_0.frq  857
_0.nrm  31
_0.prx  1748
_0.tis  350
_1.fdt  3310
_1.fdx  23
_1.fnm  1
_1.frq  857
_1.nrm  31
_1.prx  1748
_1.tii  5
_1.tis  350
segments.gen1
segments_3  1*

Where all  _1  marked as archived(A)

And when i run again full import(for testing ) i got _1 and 2_ files where
all 2_ marked as archive. What does it mean.
and the problem i am not getting is while i am doing full import which
deletes the old indexes and creates new than why i m getting the old one
again.


By mentioning the Archive bit, it sounds like you are running on 
Windows.  I've only run it on Linux, but I understand from reading 
messages on this list that there are a lot of problems on Windows with 
deleting old files whenever you do anything that results in old segments 
going away -- reindex, optimize, replication, normal segment merging, 
etc.  The current solr version is 3.3, previous versions are 3.2, 3.1, 
then 1.4.1.  Others will have to comment about whether things have 
improved in more recent releases.


The archive bit is simply a DOS/Windows attribute that says "this file 
needs to be backed up."  When you create or modify a file in a normal 
way, it is turned on.  Normally the only thing that turns that bit off 
is backup software, but Solr might be programmed to clear it on files 
that are no longer needed, in case the delete fails, so there's a way to 
detect that they should not be backed up.  I don't know if this is 
right, it's just speculation.


Thanks,
Shawn



RE: what s the optimum size of SOLR indexes

2011-07-05 Thread Burton-West, Tom
Hello,

On Mon, 2011-07-04 at 13:51 +0200, Jame Vaalet wrote:
> What would be the maximum size of a single SOLR index file for resulting in 
> optimum search time ?

How do you define optimum?   Do you want the fastest possible response time 
at any cost or do you have a specific response time goal? 

Can you give us more details on your use case?   What kind of load are you 
expecting?  What kind of queries do you need to support?
Some of the trade-offs depend if you are CPU bound or I/O bound.

Assuming a fairly large index, if you *absolutely need* the fastest possible 
search response time and you can *afford the hardware*, you probably want to 
shard your index and size your indexes so they can all fit in memory (and do 
some work to make sure the index data is always in memory).  If you can't 
afford that much memory, but still need very fast response times, you might 
want to size your indexes so they all fit on SSD's.  As an example of a use 
case on the opposite side of the spectrum, here at HathiTrust, we have a very 
low number of queries per second and we are running an index that totals 6 TB 
in size with shards of about 500GB and average response times of 200ms (but 
99th percentile times of about 2 seconds).

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search



Hit Rate

2011-07-05 Thread Briggs Thompson
Hello all,

Is there a good way to get the hit count of a search?

Example query:
textField:solr AND documentId:1000

Say document with Id = 1000 has "solr" 13 times in the document. Any way to
extract that number [13] in the response? I know we can return the score
which is loosely related to hit counts via tf-idf, but for this case I need
the actual hit counts. I believe you can get this information from the
logs, but that is less useful if the use case is on the presentation layer.

I tried faceting on the query but it seems like that returns the number of
documents that query matches rather than the hit count.
http://localhost:8080/solr/ExampleCore/select/?q=textField%3Asolr+AND+documentId%3A1246727&version=2.2&start=0&rows=10&indent=on&&facet=true&face.field=textField:solr&facet.query=
textField:solr

I was thinking that highlighting essentially returns the hit count if you
supply unlimited amount of snippets, but I imagine there must be a more
elegant solution.

Thanks in advance,
Briggs


Re: Feed index with analyzer output

2011-07-05 Thread Andrzej Bialecki

On 7/5/11 1:37 PM, Lox wrote:

Ok,

the very short question is:
Is there a way to submit the analyzer response so that solr already knows
what to do with that response? (that is, which field are to be treated as
payloads, which are tokens, etc...)


Check this issue: http://issues.apache.org/jira/browse/SOLR-1535


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Hit Rate

2011-07-05 Thread Ahmet Arslan

> Is there a good way to get the hit count of a search?
> 
> Example query:
> textField:solr AND documentId:1000
> 
> Say document with Id = 1000 has "solr" 13 times in the
> document. Any way to
> extract that number [13] in the response? 

Looks like you are looking for term frequency info:

Two separate solutions:
http://wiki.apache.org/solr/TermVectorComponent
http://wiki.apache.org/solr/FunctionQuery#tf
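
As a sketch of the TermVectorComponent option from SolrJ (the handler name 
tvrh comes from the example solrconfig.xml and is an assumption about your 
setup; the field names are from your mail):

// assumes SolrJ 3.x, inside a method declared to throw Exception
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery q = new SolrQuery("documentId:1000");
q.set("qt", "tvrh");       // a request handler with the TermVectorComponent enabled
q.set("tv", "true");
q.set("tv.tf", "true");    // ask for raw term frequencies
q.set("tv.fl", "textField");
// the per-term frequencies come back in the generic "termVectors" section
NamedList termVectors = (NamedList) server.query(q).getResponse().get("termVectors");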




Re: Apache Nutch and Solr Integration

2011-07-05 Thread Way Cool
Can you let me know when and where you were getting the error? A screen-shot
will be helpful.

On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston <
serenity.kenings...@gmail.com> wrote:

> Hello Friends,
>
>
> I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2
> . I did the steps explained in the following two URL's :
>
> http://wiki.apache.org/nutch/RunningNutchAndSolr
>
>
> http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
>
>
> I downloaded both the softwares, however, I am getting error (*solrUrl is
> not set, indexing will be skipped..*) when I am trying to crawl using
> Cygwin.
>
> Can anyone please help me out to fix this issue ?
> Else any other website suggesting for Apache Nutch and Solr integration
> would be greatly helpful.
>
>
>
> Thanks & Regards,
> Serenity
>


primary key made of multiple fields from multiple source tables

2011-07-05 Thread Mark juszczec
Hello all

I'm using Solr 3.2 and am trying to index a document whose primary key is
built from multiple columns selected from an Oracle DB.

I'm getting the following error:

java.lang.IllegalArgumentException: deltaQuery has no column to resolve to
declared primary key pk='ordersorderline_id'
at
org.apache.solr.handler.dataimport.DocBuilder.findMatchingPkColumn(DocBuilder.java:840)
~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:891)
~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:284)
~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:178)
~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:374)
[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:413)
[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)
[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30
23:09:08]


The deltaQuery is:

select orders.order_id || orders.order_booked_ind ||
order_line.order_line_id as ordersorderline_id, orders.order_id,
orders.order_booked_ind, order_line.order_line_id, orders.order_dt,
orders.cancel_dt, orders.account_manager_id, orders.of_header_id,
orders.order_status_lov_id, orders.order_type_id,
orders.approved_discount_pct, orders.campaign_nm,
orders.approved_by_cd,orders.advertiser_id, orders.agency_id,
order_line.accounting_comments_desc
from orders, order_line
where order_line.order_id = orders.order_id and order_line.order_booked_ind
= orders.order_booked_ind

I've just seen in the Solr Wiki Task List at
http://wiki.apache.org/solr/TaskList?highlight=%28composite%29 a Big Idea
for The Future is:

"support for *composite* keys ... either with some explicit change to the
 declaration or perhaps just copyField with some hidden magic
that concats the resulting terms into a single key Term"

Does this prohibit my creating the key with the select as above?

Mark


Re: Hit Rate

2011-07-05 Thread Briggs Thompson
Yes indeed, that is what I was missing. Thanks Ahmet!

On Tue, Jul 5, 2011 at 12:48 PM, Ahmet Arslan  wrote:

>
> > Is there a good way to get the hit count of a search?
> >
> > Example query:
> > textField:solr AND documentId:1000
> >
> > Say document with Id = 1000 has "solr" 13 times in the
> > document. Any way to
> > extract that number [13] in the response?
>
> Looks like you are looking for term frequency info:
>
> Two separate solutions:
> http://wiki.apache.org/solr/TermVectorComponent
> http://wiki.apache.org/solr/FunctionQuery#tf
>
>
>


Dynamic Facets

2011-07-05 Thread Way Cool
Hi, guys,

We have more than 1000 attributes scattered around 700K docs. Each doc might
have about 50 attributes. I would like Solr to return up to 20 facets for
every search, and each search can return facets dynamically depending on
the matched docs. Anyone done that before? It would be awesome if the facets
returned changed as we drill down on facets.

I have looked at the following docs:
http://wiki.apache.org/solr/SimpleFacetParameters
http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr

Wondering what's the best way to accomplish that. Any advice?

Thanks,

YH


Re: After the query component has the results, can I do more filtering on them?

2011-07-05 Thread arian487
Sorry for being vague.  Okay so these scores exist on an external server and
they change often enough.  The score for each returned user is actually
dependent on the user doing the searching (if I'm making the request, and
you make the same request, the scores are different).  So what I'm doing is
getting a bunch of scores from the external and aggregating that with the
current scores solr gave in my component.  So heres the flow (all numbers
are arbitrary):

1) Get 10,000 results from solr from the query component
2) return a list of scores and ids from the external server (it'll return a
lot of them)
3) Out of these 10,000, I take the top 3500 docs after aggregating the
external server's scores and netcons scores.  

The problem is, the score for each doc is specific to the user making the
request.  The algorithm in doing these scores is quite complex.  I cannot
simply re-index with new scores, hence I've written this component which
runs after querycomponent and does the magic of filtering.  

I've come up with a solution but it involved me changing a lot of solr code. 
First and foremost, I've made the queryResultCache public and developed a
small API in accessing and changing it.  I've also changed the
QueryResultKey to include a Long userId in its hashCode and equals
functions.  When a search is made, the QueryComponent caches its results,
and then in my custom component I go into that cache, get my superset,
filter it out from the scores in my external server, and throw it back into
cache.  Of course none of this happens if my custom scored stuff is already
cached, so its actually decent.  

If you have any suggestions and improvements I'd greatly appreciate it. 
Sorry for the long response...I didn't want to be an XY problem again :D

--
View this message in context: 
http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3141652.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Apache Nutch and Solr Integration

2011-07-05 Thread Markus Jelsma
You are using the crawl job so you must specify the URL to your Solr instance.

The newly updated wiki has your answer:
http://wiki.apache.org/nutch/bin/nutch_crawl
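
For example, something along these lines (a sketch; the directory names, depth 
and topN values are placeholders):

bin/nutch crawl urls -dir crawl -depth 3 -topN 50 -solr http://localhost:8983/solr/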

> Hello Friends,
> 
> 
> I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2
> . I did the steps explained in the following two URL's :
> 
> http://wiki.apache.org/nutch/RunningNutchAndSolr
> 
> http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apach
> e-solr.html
> 
> 
> I downloaded both the softwares, however, I am getting error (*solrUrl is
> not set, indexing will be skipped..*) when I am trying to crawl using
> Cygwin.
> 
> Can anyone please help me out to fix this issue ?
> Else any other website suggesting for Apache Nutch and Solr integration
> would be greatly helpful.
> 
> 
> 
> Thanks & Regards,
> Serenity


Re: Apache Nutch and Solr Integration

2011-07-05 Thread serenity keningston
Please find attached screenshot

On Tue, Jul 5, 2011 at 11:53 AM, Way Cool  wrote:

> Can you let me know when and where you were getting the error? A
> screen-shot
> will be helpful.
>
> On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston <
> serenity.kenings...@gmail.com> wrote:
>
> > Hello Friends,
> >
> >
> > I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr
> 3.2
> > . I did the steps explained in the following two URL's :
> >
> > http://wiki.apache.org/nutch/RunningNutchAndSolr
> >
> >
> >
> http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
> >
> >
> > I downloaded both the softwares, however, I am getting error (*solrUrl is
> > not set, indexing will be skipped..*) when I am trying to crawl using
> > Cygwin.
> >
> > Can anyone please help me out to fix this issue ?
> > Else any other website suggesting for Apache Nutch and Solr integration
> > would be greatly helpful.
> >
> >
> >
> > Thanks & Regards,
> > Serenity
> >
>


Re: Custom Cache cleared after a commit?

2011-07-05 Thread arian487
Sorry for my ignorance, but do you have any lead in the code on where to look
for this?  Also, I'd still need a way of finding out how long its been in
the cache because I don't want it to regenerate every time.  I'd want it to
regenerate only if its been in the cache for less then 6 hours (or some time
frame which I deem to be good).  Thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Cache-cleared-after-a-commit-tp3136345p3141673.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamic Facets

2011-07-05 Thread Erik Hatcher
YH -

One technique (that the Smithsonian employs, I believe) is to index 
the field names for the attributes into a separate field, facet on that first, 
and then facet on the fields you'd like from that response in a second request 
to Solr.

There's a basic hack here so the indexing client doesn't need to add the 
"fields used" field: 

Ideally, this could all be made part of one request to Solr - and I can 
envision a pre-faceting component (post querying) to dynamically figure out the 
best fields to facet on, set those into the request context, and the rest is 
magic.
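
A rough SolrJ sketch of the two-request version (the field name attr_names_s, 
holding the names of the attribute fields each doc uses, is an assumption):

// assumes SolrJ 3.x, inside a method declared to throw Exception
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
String userQuery = "the user's query";
// request 1: which attribute fields occur in the matching docs?
SolrQuery first = new SolrQuery(userQuery);
first.setFacet(true);
first.addFacetField("attr_names_s");
first.setFacetLimit(20);
List<FacetField.Count> names = server.query(first).getFacetField("attr_names_s").getValues();
// request 2: facet on exactly those fields
SolrQuery second = new SolrQuery(userQuery);
second.setFacet(true);
for (FacetField.Count count : names)
    second.addFacetField(count.getName());
QueryResponse facets = server.query(second);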

Erik



On Jul 5, 2011, at 13:15 , Way Cool wrote:

> Hi, guys,
> 
> We have more than 1000 attributes scattered around 700K docs. Each doc might
> have about 50 attributes. I would like Solr to return up to 20 facets for
> every searches, and each search can return facets dynamically depending on
> the matched docs. Anyone done that before? That'll be awesome if the facets
> returned will be changed after we drill down facets.
> 
> I have looked at the following docs:
> http://wiki.apache.org/solr/SimpleFacetParameters
> http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr
> 
> Wondering what's the best way to accomplish that. Any advice?
> 
> Thanks,
> 
> YH



Re: faceting on field with two values

2011-07-05 Thread Chris Hostetter

: I have two fields TOWN and POSTALCODE and I want to concat those two in one
: field to do faceting

As others have pointed out, copyField doesn't do a "concat", it just 
adds the field values from the source field to the dest field (so with 
those two <copyField/> lines you will typically get two values for each 
doc in the dest field)

if you don't want to go the DIH route, and you don't want to change your 
talend process, you could use a simple UpdateProcessor for this (update 
processors are used to process add/delete requests no matter what 
source they come from, before analysis happens) ... but i don't think we 
have any off the shelf "Concat" update processors in solr at the moment 

there is a patch for a script-based one which might be helpful..
https://issues.apache.org/jira/browse/SOLR-1725

All of that said, based on what you've described about your usecase i 
would question, from a UI standpoint, whether this field would actually be a 
good idea...

isn't there an extremely large number of postal codes even in a single 
city?

why not let people facet on just the town field first, and then only when 
they click on one, offer them a facet on Postal code?

Otherwise your facet UI is going to have a tendency to look like this...

 Gender:
   * Male  (9000 results)
   * Female  (8000 results)
 Town/Postal:
   * paris, 75016  (560 results)
   * paris, 75015  (490 results)
   * paris, 75022  (487 results)
   * boulogne sur mer 62200 (468 results)
   * paris, 75018  (465 results)
   * (click to see more)
 Color:
   * Red (900 results)
   * Blue (800 results)

...and many of your users will never find the town they are looking for 
(let alone the post code)
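
If it helps, the two-step version in SolrJ would look roughly like this (only 
the TOWN and POSTALCODE field names come from this thread; the rest is a 
sketch):

// assumes SolrJ 3.x, inside a method declared to throw Exception
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
String userQuery = "the user's query";
// step 1: facet on town only
SolrQuery first = new SolrQuery(userQuery);
first.setFacet(true);
first.addFacetField("TOWN");
// step 2: after the user picks a town, filter on it and facet on postal code
SolrQuery second = new SolrQuery(userQuery);
second.addFilterQuery("TOWN:paris");
second.setFacet(true);
second.addFacetField("POSTALCODE");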


-Hoss


Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Gabriele Kahlout
@Test
public void testUpdate() throws IOException,
ParserConfigurationException, SAXException, ParseException {
Analyzer analyzer = getAnalyzer();
QueryParser parser = new QueryParser(Version.LUCENE_32, content,
analyzer);
Query allQ = parser.parse("*:*");

IndexWriter writer = getWriter();
IndexSearcher searcher = new IndexSearcher(IndexReader.open(writer,
true));
TopDocs docs = searcher.search(allQ, 10);
*assertEquals(0, docs.totalHits); // empty/no index*

Document doc = getDoc();
writer.addDocument(doc);
writer.commit();

docs = searcher.search(allQ, 10);
*assertEquals(1,docs.totalHits); //it fails here. docs.totalHits
equals 0*
}
What am I doing wrong here?

If I initialize searcher with new IndexSearcher(directory) I'm told:
org.apache.lucene.index.IndexNotFoundException: no segments* file found in
org.apache.lucene.store.RAMDirectory@3caa4blockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ed0220c:
files: []

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Dynamic Facets

2011-07-05 Thread darren

You can issue a new facet search as you drill down from your UI.
You have to specify the fields you want to facet on and they can be
dynamic.

Take a look at recent threads here on taxonomy faceting for help.
Also, look here[1]

[1] http://wiki.apache.org/solr/SimpleFacetParameters

On Tue, 5 Jul 2011 11:15:51 -0600, Way Cool 
wrote:
> Hi, guys,
> 
> We have more than 1000 attributes scattered around 700K docs. Each doc
> might
> have about 50 attributes. I would like Solr to return up to 20 facets
for
> every searches, and each search can return facets dynamically depending
on
> the matched docs. Anyone done that before? That'll be awesome if the
facets
> returned will be changed after we drill down facets.
> 
> I have looked at the following docs:
> http://wiki.apache.org/solr/SimpleFacetParameters
>
http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr
> 
> Wondering what's the best way to accomplish that. Any advice?
> 
> Thanks,
> 
> YH


Re: A beginner problem

2011-07-05 Thread Chris Hostetter
: follow a receipe.  So I went to the the solr site, downloaded solr and
: tried to follow the tutorial.  In the  "example" folder of solr, using
: "java -jar start.jar " I got:
: 
: 2011-07-04 13:22:38.439:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
: 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT
: 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983

if that is everything you got in the logs, then i suspect:
  a) you downloaded a source release (ie: has "*-src-*" in its name) in 
which the solr.war app has not yet been compiled
  b) you did not run "ant example" to build solr and setup the example 
instance.

If i'm wrong, then yes, more details would be helpful: what exact 
URL did you download?
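
(For reference, assuming a 3.x source download, the sequence would be roughly:

cd apache-solr-3.3.0/solr
ant example
cd example
java -jar start.jar

...with the paths depending on the exact release layout you unpacked.)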

-Hoss


Re: Dynamic Facets

2011-07-05 Thread Way Cool
Thanks Erik and Darren.
A pre-faceting component (post querying) would be ideal, even though there may
be a little performance penalty there. :-) I will try to implement one if no one
has done so.

Darren, I did look at the taxonomy faceting thread. My main concern is that
I want to have dynamic facets to be returned because I don't know what
facets I can specify as a part of query ahead of time, and there are too
many search terms. ;-)

Thanks for help.

On Tue, Jul 5, 2011 at 11:49 AM,  wrote:

>
> You can issue a new facet search as you drill down from your UI.
> You have to specify the fields you want to facet on and they can be
> dynamic.
>
> Take a look at recent threads here on taxonomy faceting for help.
> Also, look here[1]
>
> [1] http://wiki.apache.org/solr/SimpleFacetParameters
>
> On Tue, 5 Jul 2011 11:15:51 -0600, Way Cool 
> wrote:
> > Hi, guys,
> >
> > We have more than 1000 attributes scattered around 700K docs. Each doc
> > might
> > have about 50 attributes. I would like Solr to return up to 20 facets
> for
> > every searches, and each search can return facets dynamically depending
> on
> > the matched docs. Anyone done that before? That'll be awesome if the
> facets
> > returned will be changed after we drill down facets.
> >
> > I have looked at the following docs:
> > http://wiki.apache.org/solr/SimpleFacetParameters
> >
>
> http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr
> >
> > Wondering what's the best way to accomplish that. Any advice?
> >
> > Thanks,
> >
> > YH
>


Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Michael McCandless
After your writer.commit you need to reopen your searcher to see the changes.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout
 wrote:
>    @Test
>    public void testUpdate() throws IOException,
> ParserConfigurationException, SAXException, ParseException {
>        Analyzer analyzer = getAnalyzer();
>        QueryParser parser = new QueryParser(Version.LUCENE_32, content,
> analyzer);
>        Query allQ = parser.parse("*:*");
>
>        IndexWriter writer = getWriter();
>        IndexSearcher searcher = new IndexSearcher(IndexReader.open(writer,
> true));
>        TopDocs docs = searcher.search(allQ, 10);
> *        assertEquals(0, docs.totalHits); // empty/no index*
>
>        Document doc = getDoc();
>        writer.addDocument(doc);
>        writer.commit();
>
>        docs = searcher.search(allQ, 10);
> *        assertEquals(1,docs.totalHits); //it fails here. docs.totalHits
> equals 0*
>    }
> What am I doing wrong here?
>
> If I initialize searcher with new IndexSearcher(directory) I'm told:
> org.apache.lucene.index.IndexNotFoundException: no segments* file found in
> org.apache.lucene.store.RAMDirectory@3caa4blockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ed0220c:
> files: []
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>


Re: Apache Nutch and Solr Integration

2011-07-05 Thread Way Cool
Sorry, Serenity, somehow I don't see the attachment.

On Tue, Jul 5, 2011 at 11:23 AM, serenity keningston <
serenity.kenings...@gmail.com> wrote:

> Please find attached screenshot
>
>
> On Tue, Jul 5, 2011 at 11:53 AM, Way Cool  wrote:
>
>> Can you let me know when and where you were getting the error? A
>> screen-shot
>> will be helpful.
>>
>> On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston <
>> serenity.kenings...@gmail.com> wrote:
>>
>> > Hello Friends,
>> >
>> >
>> > I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr
>> 3.2
>> > . I did the steps explained in the following two URL's :
>> >
>> > http://wiki.apache.org/nutch/RunningNutchAndSolr
>> >
>> >
>> >
>> http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
>> >
>> >
>> > I downloaded both the softwares, however, I am getting error (*solrUrl
>> is
>> > not set, indexing will be skipped..*) when I am trying to crawl using
>> > Cygwin.
>> >
>> > Can anyone please help me out to fix this issue ?
>> > Else any other website suggesting for Apache Nutch and Solr integration
>> > would be greatly helpful.
>> >
>> >
>> >
>> > Thanks & Regards,
>> > Serenity
>> >
>>
>
>


Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Gabriele Kahlout
and how do you do that? There is no reopen method

On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> After your writer.commit you need to reopen your searcher to see the
> changes.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout
>  wrote:
> >@Test
> >public void testUpdate() throws IOException,
> > ParserConfigurationException, SAXException, ParseException {
> >Analyzer analyzer = getAnalyzer();
> >QueryParser parser = new QueryParser(Version.LUCENE_32, content,
> > analyzer);
> >Query allQ = parser.parse("*:*");
> >
> >IndexWriter writer = getWriter();
> >IndexSearcher searcher = new
> IndexSearcher(IndexReader.open(writer,
> > true));
> >TopDocs docs = searcher.search(allQ, 10);
> > *assertEquals(0, docs.totalHits); // empty/no index*
> >
> >Document doc = getDoc();
> >writer.addDocument(doc);
> >writer.commit();
> >
> >docs = searcher.search(allQ, 10);
> > *assertEquals(1,docs.totalHits); //it fails here. docs.totalHits
> > equals 0*
> >}
> > What am I doing wrong here?
> >
> > If I initialize searcher with new IndexSearcher(directory) I'm told:
> > org.apache.lucene.index.IndexNotFoundException: no segments* file found
> in
> > org.apache.lucene.store.RAMDirectory@3caa4blockFactory
> =org.apache.lucene.store.SingleInstanceLockFactory@ed0220c:
> > files: []
> >
> > --
> > Regards,
> > K. Gabriele
> >
> > --- unchanged since 20/9/10 ---
> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > receipt within 48 hours then I don't resend the email.
> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x)
> > < Now + 48h) ⇒ ¬resend(I, this).
> >
> > If an email is sent by a sender that is not a trusted contact or the
> email
> > does not contain a valid code then the email is not received. A valid
> code
> > starts with a hyphen and ends with "X".
> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> > L(-[a-z]+[0-9]X)).
> >
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Michael McCandless
Sorry, you must reopen the underlying IndexReader, and then make a new
IndexSearcher from the reopened reader.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jul 5, 2011 at 2:12 PM, Gabriele Kahlout
 wrote:
> and how do you do that? There is no reopen method
>
> On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> After your writer.commit you need to reopen your searcher to see the
>> changes.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout
>>  wrote:
>> >    @Test
>> >    public void testUpdate() throws IOException,
>> > ParserConfigurationException, SAXException, ParseException {
>> >        Analyzer analyzer = getAnalyzer();
>> >        QueryParser parser = new QueryParser(Version.LUCENE_32, content,
>> > analyzer);
>> >        Query allQ = parser.parse("*:*");
>> >
>> >        IndexWriter writer = getWriter();
>> >        IndexSearcher searcher = new
>> IndexSearcher(IndexReader.open(writer,
>> > true));
>> >        TopDocs docs = searcher.search(allQ, 10);
>> > *        assertEquals(0, docs.totalHits); // empty/no index*
>> >
>> >        Document doc = getDoc();
>> >        writer.addDocument(doc);
>> >        writer.commit();
>> >
>> >        docs = searcher.search(allQ, 10);
>> > *        assertEquals(1,docs.totalHits); //it fails here. docs.totalHits
>> > equals 0*
>> >    }
>> > What am I doing wrong here?
>> >
>> > If I initialize searcher with new IndexSearcher(directory) I'm told:
>> > org.apache.lucene.index.IndexNotFoundException: no segments* file found
>> in
>> > org.apache.lucene.store.RAMDirectory@3caa4blockFactory
>> =org.apache.lucene.store.SingleInstanceLockFactory@ed0220c:
>> > files: []
>> >
>> > --
>> > Regards,
>> > K. Gabriele
>> >
>> > --- unchanged since 20/9/10 ---
>> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
>> > receipt within 48 hours then I don't resend the email.
>> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>> time(x)
>> > < Now + 48h) ⇒ ¬resend(I, this).
>> >
>> > If an email is sent by a sender that is not a trusted contact or the
>> email
>> > does not contain a valid code then the email is not received. A valid
>> code
>> > starts with a hyphen and ends with "X".
>> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
>> > L(-[a-z]+[0-9]X)).
>> >
>>
>
>
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>


The OR operator in a query ?

2011-07-05 Thread duddy67
Hi all,

Could someone tell me what the OR syntax is in SOLR and how to use it in a
search query?
I tried:

fq=sometag:1+sometag:5
fq=sometag:[1+5]
fq=sometag:[1OR5]
fq=sometag:1+5

and many more but impossible to get what I want.


Thanks in advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-OR-operator-in-a-query-tp3141843p3141843.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Gabriele Kahlout
Still won't work (same as before).

 @Test
public void testUpdate() throws IOException,
ParserConfigurationException, SAXException, ParseException {
Analyzer analyzer = getAnalyzer();
QueryParser parser = new QueryParser(Version.LUCENE_32, content,
analyzer);
Query allQ = parser.parse("*:*");

IndexWriter writer = getWriter();
final IndexReader indexReader = IndexReader.open(writer, true);

IndexSearcher searcher = new IndexSearcher(indexReader);
TopDocs docs = searcher.search(allQ, 10);
assertEquals(0, docs.totalHits); // empty/no index

Document doc = getDoc();
writer.addDocument(doc);
writer.commit();

*indexReader.reopen();
searcher = new IndexSearcher(indexReader);
docs = searcher.search(allQ, 10);*
assertEquals(1,docs.totalHits);
}

  private Document getDoc() {
Document doc = new Document();
doc.add(new Field("id", "0", Field.Store.YES,
Field.Index.NOT_ANALYZED));
return doc;
}

 private IndexWriter getWriter() throws IOException {// 2
return new IndexWriter(directory, new WhitespaceAnalyzer(), // 2
IndexWriter.MaxFieldLength.UNLIMITED); // 2
}

On Tue, Jul 5, 2011 at 8:15 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Sorry, you must reopen the underlying IndexReader, and then make a new
> IndexSearcher from the reopened reader.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Jul 5, 2011 at 2:12 PM, Gabriele Kahlout
>  wrote:
> > and how do you do that? There is no reopen method
> >
> > On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >> After your writer.commit you need to reopen your searcher to see the
> >> changes.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout
> >>  wrote:
> >> >@Test
> >> >public void testUpdate() throws IOException,
> >> > ParserConfigurationException, SAXException, ParseException {
> >> >Analyzer analyzer = getAnalyzer();
> >> >QueryParser parser = new QueryParser(Version.LUCENE_32,
> content,
> >> > analyzer);
> >> >Query allQ = parser.parse("*:*");
> >> >
> >> >IndexWriter writer = getWriter();
> >> >IndexSearcher searcher = new
> >> IndexSearcher(IndexReader.open(writer,
> >> > true));
> >> >TopDocs docs = searcher.search(allQ, 10);
> >> > *assertEquals(0, docs.totalHits); // empty/no index*
> >> >
> >> >Document doc = getDoc();
> >> >writer.addDocument(doc);
> >> >writer.commit();
> >> >
> >> >docs = searcher.search(allQ, 10);
> >> > *assertEquals(1,docs.totalHits); //it fails here.
> docs.totalHits
> >> > equals 0*
> >> >}
> >> > What am I doing wrong here?
> >> >
> >> > If I initialize searcher with new IndexSearcher(directory) I'm told:
> >> > org.apache.lucene.index.IndexNotFoundException: no segments* file
> found
> >> in
> >> > org.apache.lucene.store.RAMDirectory@3caa4blockFactory
> >> =org.apache.lucene.store.SingleInstanceLockFactory@ed0220c:
> >> > files: []
> >> >
> >> > --
> >> > Regards,
> >> > K. Gabriele
> >> >
> >> > --- unchanged since 20/9/10 ---
> >> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> >> > receipt within 48 hours then I don't resend the email.
> >> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> >> time(x)
> >> > < Now + 48h) ⇒ ¬resend(I, this).
> >> >
> >> > If an email is sent by a sender that is not a trusted contact or the
> >> email
> >> > does not contain a valid code then the email is not received. A valid
> >> code
> >> > starts with a hyphen and ends with "X".
> >> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y
> ∈
> >> > L(-[a-z]+[0-9]X)).
> >> >
> >>
> >
> >
> >
> > --
> > Regards,
> > K. Gabriele
> >
> > --- unchanged since 20/9/10 ---
> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > receipt within 48 hours then I don't resend the email.
> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x)
> > < Now + 48h) ⇒ ¬resend(I, this).
> >
> > If an email is sent by a sender that is not a trusted contact or the
> email
> > does not contain a valid code then the email is not received. A valid
> code
> > starts with a hyphen and ends with "X".
> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> > L(-[a-z]+[0-9]X)).
> >
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then

Re: The OR operator in a query ?

2011-07-05 Thread Juan Grande
Hi,

These two are valid and equivalent:

   - fq=sometag:1 OR sometag:5
   - fq=sometag:(1 OR 5)

Also, beware that fq defines a filter query, which is different from a
regular query (http://wiki.apache.org/solr/CommonQueryParameters#fq). For
more details on the query syntax see
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
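
One more note: if you are typing these into a raw URL (as in your examples), 
the spaces must be URL-encoded, e.g. fq=sometag:1+OR+sometag:5 or 
fq=sometag:(1+OR+5), with + standing for an encoded space.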

Regards,

*Juan*



On Tue, Jul 5, 2011 at 3:15 PM, duddy67  wrote:

> Hi all,
>
> Someone could tell me what is the OR syntax in SOLR and how to use it in a
> search query ?
> I tried:
>
> fq=sometag:1+sometag:5
> fq=sometag:[1+5]
> fq=sometag:[1OR5]
> fq=sometag:1+5
>
> and many more but impossible to get what I want.
>
>
> Thanks for advance
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/The-OR-operator-in-a-query-tp3141843p3141843.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: The OR operator in a query ?

2011-07-05 Thread duddy67
Thanks for your response. I'll check this. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-OR-operator-in-a-query-tp3141843p3141916.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is solrj 3.3.0 ready for field collapsing?

2011-07-05 Thread Per Newgro

Thanks for your response.

On 05.07.2011 13:53, Erick Erickson wrote:

Let's see the results of adding &debugQuery=on to your URL. Are you getting
any documents back at all? If not, then your query isn't getting any
documents to group.
I didn't get any docs back. But they were in the response (I saw 
them in the debugger).
But the structure had changed, so that DocumentBuilder didn't bring me 
any results (getBeans()).
I investigated a bit further and found out that I had to set the 
group.main param to true.


Now I get results. So the answer seems to be yes :-).
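
For anyone hitting the same thing, a minimal SolrJ sketch of what worked (the 
bean class MyBean and the field name are placeholders):

// assumes SolrJ 3.3, inside a method declared to throw Exception
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery q = new SolrQuery("*:*");
q.set("group", "true");
q.set("group.field", "myfield");
q.set("group.main", "true");  // flatten the groups into a plain doc list so getBeans() can bind it
List<MyBean> beans = server.query(q).getBeans(MyBean.class);  // MyBean is a placeholder bean class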


You haven't told us much about what you're trying to do, you might want to
review: http://wiki.apache.org/solr/UsingMailingLists

Sorry for that.


Best
Erick
On Jul 4, 2011 11:55 AM, "Per Newgro"  wrote:


Cheers
Per


Re: A beginner problem

2011-07-05 Thread Way Cool
You can follow the links below to setup Nutch and Solr:
http://thetechietutorials.blogspot.com/2011/06/solr-and-nutch-integration.html

http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
http://wiki.apache.org/nutch/RunningNutchAndSolr

Of course, more details will be helpful for troubleshooting your env issue.
:-)

Have fun!

On Tue, Jul 5, 2011 at 11:49 AM, Chris Hostetter
wrote:

> : follow a receipe.  So I went to the the solr site, downloaded solr and
> : tried to follow the tutorial.  In the  "example" folder of solr, using
> : "java -jar start.jar " I got:
> :
> : 2011-07-04 13:22:38.439:INFO::Logging to STDERR via
> org.mortbay.log.StdErrLog
> : 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT
> : 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983
>
> if that is everything you got in the logs, then i suspect:
>  a) you download a source release (ie: has "*-src-*" in it's name) in
> which the solr.war app has not yet been compiled)
>  b) you did not run "ant example" to build solr and setup the example
> instance.
>
> If i'm wrong, then yes please more details would be helpful: what exact
> URL did you download?
>
> -Hoss
>


Re: Is solrj 3.3.0 ready for field collapsing?

2011-07-05 Thread Yonik Seeley
On Mon, Jul 4, 2011 at 11:54 AM, Per Newgro  wrote:
> i've tried to add the params for group=true and group.field=myfield by using
> the SolrQuery.
> But the result is null. Do i have to configure something? In wiki part for
> field collapsing i couldn't
> find anything.

No specific (type-safe) support for grouping is in SolrJ currently.
But you should still have access to the complete generic solr response
via SolrJ regardless (i.e. use getResponse())
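
A small sketch of that generic access (assuming the grouping section is keyed 
"grouped", as the 3.3 response writer emits it, unless group.main=true 
flattens the result):

// server and query set up as usual with SolrJ 3.3
QueryResponse rsp = server.query(query);
NamedList<Object> raw = rsp.getResponse();
NamedList grouped = (NamedList) raw.get("grouped");  // null when group.main=true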

-Yonik
http://www.lucidimagination.com


Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Gabriele Kahlout
Re-open doesn't work, but open does.

@Test
public void testUpdate() throws IOException,
ParserConfigurationException, SAXException, ParseException {
Analyzer analyzer = getAnalyzer();
QueryParser parser = new QueryParser(Version.LUCENE_32, content,
analyzer);
Query allQ = parser.parse("*:*");

IndexWriter writer = getWriter();
final IndexReader indexReader = IndexReader.open(writer, true);

IndexSearcher searcher = new IndexSearcher(indexReader);
TopDocs docs = searcher.search(allQ, 10);
assertEquals(0, docs.totalHits); // empty/no index

Document doc = getDoc();
writer.addDocument(doc);
writer.commit();

searcher = new IndexSearcher(IndexReader.open(writer, true));//new
IndexSearcher(directory);
docs = searcher.search(allQ, 10);
assertEquals(1, docs.totalHits);
}

On Tue, Jul 5, 2011 at 8:23 PM, Gabriele Kahlout
wrote:

> Still won't work (same as before).
>
>
>  @Test
> public void testUpdate() throws IOException,
> ParserConfigurationException, SAXException, ParseException {
> Analyzer analyzer = getAnalyzer();
> QueryParser parser = new QueryParser(Version.LUCENE_32, content,
> analyzer);
> Query allQ = parser.parse("*:*");
>
> IndexWriter writer = getWriter();
> final IndexReader indexReader = IndexReader.open(writer, true);
>
> IndexSearcher searcher = new IndexSearcher(indexReader);
>
> TopDocs docs = searcher.search(allQ, 10);
> assertEquals(0, docs.totalHits); // empty/no index
>
> Document doc = getDoc();
> writer.addDocument(doc);
> writer.commit();
>
> *indexReader.reopen();
> searcher = new IndexSearcher(indexReader);
>
> docs = searcher.search(allQ, 10);
> *
> assertEquals(1,docs.totalHits);
> }
>
>   private Document getDoc() {
> Document doc = new Document();
> doc.add(new Field("id", "0", Field.Store.YES,
> Field.Index.NOT_ANALYZED));
> return doc;
> }
>
>  private IndexWriter getWriter() throws IOException {// 2
> return new IndexWriter(directory, new WhitespaceAnalyzer(), // 2
> IndexWriter.MaxFieldLength.UNLIMITED); // 2
>
> }
>
> On Tue, Jul 5, 2011 at 8:15 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Sorry, you must reopen the underlying IndexReader, and then make a new
>> IndexSearcher from the reopened reader.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Jul 5, 2011 at 2:12 PM, Gabriele Kahlout
>>  wrote:
>> > and how do you do that? There is no reopen method
>> >
>> > On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless <
>> > luc...@mikemccandless.com> wrote:
>> >
>> >> After your writer.commit you need to reopen your searcher to see the
>> >> changes.
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout
>> >>  wrote:
>> >> >@Test
>> >> >public void testUpdate() throws IOException,
>> >> > ParserConfigurationException, SAXException, ParseException {
>> >> >Analyzer analyzer = getAnalyzer();
>> >> >QueryParser parser = new QueryParser(Version.LUCENE_32,
>> content,
>> >> > analyzer);
>> >> >Query allQ = parser.parse("*:*");
>> >> >
>> >> >IndexWriter writer = getWriter();
>> >> >IndexSearcher searcher = new
>> >> IndexSearcher(IndexReader.open(writer,
>> >> > true));
>> >> >TopDocs docs = searcher.search(allQ, 10);
>> >> > *assertEquals(0, docs.totalHits); // empty/no index*
>> >> >
>> >> >Document doc = getDoc();
>> >> >writer.addDocument(doc);
>> >> >writer.commit();
>> >> >
>> >> >docs = searcher.search(allQ, 10);
>> >> > *assertEquals(1,docs.totalHits); //it fails here.
>> docs.totalHits
>> >> > equals 0*
>> >> >}
>> >> > What am I doing wrong here?
>> >> >
>> >> > If I initialize searcher with new IndexSearcher(directory) I'm told:
>> >> > org.apache.lucene.index.IndexNotFoundException: no segments* file
>> found
>> >> in
>> >> > org.apache.lucene.store.RAMDirectory@3caa4blockFactory
>> >> =org.apache.lucene.store.SingleInstanceLockFactory@ed0220c:
>> >> > files: []
>> >> >
>> >> > --
>> >> > Regards,
>> >> > K. Gabriele
>> >> >
>> >> > --- unchanged since 20/9/10 ---
>> >> > P.S. If the subject contains "[LON]" or the addressee acknowledges
>> the
>> >> > receipt within 48 hours then I don't resend the email.
>> >> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>> >> time(x)
>> >> > < Now + 48h) ⇒ ¬resend(I, this).
>> >> >
>> >> > If an email is sent by a sender that is not a trusted contact or the
>> >> email
>> >> > does not contain a valid code then the email is not received. A valid
>> >> code
>> >> > starts with a hyphen and ends with "X".
>> >> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subj

Re: Cannot I search documents added by IndexWriter after commit?

2011-07-05 Thread Robert Muir
re-open does work, but you cannot ignore its return value! see the
javadocs for an example.
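
The javadoc pattern, for reference (Lucene 3.x):

IndexReader newReader = indexReader.reopen();
if (newReader != indexReader) {
    indexReader.close();     // reopen() returned a new reader, so release the old one
    indexReader = newReader;
}
IndexSearcher searcher = new IndexSearcher(indexReader);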

On Tue, Jul 5, 2011 at 3:10 PM, Gabriele Kahlout
 wrote:
> Re-open doens't work, but open does.
>
> @Test
>    public void testUpdate() throws IOException,
> ParserConfigurationException, SAXException, ParseException {
>        Analyzer analyzer = getAnalyzer();
>        QueryParser parser = new QueryParser(Version.LUCENE_32, content,
> analyzer);
>        Query allQ = parser.parse("*:*");
>
>        IndexWriter writer = getWriter();
>        final IndexReader indexReader = IndexReader.open(writer, true);
>
>        IndexSearcher searcher = new IndexSearcher(indexReader);
>        TopDocs docs = searcher.search(allQ, 10);
>        assertEquals(0, docs.totalHits); // empty/no index
>
>        Document doc = getDoc();
>        writer.addDocument(doc);
>        writer.commit();
>
>        searcher = new IndexSearcher(IndexReader.open(writer, true));//new
> IndexSearcher(directory);
>        docs = searcher.search(allQ, 10);
>        assertEquals(1, docs.totalHits);
>    }
>
> On Tue, Jul 5, 2011 at 8:23 PM, Gabriele Kahlout
> wrote:
>
>> Still won't work (same as before).
>>
>>
>>  @Test
>>     public void testUpdate() throws IOException,
>> ParserConfigurationException, SAXException, ParseException {
>>         Analyzer analyzer = getAnalyzer();
>>         QueryParser parser = new QueryParser(Version.LUCENE_32, content,
>> analyzer);
>>         Query allQ = parser.parse("*:*");
>>
>>         IndexWriter writer = getWriter();
>>         final IndexReader indexReader = IndexReader.open(writer, true);
>>
>>         IndexSearcher searcher = new IndexSearcher(indexReader);
>>
>>         TopDocs docs = searcher.search(allQ, 10);
>>         assertEquals(0, docs.totalHits); // empty/no index
>>
>>         Document doc = getDoc();
>>         writer.addDocument(doc);
>>         writer.commit();
>>
>>     *    indexReader.reopen();
>>         searcher = new IndexSearcher(indexReader);
>>
>>         docs = searcher.search(allQ, 10);
>> *
>>         assertEquals(1,docs.totalHits);
>>     }
>>
>>   private Document getDoc() {
>>         Document doc = new Document();
>>         doc.add(new Field("id", "0", Field.Store.YES,
>> Field.Index.NOT_ANALYZED));
>>         return doc;
>>     }
>>
>>  private IndexWriter getWriter() throws IOException {            // 2
>>         return new IndexWriter(directory, new WhitespaceAnalyzer(), // 2
>>                 IndexWriter.MaxFieldLength.UNLIMITED); // 2
>>
>>     }
>>
>> On Tue, Jul 5, 2011 at 8:15 PM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>> Sorry, you must reopen the underlying IndexReader, and then make a new
>>> IndexSearcher from the reopened reader.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Tue, Jul 5, 2011 at 2:12 PM, Gabriele Kahlout
>>>  wrote:
>>> > and how do you do that? There is no reopen method
>>> >
>>> > On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless <
>>> > luc...@mikemccandless.com> wrote:
>>> >
>>> >> After your writer.commit you need to reopen your searcher to see the
>>> >> changes.
>>> >>
>>> >> Mike McCandless
>>> >>
>>> >> http://blog.mikemccandless.com
>>> >>
>>> >> On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout
>>> >>  wrote:
>>> >> >    @Test
>>> >> >    public void testUpdate() throws IOException,
>>> >> > ParserConfigurationException, SAXException, ParseException {
>>> >> >        Analyzer analyzer = getAnalyzer();
>>> >> >        QueryParser parser = new QueryParser(Version.LUCENE_32,
>>> content,
>>> >> > analyzer);
>>> >> >        Query allQ = parser.parse("*:*");
>>> >> >
>>> >> >        IndexWriter writer = getWriter();
>>> >> >        IndexSearcher searcher = new
>>> >> IndexSearcher(IndexReader.open(writer,
>>> >> > true));
>>> >> >        TopDocs docs = searcher.search(allQ, 10);
>>> >> > *        assertEquals(0, docs.totalHits); // empty/no index*
>>> >> >
>>> >> >        Document doc = getDoc();
>>> >> >        writer.addDocument(doc);
>>> >> >        writer.commit();
>>> >> >
>>> >> >        docs = searcher.search(allQ, 10);
>>> >> > *        assertEquals(1,docs.totalHits); //it fails here.
>>> docs.totalHits
>>> >> > equals 0*
>>> >> >    }
>>> >> > What am I doing wrong here?
>>> >> >
>>> >> > If I initialize searcher with new IndexSearcher(directory) I'm told:
>>> >> > org.apache.lucene.index.IndexNotFoundException: no segments* file
>>> found
>>> >> in
>>> >> > org.apache.lucene.store.RAMDirectory@3caa4b lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ed0220c:
>>> >> > files: []
>>> >> >
>>> >> > --
>>> >> > Regards,
>>> >> > K. Gabriele
>>> >> >
>>> >> > --- unchanged since 20/9/10 ---
>>> >> > P.S. If the subject contains "[LON]" or the addressee acknowledges
>>> the
>>> >> > receipt within 48 hours then I don't resend the email.
>>> >> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>>> >> time(x)
>>> >> > < Now + 48h) ⇒ ¬resend(I, this).

Re: Apache Nutch and Solr Integration

2011-07-05 Thread serenity keningston
On Tue, Jul 5, 2011 at 1:11 PM, Way Cool  wrote:

> Sorry, Serenity, somehow I don't see the attachment.
>
> On Tue, Jul 5, 2011 at 11:23 AM, serenity keningston <
> serenity.kenings...@gmail.com> wrote:
>
> > Please find attached screenshot
> >
> >
> > On Tue, Jul 5, 2011 at 11:53 AM, Way Cool 
> wrote:
> >
> >> Can you let me know when and where you were getting the error? A
> >> screen-shot
> >> will be helpful.
> >>
> >> On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston <
> >> serenity.kenings...@gmail.com> wrote:
> >>
> >> > Hello Friends,
> >> >
> >> >
> >> > I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and
> Solr
> >> 3.2
> >> > . I did the steps explained in the following two URL's :
> >> >
> >> > http://wiki.apache.org/nutch/RunningNutchAndSolr
> >> >
> >> >
> >> >
> >>
> http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
> >> >
> >> >
> >> > I downloaded both the softwares, however, I am getting error (*solrUrl
> >> is
> >> > not set, indexing will be skipped..*) when I am trying to crawl using
> >> > Cygwin.
> >> >
> >> > Can anyone please help me out to fix this issue ?
> >> > Else any other website suggesting for Apache Nutch and Solr
> integration
> >> > would be greatly helpful.
> >> >
> >> >
> >> >
> >> > Thanks & Regards,
> >> > Serenity
> >> >
> >>
> >
> >
>
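
For the record, in Nutch 1.3 the solrUrl is passed on the crawl command
line, e.g. (directory names and depth purely illustrative, Solr assumed
on its default port):

    bin/nutch crawl urls -dir crawl -depth 3 -topN 50 -solr http://localhost:8983/solr/

Without the -solr argument the crawl still runs, but logs "solrUrl is not
set, indexing will be skipped..." and skips the Solr indexing step.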


Re: Solr vs Hibernate Search (Huge number of DB DMLs)

2011-07-05 Thread fire fox
Please suggest..

On Mon, Jul 4, 2011 at 10:37 PM, fire fox  wrote:

> From my exploration so far, I understood that we can opt for Solr straightaway
> if the index changes are kept to a minimum. However, my case is exactly the
> opposite. I'm still unclear about the right solution for the scenario
> mentioned.
>
> Please share..
>
>
> On Mon, Jul 4, 2011 at 6:28 PM, fire fox  wrote:
>
>> Hi all,
>>    There were several places I could find a discussion on this, but I
>> failed to find one suited to my case.
>>
>> I'd like to be clear on my requirements, so that you may suggest me the
>> better solution.
>>
>> -> A project deals with tons of database tables (with *millions* of
>> records), some of which are to be indexed and must of course be
>> searchable. It uses Hibernate for MySQL transactions.
>>
>>  As per my knowledge, there could be two solutions to maintain sync
>> between index and database effectively.
>>
>> --> There'd be a *huge number of transactions (DMLs) on the DB*, so I'm
>> wondering which one of the following will be able to handle it
>> effectively.
>>
>>  1) Configure a *Solr* server, query it to search / send events to update.
>> This might be better than handling Lucene directly, since Solr provides index
>> read/write and load balancing. The problem here could be implementing &
>> maintaining sync between index and DB with no lag, as the updates (DMLs on DB)
>> are very frequent. Too many events to be sent!
>>
>>  2) Using *Hibernate Search*. I'm just wondering about its
>> *performance* considering the high volume of transactions on the DB every minute.
>>
>> Please suggest.
>>
>> Thanks in advance.
>>
>
>


Can I invert the inverted index?

2011-07-05 Thread Gabriele Kahlout
Hello,

With an inverted index the term is the key, and the documents are the
values. Is it nonetheless possible, given a document id, to get the
terms indexed for that document?

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Can I invert the inverted index?

2011-07-05 Thread lboutros
Hi Gabriele,

I'm not sure I understand your problem, but the TermVectorComponent may fit
your needs?

http://wiki.apache.org/solr/TermVectorComponent
http://wiki.apache.org/solr/TermVectorComponentExampleEnabled
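
For example, with termVectors="true" on the relevant fields and the "tvrh"
handler from the wiki example enabled in solrconfig.xml, a request like

    http://localhost:8983/solr/select?qt=tvrh&q=id:doc0&tv=true&tv.tf=true

returns the terms (and term frequencies) indexed for the matching document.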

Ludovic.

-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-I-invert-the-inverted-index-tp3142206p3142269.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: what s the optimum size of SOLR indexes

2011-07-05 Thread arian487
It depends on how many queries you'd be making per second.  For us, I
have a gradient of index sizes.  The first machine, which gets hit most
often, is about 2.5 gigs.  Most of the queries only ever need to hit
this index, but then I have bigger indices of about 5-10 gigs each which
are slower. They don't get queried as often, so I can afford for them to
be a little slower (hence the bigger size).

--
View this message in context: 
http://lucene.472066.n3.nabble.com/what-s-the-optimum-size-of-SOLR-indexes-tp3137314p3142309.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can I invert the inverted index?

2011-07-05 Thread Gabriele Kahlout
I had looked at term vectors but don't understand how they solve my problem.
Consider the following index entries:

t0: doc0, ...
t1: doc0

From the 2nd entry we know that t1 is only present in doc0.
Now, my problem: given doc0, how can I know which terms occur in it (t0 and
t1) (without storing the content)?
One way is to go over all terms in the index using the term dictionary.
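
i.e., something like this with the Lucene 3.x API (clearly expensive, since
it walks the whole term dictionary; "reader" and "docId" are the IndexReader
and internal doc number, usual org.apache.lucene.index imports assumed):

    List<String> termsInDoc = new ArrayList<String>();
    TermEnum terms = reader.terms();
    while (terms.next()) {
        Term t = terms.term();
        TermDocs td = reader.termDocs(t);
        // keep the term if its postings list contains docId
        if (td.skipTo(docId) && td.doc() == docId) {
            termsInDoc.add(t.field() + ":" + t.text());
        }
        td.close();
    }
    terms.close();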


On Tue, Jul 5, 2011 at 10:14 PM, lboutros  wrote:

> Hi Gabriele,
>
> I'm not sure to understand your problem, but the TermVectorComponent may
> fit
> your needs ?
>
> http://wiki.apache.org/solr/TermVectorComponent
> http://wiki.apache.org/solr/TermVectorComponentExampleEnabled
>
> Ludovic.
>
> -
> Jouve
> France.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-I-invert-the-inverted-index-tp3142206p3142269.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Can I invert the inverted index?

2011-07-05 Thread Rob Casson
sounds like the Luke request handler will get what you're after:

 http://wiki.apache.org/solr/LukeRequestHandler
 http://wiki.apache.org/solr/LukeRequestHandler#id
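
e.g., assuming the default uniqueKey field and a document with id "doc0":

 http://localhost:8983/solr/admin/luke?id=doc0

shows, per field, the (lossily reconstructed) indexed terms for that document.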

cheers,
rob

On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout
 wrote:
> Hello,
>
> With an inverted index the term is the key, and the documents are the
> values. Is it still however possible that given a document id I get the
> terms indexed for that document?
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>


Re: Can I invert the inverted index?

2011-07-05 Thread Erick Erickson
You can do this, kind of, but it's a lossy process. Consider indexing
"the cat in the hat strikes back", with "the", "in" being stopwords and
strikes getting stemmed to "strike". At very best, you can reconstruct
that the original doc contained "cat", "hat", "strike", "back". Is
that sufficient?

And it's a very expensive process.

What is the problem you're trying to solve? Perhaps there are other ways
to get what you need.

Best
Erick

On Tue, Jul 5, 2011 at 4:22 PM, Gabriele Kahlout
 wrote:
> I had looked at term vectors but don't understand how they solve my problem.
> Consider the following index entries:
>
> t0: doc0, ...
> t1: doc0
>
> From the 2nd entry we know that t1 is only present in doc0.
> Now, my problem: given doc0, how can I know which terms occur in it (t0 and
> t1) (without storing the content)?
> One way is to go over all terms in the index using the term dictionary.
>
>
> On Tue, Jul 5, 2011 at 10:14 PM, lboutros  wrote:
>
>> Hi Gabriele,
>>
>> I'm not sure to understand your problem, but the TermVectorComponent may
>> fit
>> your needs ?
>>
>> http://wiki.apache.org/solr/TermVectorComponent
>> http://wiki.apache.org/solr/TermVectorComponentExampleEnabled
>>
>> Ludovic.
>>
>> -
>> Jouve
>> France.
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Can-I-invert-the-inverted-index-tp3142206p3142269.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>


Re: Using FieldCache in SolrIndexSearcher - crazy idea?

2011-07-05 Thread Chris Hostetter

: Correct me if I am wrong:  In a standard distributed search with 
: QueryComponent, the first query sent to the shards asks for 
: fl=myUniqueKey or fl=myUniqueKey,score.  When the response is being 
: generated to send back to the coordinator, SolrIndexSearcher.doc(int i, 
: Set<String> fields) is called for each document.  As I understand it, 
: this will read each document from the index _on disk_ and retrieve the 
: myUniqueKey field value for each document.
: 
: My idea is to have a FieldCache for the myUniqueKey field in 
: SolrIndexSearcher (or somewhere else?) that would be used in cases where 
: the only field that needs to be retrieved is myUniqueKey.  Is this 
: something that would improve performance?

Quite probably ... you typically can't assume that a FieldCache can be 
constructed for *any* field, but it should be a safe assumption for the 
uniqueKey field, so for that initial request of the multiphase distributed 
search it's quite possible it would speed things up.

if you want to try this and report back results, i'm sure a lot of people 
would be interested in a patch ... i would guess the best place to make 
the change would be in the QueryComponent so that it used the FieldCache 
(probably best to do it via getValueSource() on the uniqueKey's 
SchemaField) to put the ids in the response instead of using a 
SolrDocList.

Hmm, actually...

there's no reason why this kind of optimization would need to be specific 
to distributed queries, it could be done by the ResponseWriters directly 
-- if the field list they are being asked to return only contains the 
uniqueKeyField and computed values (like score) then don't bother calling 
SolrIndexSearcher.doc at all ... the only hitch is that with distributed 
search and using function values as pseudo fields and what not there are 
more places calling SolrIndexSearcher.doc than there used to be ... so 
maybe putting this change directly into SolrIndexSearcher.doc would make 
the most sense?
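
As a rough (untested) illustration of the idea, using the Lucene 3.x 
FieldCache API, with "id" standing in for whatever the uniqueKey field 
happens to be and "reader" being the searcher's IndexReader:

  // one value per internal docid, built lazily the first time it's requested
  String[] ids = FieldCache.DEFAULT.getStrings(reader, "id");
  String uniqueKey = ids[docId];   // no stored-field (disk) lookup needed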



-Hoss


Re: A beginner problem

2011-07-05 Thread carmmello

Thank you for your answer. I downloaded solr from the link you suggested
and now it is ok, I can see the administration page.  But it is strange
that a download from the solr site does not work. Thanks also to Way Cool.




> I don't know why, but the same happened to me in the past (with
> 3.2). Apparently the zip I downloaded was not correct. I think you have to
> have a "solr.war" file in the "webapps" directory, do you have it?
> Do you know which version of Solr you downloaded?
> Download this one:
> http://apache.dattatec.com/lucene/solr/3.3.0/apache-solr-3.3.0.zip
> I just tried it and it's there.
>
> On Mon, Jul 4, 2011 at 1:49 PM,  wrote:
>
>> I use nutch as a search engine.  Until now nutch did the crawl and the
>> search functions.  The newest version, however, delegated the search to
>> solr. I know almost nothing about programming, but i'm able to
>> follow a recipe.  So I went to the solr site, downloaded solr and
>> tried to follow the tutorial.  In the "example" folder of solr, using
>> "java -jar start.jar" I got:
>>
>> 2011-07-04 13:22:38.439:INFO::Logging to STDERR via
>> org.mortbay.log.StdErrLog
>> 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT
>> 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983
>>
>> When I tried  to go to http://localhost:8983/solr/admin/  I got:
>>
>> HTTP ERROR: 404
>> Problem accessing /solr/admin/. Reason:
>> NOT_FOUND
>>
>> Can someone help me with this?
>>
>> Thanks
>>
>>
>




Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-07-05 Thread Chris Hostetter

: Maybe what I really need is a query parser that does not do "disjunction
: maximum" at all, but somehow still combines different 'qf' type fields with
: different boosts on each field. I personally don't _neccesarily_ need the
: actual "disjunction max" calculation, but I do need combining of mutiple
: fields with different boosts. Of course, I'm not sure exactly how it would
: combine multiple fields if not "disjunction maximum", but perhaps one is
: conceivable that wouldn't be subject to this particular gotcha with differing
: analysis.

you can sort of do that today, something like this should work...

 q  = _query_:"$q1"^100 _query_:"$q2"^10 _query_:"$q3"^5 _query_:"$q4"
 q1 = {!lucene df=title v=$qq}
 q2 = {!lucene df=summary v=$qq}
 q3 = {!lucene df=author v=$qq}
 q4 = {!lucene df=body v=$qq}
 qq = ...user input here...

..but you might want to replace "lucene" with "field" depending on what 
metacharacters you want to support.

in general though the reason i wrote the dismax parser (instead of a
parser that works like this) is because of how multiword queries wind up 
matching/scoring.  A guy named Chuck Williams wrote the earliest 
version of the DisjunctionMaxQuery class and his "albino elephant" 
example totally sold me on this approach back in 2005...

http://www.lucidimagination.com/search/document/8ce795c4b6752a1f/contribution_better_multi_field_searching
https://issues.apache.org/jira/browse/LUCENE-323

: I also remain kind of confused about how the existing dismax figures out "how
: many terms" for the 'mm' type calculations. If someone wanted to explain that,
: I would find it enlightening and helpful for understanding what's going on.

it's not really about terms -- it's just the total number of clauses in 
the outer BooleanQuery that it builds.  if a chunk of input produces a 
valid DisjunctionMaxQuery (because the analyzer for at least one qf field 
generated tokens) then that's a clause; if a chunk of input doesn't 
produce a token (because none of the analyzers from any of the qf fields 
generated tokens) then that's not a clause.


-Hoss


Re: CopyField into another CopyField?

2011-07-05 Thread Chris Hostetter

: In solr, is it possible to 'chain' copyfields so that you can copy the value
: of one into another?
...
: 
: 
: 
: Point being, every time I add a new field to the autocomplete, I want it to
: automatically also be added to ac_spellcheck without having to do it twice.

Sorry no, the IndexSchema won't recursively apply copyFields.  As i 
recall it was implemented this way partly for simplicity, and largely to 
protect people against the risk of infinite loops.  we could probably 
make a more sophisticated impl that detects infinite loops, but that check 
would slow things down and all solr could really do with it is throw an 
error.
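
the usual workaround is to copy each source into both destinations 
explicitly, e.g. (source field name purely illustrative):

  <copyField source="title" dest="autocomplete"/>
  <copyField source="title" dest="ac_spellcheck"/>

...so both targets stay in sync without any chaining.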


-Hoss


Re: Using FieldCache in SolrIndexSearcher - crazy idea?

2011-07-05 Thread Yonik Seeley
On Tue, Jul 5, 2011 at 5:13 PM, Chris Hostetter
 wrote:
> : Correct me if I am wrong:  In a standard distributed search with
> : QueryComponent, the first query sent to the shards asks for
> : fl=myUniqueKey or fl=myUniqueKey,score.  When the response is being
> : generated to send back to the coordinator, SolrIndexSearcher.doc (int i,
> : Set<String> fields) is called for each document.  As I understand it,
> : this will read each document from the index _on disk_ and retrieve the
> : myUniqueKey field value for each document.
> :
> : My idea is to have a FieldCache for the myUniqueKey field in
> : SolrIndexSearcher (or somewhere else?) that would be used in cases where
> : the only field that needs to be retrieved is myUniqueKey.  Is this
> : something that would improve performance?
>
> Quite probably ... you typically can't assume that a FieldCache can be
> constructed for *any* field, but it should be a safe assumption for the
> uniqueKey field, so for that initial request of the multiphase distributed
> search it's quite possible it would speed things up.

Ah, thanks Hoss - I had meant to respond to the original email, but
then I lost track of it.

Via pseudo-fields, we actually already have the ability to retrieve
values via FieldCache.
fl=id:{!func}id

But using CSF would probably be better here - no memory overhead for
the FieldCache entry.

-Yonik
http://www.lucidimagination.com



> if you want to try this and report back results, i'm sure a lot of people
> would be interested in a patch ... i would guess the best place to make
> the change would be in the QueryComponent so that it used the FieldCache
> (probably best to do it via getValueSource() on the uniqueKey's
> SchemaField) to put the ids in the response instead of using a
> SolrDocList.
>
> Hmm, actually...
>
> there's no reason why this kind of optimization would need to be specific
> to distributed queries, it could be done by the ResponseWriters directly
> -- if the field list they are being asked to return only contains the
> uniqueKeyField and computed values (like score) then don't bother calling
> SolrIndexSearcher.doc at all ... the only hitch is that with distributed
> search and using function values as pseudo fields and what not there are
> more places calling SolrIndexSearcher.doc than there used to be ... so
> maybe putting this change directly into SolrIndexSearcher.doc would make
> the most sense?
>
>
>
> -Hoss
>


Re: Using FieldCache in SolrIndexSearcher - crazy idea?

2011-07-05 Thread Ryan McKinley
>
> Ah, thanks Hoss - I had meant to respond to the original email, but
> then I lost track of it.
>
> Via pseudo-fields, we actually already have the ability to retrieve
> values via FieldCache.
> fl=id:{!func}id
>
> But using CSF would probably be better here - no memory overhead for
> the FieldCache entry.
>

Not sure if this is related, but we should also consider using the
memory codec for the id field:
https://issues.apache.org/jira/browse/LUCENE-3209


Re: Is solrj 3.3.0 ready for field collapsing?

2011-07-05 Thread Ryan McKinley
patches are always welcome!
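
In the meantime, a rough (untested) sketch of pulling the grouped section 
out of the generic response -- field name illustrative, "server" being 
your SolrServer instance, usual SolrJ imports assumed:

  SolrQuery query = new SolrQuery("*:*");
  query.set("group", true);
  query.set("group.field", "myfield");

  QueryResponse rsp = server.query(query);
  // "grouped" mirrors the XML/JSON response: a NamedList per group.field
  NamedList<?> grouped = (NamedList<?>) rsp.getResponse().get("grouped");
  NamedList<?> byField = (NamedList<?>) grouped.get("myfield");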


On Tue, Jul 5, 2011 at 3:04 PM, Yonik Seeley  wrote:
> On Mon, Jul 4, 2011 at 11:54 AM, Per Newgro  wrote:
>> i've tried to add the params for group=true and group.field=myfield by using
>> the SolrQuery.
>> But the result is null. Do i have to configure something? In the wiki page for
>> field collapsing i couldn't
>> find anything.
>
> No specific (type-safe) support for grouping is in SolrJ currently.
> But you should still have access to the complete generic solr response
> via SolrJ regardless (i.e. use getResponse())
>
> -Yonik
> http://www.lucidimagination.com
>


Re: configure dismax requesthandlar for boost a field

2011-07-05 Thread Chris Hostetter

: But i am not finding any effect in my search results. do i need to do some
: more configuration to see the effect.

posting the solrconfig.xml section for your dismax handler is a helpful 
start, but to provide any meaningful assistance to you we need a lot more 
info than just that...

 * an example of the URL you are using so we can see what params you are 
sending at query time

 * the scores & ids of docs that you don't feel are being returned in the 
order you think they should be

 * the score explanation output for those docs using debugQuery=true (and 
if needed: explainOther) so we can see what kinds of scores are being 
computed for various docs and why, and help you understand that same output 
to make any changes as needed..
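
(for example, something like 
http://localhost:8983/solr/select?qt=dismax&q=foo&debugQuery=true -- 
handler name and query purely illustrative)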



 


-Hoss


Re: Can I invert the inverted index?

2011-07-05 Thread Trey Grainger
Gabriele,

I created a patch that does this about a year ago.  See
https://issues.apache.org/jira/browse/SOLR-1837.  It was written for Solr
1.4 and is based upon the Document Reconstructor in Luke.  The patch adds a
link to the main solr admin page to a docinspector page which will
reconstruct the document given a uniqueid (required).  Keep in mind that
you're only looking at what's "in" the index for non-stored fields, not the
original text.

If you have any issues using this on the most recent release, let me know
and I'd be happy to create a new patch for solr 3.3.  One of these days I'll
remove the JSP dependency and this may eventually make it into trunk.

Thanks,

-Trey Grainger
Search Technology Development Team Lead, Careerbuilder.com
Site Architect, Celiaccess.com


On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout
wrote:

> Hello,
>
> With an inverted index the term is the key, and the documents are the
> values. Is it still however possible that given a document id I get the
> terms indexed for that document?
>
> --
> Regards,
> K. Gabriele
>
> --- unchanged since 20/9/10 ---
> P.S. If the subject contains "[LON]" or the addressee acknowledges the
> receipt within 48 hours then I don't resend the email.
> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> time(x)
> < Now + 48h) ⇒ ¬resend(I, this).
>
> If an email is sent by a sender that is not a trusted contact or the email
> does not contain a valid code then the email is not received. A valid code
> starts with a hyphen and ends with "X".
> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> L(-[a-z]+[0-9]X)).
>


problem in spellchecker working with dismax requesthandler

2011-07-05 Thread Romi
Hi, in my solr search I was previously using the standard request handler and the
spellchecker was working fine. Now I configured search to use the dismax request
handler, but the spellchecker is not working.
Would you please tell me what the problem can be?

Thanks

-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-in-spellchecker-working-with-dismax-requesthandler-tp3143662p3143662.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: @field for child object

2011-07-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
no

On Mon, Jul 4, 2011 at 3:34 PM, Kiwi de coder  wrote:
> hi,
>
> I'm wondering, does the solrj @Field annotation support embedded child objects? e.g.
>
> class A {
>
>  @field
>  string somefield;
>
>  @emebedded
>  B b;
>
> }
>
> regards,
> kiwi
>



-- 
-
Noble Paul


Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-07-05 Thread Surendra
I have upgraded my Solr distribution to 3.2 and also the referenced jars of my
application (especially the solr jar, which was 1.4.1 in the application that calls
solr... hence causing the javabin exception). Also updated
pdfbox/jempbox/fontbox to the latest versions and Tika to 0.9, which sorted
things out for me!

-- Surendranadh




Re: problem in spellchecker working with dismax requesthandler

2011-07-05 Thread Shalin Shekhar Mangar
On Wed, Jul 6, 2011 at 11:36 AM, Romi  wrote:
> Hi, in my solr search I was previously using the standard request handler and the
> spellchecker was working fine. Now I configured search to use the dismax request
> handler, but the spellchecker is not working.
> Would you please tell me what the problem can be?
>

You may need to add the SpellCheckComponent in the "last-components"
section in the new request handler that you are using.

See http://wiki.apache.org/solr/SpellCheckComponent#Configuration
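
Roughly like this ("spellcheck" must match the searchComponent name in
solrconfig.xml; the handler name and defaults are illustrative):

  <requestHandler name="/search" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="spellcheck">true</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>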

-- 
Regards,
Shalin Shekhar Mangar.


Re: Nightly builds

2011-07-05 Thread Tom Gross

Hi Benson

On 07/05/2011 04:29 PM, Benson Margulies wrote:

> The reason for the email is not that I can't find them, but because
> the project, I claim, should be advertising them more prominently on
> the web site than buried in a wiki.

Actually they are linked on the homepage, but unfortunately not directly:
http://lucene.apache.org/solr/#21+February+2006%3A+nightly+builds

> Where I come from, an lmgtfy link is rather hostile.

Sorry, I didn't mean to offend you. I obviously misunderstood your question.

> Oh, and you might want to fix the spelling of 'Author' in your own signature.

Thanks for that!



On Tue, Jul 5, 2011 at 10:19 AM, Tom Gross  wrote:

On 07/05/2011 04:08 PM, Benson Margulies wrote:


The solr download link does not point to or mention nightly builds.
Are they out there?


http://lmgtfy.com/?q=%2Bsolr+%2Bnightlybuilds&l=1

--
Auther of the book "Plone 3 Multimedia" - http://amzn.to/dtrp0C

Tom Gross
email.@toms-projekte.de
skype.tom_gross
web.http://toms-projekte.de
blog...http://blog.toms-projekte.de







--
Author of the book "Plone 3 Multimedia" - http://amzn.to/dtrp0C

Tom Gross
email.@toms-projekte.de
skype.tom_gross
web.http://toms-projekte.de
blog...http://blog.toms-projekte.de


