aliasing?

2011-05-09 Thread deniz
Does anyone know about aliasing in Lucene/Solr? I need to implement
something, and there is a task on my list called "aliasing"... I have a few
ideas, but it is still not clear to me...

Can anyone explain it to me or point me to some docs?



Total Documents Failed : How to find out why

2011-05-09 Thread Rohit
Hi,

I am running a Solr index, and after indexing I get the results below. How can I
find out which documents failed and why?

<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">5170850</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2011-05-08 23:40:09</str>
<str name="">Indexing completed. Added/Updated: 2972300 documents. Deleted 0 documents.</str>
<str name="Committed">2011-05-09 00:13:48</str>
<str name="Optimized">2011-05-09 00:13:48</str>
<str name="Total Documents Processed">2972300</str>
<str name="Total Documents Failed">2198550</str>
<str name="Time taken">0:33:40.945</str>

Running Solr on Jetty right now; the console shows no error, and the
\Solr\example\logs folder is empty.

Thanks,
Rohit





tomcat and multicore processors

2011-05-09 Thread solr_beginner
Hi,
 
Is it possible that Solr on Tomcat on Windows 2008 is using only one core of
the processor? Do I need to configure something to make it use more cores?
 
Best Regards,
Solr_Beginner

Re: How to Update Value of One Field of a Document in Index?

2011-05-09 Thread Luis Cappa Banda
Hello.

You should be able to fetch the current document that you want to update,
change the notes value to include the new ones added by the user, and then
send an update request to Solr that deletes the old document (found by the
id included in the POST request) and adds the new document with the changes.
Try developing a small Java application with SolrJ, for example. Depending
on the number of update requests your system/application will make, I may
or may not recommend sending a commit after each update. You can also
configure a periodic auto-commit to update the index automatically.
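A minimal SolrJ sketch of that read-copy-rewrite cycle (the URL, the "id"
and "notes" field names, and the document id are assumptions for
illustration, not from the original question):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class UpdateOneField {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // 1. Fetch the current document by its unique key.
        SolrDocument old = server.query(new SolrQuery("id:42")).getResults().get(0);

        // 2. Copy every stored field into a new input document.
        SolrInputDocument doc = new SolrInputDocument();
        for (String name : old.getFieldNames()) {
            doc.addField(name, old.getFieldValue(name));
        }

        // 3. Overwrite just the field that changed.
        doc.setField("notes", "value supplied by the user");

        // 4. Re-adding with the same uniqueKey replaces the old document.
        server.add(doc);
        server.commit(); // or rely on a periodic auto-commit, as noted above
    }
}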


Searching across Solr-Multicore

2011-05-09 Thread Benyahya, Fahd
Hello everyone,

I'm using Solr multicore with 3 cores to index my web site. For testing I'm
using the Solr admin GUI to get responses. The problem is that I get
results only from one core, but not from the others also. Each core has its
own schema.xml.

The cores are structured as follows:

/multicore/solr/
  solr.xml
  core1/
    config/
      schema_1.xml
    data/
  core2/
  core3/

Any idea what could be the problem?

I very much appreciate all the help.

Fahd


Re: tomcat and multicore processors

2011-05-09 Thread deniz
Yeah, you can use Solr on Tomcat; I'm doing the same, actually... but I have
no idea about multiple cores, though...



Re: Searching across Solr-Multicore

2011-05-09 Thread Gora Mohanty
On Mon, May 9, 2011 at 2:10 PM, Benyahya, Fahd fahd.benya...@netmoms.de wrote:
 Hallo everyone,

 I'm using Solr multicore with 3 cores to index my web site. For testing I'm
 using the Solr admin GUI to get responses. The problem is that I get
 results only from one core, but not from the others also.
[...]

What do you mean by "get results only from one core, but not from
the others also"?
* Are you querying one core, and expecting to get results
  from all? This is not possible: you have to either query
  each (see the example URLs below), or merge them into a single core.
* Or, is it that queries are working on one core, and not on the
  other?
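
(By "query each" I mean hitting each core at its own URL; these example
paths are assumptions based on the layout you describe:)

http://localhost:8983/solr/core1/select?q=apple
http://localhost:8983/solr/core2/select?q=apple
http://localhost:8983/solr/core3/select?q=apple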

Regards,
Gora


Re: uima fieldMappings and solr dynamicField

2011-05-09 Thread Tommaso Teofili
Thanks Koji for opening that; the dynamicField mapping is a commonly used
feature, especially for named entity mapping.
Tommaso

2011/5/7 Koji Sekiguchi k...@r.email.ne.jp

 I've opened https://issues.apache.org/jira/browse/SOLR-2503 .

 Koji
 --
 http://www.rondhuit.com/en/

 (11/05/06 20:15), Koji Sekiguchi wrote:
  Hello,
 
  I'd like to use dynamicField in the feature-field mapping of the UIMA update
  processor. It doesn't seem to be acceptable currently. Is it a bad idea
  in terms of how UIMA is meant to be used? If it is not so bad, I'd like to
  try a patch.
 
  Background:
 
  Because my UIMA annotator can generate many types of named entity from
  a text, I don't want to implement so many types, but one type,
  NamedEntity:
 
  <typeSystemDescription>
    <types>
      <typeDescription>
        <name>com.rondhuit.uima.next.NamedEntity</name>
        <description/>
        <supertypeName>uima.tcas.Annotation</supertypeName>
        <features>
          <featureDescription>
            <name>name</name>
            <description/>
            <rangeTypeName>uima.cas.String</rangeTypeName>
          </featureDescription>
          <featureDescription>
            <name>entity</name>
            <description/>
            <rangeTypeName>uima.cas.String</rangeTypeName>
          </featureDescription>
        </features>
      </typeDescription>
    </types>
  </typeSystemDescription>
 
  sample extracted named entities:
 
  name=PERSON, entity=Barack Obama
  name=TITLE, entity=the President
 
  Now, I'd like to map these named entities to Solr fields like this:
 
  PERSON_S:Barack Obama
  TITLE_S:the President
 
  Because the types of name (PERSON, TITLE, etc.) can be so many,
  I'd like to use the dynamicField *_s, where * is replaced by the name
  feature of NamedEntity.

  I think this is a natural requirement from the Solr viewpoint, but I'm
  not sure whether my UIMA annotator implementation is correct. In other
  words, should I implement a separate type for each entity type?
  (e.g. PersonEntity, TitleEntity, ... instead of NamedEntity)
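
  (For reference, the schema side of this relies on a stock dynamic field
  declaration along these lines; a sketch, assuming the string field type
  from the example schema:)

  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>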
 
  Thank you!
 
  Koji





Re: Searching across Solr-Multicore

2011-05-09 Thread Benyahya, Fahd
Hi,

sorry that I did not explain my issue very well.

It is exactly as you described: "Or, is it that queries are working on
one core, and not on the other?"

Regards,
Fahd

On 9 May 2011 10:58, Gora Mohanty g...@mimirtech.com wrote:

 [...]



Re: Searching across Solr-Multicore

2011-05-09 Thread rajini maski
If the schema is different across cores, you can query across the cores
only on those fields that are common.
Querying across all cores for some query parameter and getting the result set
in one output XML can be achieved with shards:

http://localhost:8090/solr1/select?indent=on&q=*:*&shards=localhost:8090/solr1,localhost:8090/solr2&rows=10&start=0


Regards,
Rajani


On Mon, May 9, 2011 at 2:36 PM, Benyahya, Fahd fahd.benya...@netmoms.de wrote:

 [...]


Solr 3.1 / Java 1.5: Exception regarding analyzer implementation

2011-05-09 Thread Martin Jansen
I just attempted to set up an instance of Solr 3.1 in Tomcat 5.5 running
on Java 1.5. It fails with the following exception on start-up:

 java.lang.AssertionError: Analyzer implementation classes or at least their 
 tokenStream() and reusableTokenStream() implementations must be final at 
 org.apache.lucene.analysis.Analyzer.assertFinal(Analyzer.java:57)

The exact same configuration works like a charm on another machine with
Java 1.6, again using Tomcat 5.5. Has anyone else run into this issue?
Is Solr 3.1 no longer compatible with Java 1.5?

The query analyzer that the exception seems to stem from looks like this:

   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.TrimFilterFactory"/>

     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1"
             generateNumberParts="1"
             catenateWords="0"
             catenateNumbers="0"
             catenateAll="0"
             splitOnCaseChange="1"
             preserveOriginal="1"
             stemEnglishPossessive="0"
             splitOnNumerics="0"
     />

     <filter class="solr.ShingleFilterFactory"
             minShingleSize="2"
             maxShingleSize="5"
             outputUnigrams="true"
     />
   </analyzer>

Best,
- Martin


Solr 1.3 highlighting problem

2011-05-09 Thread nicksnels1
Hi,

I'm using the old Solr 1.3 on one of my sites, and I decided to add a
highlighting feature. Unfortunately I cannot get it to work. I'm doing
some testing in the Solr admin interface without much luck. Below is some
information that describes the problem.

I would like to highlight text in the field "text". The schema.xml config for it:

<field name="text" type="string" indexed="true" stored="true"/>

Query in the Solr admin interface:

http://127.0.0.1:8080/solr/select?indent=on&version=2.2&q=solr&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=text

I get back two results; both of their text fields contain the query "solr". In
the highlighting element I get only the IDs:

<lst name="highlighting"><lst name="54807"/><lst name="105235"/></lst>

Any ideas what may be causing this and how I can debug it? Thanks.

Kind regards,

Nick




Re: Searching across Solr-Multicore

2011-05-09 Thread Benyahya, Fahd
Thanks to all those who have answered my questions.
But I still do not understand why I cannot send queries to each core on its
own and get results only from the core that was queried.
For now I am not interested in getting results for all cores in one XML
output; to do that I would need to do a distributed search.

Regards,

Fahd

On 9 May 2011 11:09, rajini maski rajinima...@gmail.com wrote:

 [...]
 



Re: Solr 3.1 / Java 1.5: Exception regarding analyzer implementation

2011-05-09 Thread Martin Jansen
On 09.05.11 11:04, Martin Jansen wrote:
 I just attempted to set up an instance of Solr 3.1 in Tomcat 5.5 running
 on Java 1.5. It fails with the following exception on start-up:
 
 java.lang.AssertionError: Analyzer implementation classes or at least their 
 tokenStream() and reusableTokenStream() implementations must be final at 
 org.apache.lucene.analysis.Analyzer.assertFinal(Analyzer.java:57)

In the meantime I solved the issue by installing Java 1.6. It works
without a problem now, but I'm wondering whether Solr 3.1 is intentionally
incompatible with Java 1.5 or whether it happened by mistake.

Martin


Faceting with MoreLikeThis

2011-05-09 Thread Isha Garg

Hi All!

   Can anybody tell me how to exclude the count of similar
results (obtained from MoreLikeThis) from the total facet count?



Thanks in Advance!
Isha Garg


Re: Solr 1.3 highlighting problem

2011-05-09 Thread Grijesh
Is your field "text" stored or not? Highlighting works only with stored
fields in the schema.

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 


Re: aliasing?

2011-05-09 Thread Grijesh
Can you provide more detail about the aliasing you require?

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 


Re: Whole unfiltered content in response document field

2011-05-09 Thread solrfan
I understand now. I get the raw content of the field because it is stored.
The filtered content is not visible in the response; I can only see it in
the analysis view. OK now :)

I will try to move the StopFilter below the WordDelimiter filter.


Thanks!



Re: Total Documents Failed : How to find out why

2011-05-09 Thread Erick Erickson
First you need to find your logs. That folder should not
be empty regardless of whether DIH is working correctly
or not.

I'm assuming here that you're just doing the java -jar start.jar
in the example directory; if this isn't the case, how are you
starting Solr/Jetty?

Best
Erick

On Mon, May 9, 2011 at 3:26 AM, Rohit ro...@in-rev.com wrote:
 [...]






Re: Searching across Solr-Multicore

2011-05-09 Thread Erick Erickson
There's not much information to go on here. Please review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Mon, May 9, 2011 at 5:26 AM, Benyahya, Fahd fahd.benya...@netmoms.de wrote:
 [...]




RE: Total Documents Failed : How to find out why

2011-05-09 Thread Rohit
Hi Erick,

That's exactly how I am starting Solr.

Regards,
Rohit

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 09 May 2011 16:57
To: solr-user@lucene.apache.org
Subject: Re: Total Documents Failed : How to find out why

[...]







Custom filter development

2011-05-09 Thread solrfan
Hi, I would like to write my own filter. I am trying to use the following class:

public class MyFilter extends TokenFilter {

    private String myField;

    public MyFilter(TokenStream input, String myField)
    {
        super(input);
        this.myField = myField;
    }

    @SuppressWarnings("deprecation")
    public Token next() throws IOException
    {
        return parseToken(this.input.next());
    }

    @SuppressWarnings("deprecation")
    public Token next(Token result) throws IOException
    {
        return parseToken(this.input.next());
    }

    protected Token parseToken(Token input)
    {
        if (input == null) return null; // end of stream
        /* do magic stuff with input.termBuffer() here (a char[] which can be
           manipulated) */
        /* set the changed length of the new term with input.setTermLength()
           before returning it */
        return input;
    }
}

The factory and deployment are no problem, but I have a different question.

I want my filter to run at the last position, after I have a clean set of
tokens. I can configure this in my analyzer XML configuration.

My MyFilter object receives an input TokenStream in its constructor.
I assume that this is a list of tokens. The next methods use the
parseToken method. This is OK: the next token from the input is fetched
and a modified token is returned.

But this is a problem for me: the one-to-one mapping. I want to map a given
token, for example "a", to three tokens "a1", "a2", "a3". I also want to do
a one-to-one mapping "b" -> "c", and I want to have the possibility to
remove a token: "d" -> "".

How can I do this, when the next methods return only one token, not a
collection?


Thanks!



Re: Solr 1.3 highlighting problem

2011-05-09 Thread nicksnels1
Hi Grijesh,

The field "text" is stored, and yet it is not working.

Kind regards,

Nick



Re: Use Solr / Lucene to search in a Logfile

2011-05-09 Thread Matthieu Huin

Hello Robert,

At my company, we are working on a generic log collector that uses Solr 
to provide search capabilities.

What the collector basically does is this (greatly dumbed down!):

* collect a log line (read it from a file, receive it from the network, ...)
* parse it through a set of regular expressions, searching for known log
formats (Apache CLF, ...)
* if there is a match, store the results as a set of keys/values (url:
http://www.apache.org , source: XXX, raw_log: xx, ...)
* insert the set as a document into the Solr backend, using the REST
interface.


I would therefore advise you to adapt this workflow to suit your own
needs: have a script look for new lines in your log file, parse them
to extract the relevant information you need, store the results as
key/value sets, then insert them into Solr via an HTTP call.
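
A minimal sketch of such a script in Java with SolrJ (the CLF regex, the
field names, the id scheme, and the Solr URL are all assumptions, not our
actual product code):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LogIndexer {
    // Apache Common Log Format: host ident user [date] "request" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) \\S+ (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = CLF.matcher(line);
            if (!m.matches()) continue;          // unknown format: skip it
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", line.hashCode()); // assumed unique-key scheme
            doc.addField("host_s", m.group(1));
            doc.addField("request_s", m.group(4));
            doc.addField("status_i", Integer.parseInt(m.group(5)));
            doc.addField("raw_log", line);
            solr.add(doc);                       // one key/value set = one document
        }
        in.close();
        solr.commit();
    }
}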


My company's product is probably overkill for what you need to do, and
we'd probably need to develop a specific log parser for your log format,
but if you are willing to give it a try, feel free to contact me!


Greetings,

Matthieu HUIN

On 06/05/2011 21:40, Robert Naczinski wrote:

Hi,

thanks for the reply. I did not know that.

Is there still a way to use Solr or Lucene? Or Apache Nutch would not be bad.

Could I maybe write a customized DIH?

Greetings,

Robert

2011/5/6 Otis Gospodneticotis_gospodne...@yahoo.com:

Loggly.com


Re: uima fieldMappings and solr dynamicField

2011-05-09 Thread Koji Sekiguchi

Thanks Tommaso! I'm glad to hear from someone with experience.
I'll commit shortly.

Koji

(11/05/09 17:57), Tommaso Teofili wrote:

[...]

--
http://www.rondhuit.com/en/


Can ExtractingRequestHandler ignore documents metadata

2011-05-09 Thread Tod
I'm indexing content from a CMS's database of metadata. The client would
prefer that Solr exclude the properties (metadata) of any documents
being indexed. Is there a way to tell Tika to index only a document's
text and not its properties?


Thanks - Tod


Solr Range Facets

2011-05-09 Thread Rohit
Hi Chris,

I tried what you suggested, but I am not getting the expected results. The
code is given below:

SolrQuery query = new SolrQuery();

query.set("q", "apple");
query.set("facet", true);

query.set("facet.range", "createdOnGMTDate");
query.set("facet.range.start", "2010-01-01T00:00:00Z");
query.set("facet.range.gap", "+1DAY");

QueryResponse qr = server.query(query);

SolrDocumentList sdl = qr.getResults();

System.out.println("Found: " + sdl.getNumFound());
System.out.println("Start: " + sdl.getStart());

System.out.println("---");

List<FacetField> facets = qr.getFacetFields();

for (FacetField facet : facets)
{
    List<FacetField.Count> facetEntries = facet.getValues();

    for (FacetField.Count fcount : facetEntries)
    {
        System.out.println(fcount.getName() + ": " + fcount.getCount());
    }
}

 

Regards,

Rohit

 

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: 07 May 2011 04:36
To: solr-user@lucene.apache.org
Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date
String:

 

 

: Thanks for the response, actually what we need to achieve is to see group-by
: results based on dates like,
:
: 2011-01-01  23
: 2011-01-02  14
: 2011-01-03  40
: 2011-01-04  10
:
: Now the records in my table run into millions; grouping the result based on
: UTC date would not produce the right result, since the result should be
: grouped on the user's timezone. Is there any way we can achieve this in Solr?

Date faceting is entirely driven by query params, so if you index your
events using the true time that they happened at (formatted as a string
in UTC) you can then select your date ranges using whatever timezone
offset is specified by your user at query time as a UTC offset.

  facet.range = dateField
  facet.range.start = 2011-01-01T00:00:00Z+${useroffset}MINUTES
  facet.range.gap = +1DAY
  etc...

 

 

-Hoss



Re: Replication Clarification Please

2011-05-09 Thread Ravi Solr
Hello Mr. Bell,
   Thank you very much for patiently responding to my
questions. We optimize once every 2 days. Can you kindly rephrase
your answer? I could not understand: "If the amount of time is > 10
segments, I believe that might also trigger a whole index, since you
cycled all the segments. In that case I think you might want to
increase the mergeFactor."

The current index folder details and sizes are given below

MASTER
--
   5K   search-data/spellchecker2
 480M  search-data/index
   5K   search-data/spellchecker1
   5K   search-data/spellcheckerFile
 480M   search-data

SLAVE
--
   2K   search-data/index.20110509103950
 419M   search-data/index
 2.3G   search-data/index.20110429042508   <-- SLAVE is pointing to this directory
   5K   search-data/spellchecker1
   5K  search-data/spellchecker2
   5K   search-data/spellcheckerFile
 2.7G   search-data

Thanks,

Ravi Kiran Bhaskar

On Sat, May 7, 2011 at 11:49 PM, Bill Bell billnb...@gmail.com wrote:
 I did not see answers... I am not an authority, but will tell you what I
 think

 Did you get some answers?


 On 5/6/11 2:52 PM, Ravi Solr ravis...@gmail.com wrote:

Hello,
        Pardon me if this has already been answered somewhere, and I
apologize for a lengthy post. I was wondering if anybody could help me
understand the replication internals a bit more. We have a single
master-slave setup (Solr 1.4.1) with the configuration shown
below. Our environment is quite commit-heavy (almost 100s of docs
every 5 minutes), and all indexing is done on the master and all searches
go to the slave. We are seeing that the slave replication performance
gradually decreases, the speed drops to < 1 kbps, and replication
ultimately gets backed up. Once we reload the core on the slave it will
work fine for some time and then it gets backed up again. We have
mergeFactor set to 10 and ramBufferSizeMB set to 32MB, Solr itself is
running with 2GB memory, and lockType is simple on both master and slave.

 How big is your index? How many rows and GB ?

 Every time you replicate, there are several resets on caching. So if you
 are constantly indexing, you need to be careful about how that performance
 impact will apply.


I am hoping that the following questions might help me understand the
replication performance issue better (Replication Configuration is
given at the end of the email)

1. Does the slave get the whole index every time during replication, or
just the delta since the last replication?


 It depends. If you do an OPTIMIZE every time you index, then you will be
 sending the whole index down.
 If the amount of time is > 10 segments, I believe that might also trigger
 a whole index, since you cycled all the segments.
 In that case I think you might want to increase the mergeFactor.



2. If there is a huge number of queries being done on the slave, will it
affect replication? How can I improve the performance? (see the
replication details at the bottom of the page)

 It seems that might be one way that you get the index.* directories. At
 least I see it more frequently when there is huge load and you are trying
 to replicate.
 You could replicate less frequently.
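
 (For reference, the polling frequency lives in the slave's replication
 handler config; a sketch, with illustrative values:)

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="slave">
     <str name="masterUrl">http://master:8983/solr/replication</str>
     <str name="pollInterval">00:05:00</str>
   </lst>
 </requestHandler>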


3. Will the segment names be the same on master and slave after
replication? I see that they are different. Is this correct? If it
is correct, how does the slave know what to fetch the next time, i.e.
the delta?

 Yes, they had better be. In the old days you could just rsync the data
 directory from master to slave and reload the core; that worked fine.


4. When and why does the index.TIMESTAMP folder get created? I see
this type of folder getting created only on the slave, and the slave
instance is pointing to it.

 I would love to know all the conditions... I believe it is supposed to
 replicate to index.*, then reload to point to it. But sometimes it gets
 stuck in index.* land and never goes back to a straight index.

 There are several bug fixes for this in 3.1.


5. Does replication process copy both the index and index.TIMESTAMP
folder ?

 I believe it is supposed to copy the segment or whole index/ from master
 to index.* on slave.


6. What happens if replication kicks off before the previous
invocation has completed? Will the 2nd invocation block, or will
it go through, causing more confusion?

 That is not supposed to happen: if a replication is in process, it should
 not copy again until that one is complete.
 Try it: just delete data/*, restart Solr, and force a replication;
 while it is syncing, force it again. It does not seem to work for me.

7. If I have to prep a new master-slave combination, is it OK to copy
the respective contents into the new master and slave and start Solr? Or
do I have to wipe the new slave and let it replicate from its new
master?

 If you shut down the slave, copy the data/* directory, and restart, you
 should be fine. That is how we fix the data/ dir when
 there is corruption.

8. Doing an 'ls | wc -l' on index folder of 

Solr 3.1 Upgrade - Reindex necessary ?

2011-05-09 Thread Ravi Solr
Hello All,
 I am planning to upgrade from Solr 1.4.1 to Solr 3.1. I
saw some deprecation warnings in the log, as shown below:

[#|2011-05-09T12:37:18.762-0400|WARNING|sun-appserver9.1|org.apache.solr.analysis.BaseTokenStreamFactory|_ThreadID=53;_ThreadName=httpSSLWorkerThread-9001-13;_RequestID=de32fd3f-e968-4228-a071-9bb175bfb549;|StopFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0|#]

[#|2011-05-09T12:37:18.765-0400|WARNING|sun-appserver9.1|org.apache.solr.analysis.BaseTokenStreamFactory|_ThreadID=53;_ThreadName=httpSSLWorkerThread-9001-13;_RequestID=de32fd3f-e968-4228-a071-9bb175bfb549;|WordDelimiterFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0|#]

[#|2011-05-09T12:37:18.767-0400|WARNING|sun-appserver9.1|org.apache.solr.analysis.BaseTokenStreamFactory|_ThreadID=53;_ThreadName=httpSSLWorkerThread-9001-13;_RequestID=de32fd3f-e968-4228-a071-9bb175bfb549;|EnglishPorterFilterFactory is using deprecated LUCENE_24 emulation. You should at some point declare and reindex to at least 3.0, because 2.x emulation is deprecated and will be removed in 4.0|#]


so I would love the experts' advice on the following questions:

1. Do we have to reindex all content again to use Solr 3.1?
2. If we don't reindex all content, are there any potential issues? (I
read somewhere that the first commit would change the 1.4.1 format to 3.1;
has the analyzers' behavior changed in a way that warrants reindexing?)
3. Apart from deploying the new Solr 3.1 war, is it enough to set
<luceneMatchVersion>LUCENE_31</luceneMatchVersion> to get all the
goodies and bug fixes of Lucene/Solr 3.1?

Thank You,

Ravi Kiran Bhaskar


Re: Custom filter development

2011-05-09 Thread Tom Hill
On Mon, May 9, 2011 at 5:07 AM, solrfan a2701...@jnxjn.com wrote:
 Hi, I would like to write my own filter. I try to use the following class:
 But this is a problem for me. The one-to-one mapping. I want to map a given
 Token, for example a to three Tokens a1, a2, a3. I want to do a
 one-to-one mapping to b - c too, and I want to have the possibility to
 remove a Token d - .

 How can I do this, when the next methods returns only one Token, not a
 collection?

Buffer them internally. Look at SynonymFilter.java, it does exactly this.
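
In outline, the buffering looks something like this (a minimal sketch
against the same deprecated next() API used in your class, not the real
SynonymFilter code; the expand() logic is hypothetical):

import java.io.IOException;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class ExpandingFilter extends TokenFilter {
    private final LinkedList<Token> pending = new LinkedList<Token>();

    protected ExpandingFilter(TokenStream input) {
        super(input);
    }

    @SuppressWarnings("deprecation")
    public Token next() throws IOException {
        if (!pending.isEmpty()) {
            return pending.removeFirst();     // emit buffered tokens first
        }
        Token t = input.next();
        if (t == null) return null;           // end of stream
        List<Token> out = expand(t);          // 0 tokens = delete, 1 = map, n = split
        if (out.isEmpty()) return next();     // token dropped: move to the next one
        pending.addAll(out.subList(1, out.size()));
        return out.get(0);
    }

    // Hypothetical expansion logic; replace with your own mapping rules.
    private List<Token> expand(Token t) {
        return Collections.singletonList(t);
    }
}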

Tom



 Thanks!




Synonym Filter disable at query time

2011-05-09 Thread mtraynham
I would like to be able to disable the synonym filter at runtime based on
a query parameter, say 'synonyms=true' or 'synonyms=false'.

Is there a way within the AnalyzerQueryNodeProcessor or QParser that I can
remove the SynonymFilter from the analyzer attributes?

It seems that the Analyzer has a hashmap for its 'analyzers', but I cannot
find the declaration of this item.

Whether I am going about this wrong is also another question I had...




Solr security

2011-05-09 Thread Brian Lamb
Hi all,

Is it possible to set up solr so that it will only execute dataimport
commands if they come from localhost?

Right now, my application and my solr installation are on different servers
so any requests are formatted http://domain:8983 instead of
http://localhost:8983. I am concerned that when I launch my application,
there will be the potential for abuse. Is the best solution to have
everything reside on the same server?

What are some other solutions?

Thanks,

Brian Lamb


Re: Solr security

2011-05-09 Thread Upayavira
Solr does not provide security (I believe Lucid EnterpriseWorks has
something there).

You should keep Solr itself secure behind a firewall, and pass all
requests through some intermediary that only allows sensible stuff
through to Solr itself. That way, the DataImportHandler is accessible
inside your firewall, and your search functionality is available
outside.

Upayavira

On Mon, 09 May 2011 14:57 -0400, Brian Lamb
brian.l...@journalexperts.com wrote:
 [...]
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source



Solr is not working for a few documents

2011-05-09 Thread Misra, Anil
Hi,

I am using Solr with Liferay. Until Friday everything was good, but today I
added a few documents and I am unable to search for some of them :(

It is showing "0 result(s) are found."

Waiting for your help.

Thanks,

anil misra

248-880-4948

 



Re: Solr 4.0

2011-05-09 Thread Gabriele Kahlout
REPOST as a more general question about ivy dependencies:
http://stackoverflow.com/questions/5941789/do-ivy-dependency-revisions-have-anything-to-do-with-svns


On Mon, May 9, 2011 at 11:31 AM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:

 I think you are talking about this dependency:

 <dependency org="org.apache.solr" name="solr-solrj" rev="1.4.1" conf="*->default"/>

 I've checked out Solr 4 svn revision 1099940 [1]. What value should I use
 for rev?

 [1]
 http://lucene.472066.n3.nabble.com/Is-it-possible-to-build-Solr-as-a-maven-project-tp2898068p2905051.html


 On Tue, Apr 19, 2011 at 2:48 PM, Julien Nioche 
 lists.digitalpeb...@gmail.com wrote:

 You need to change the version of Solr in ivy/ivy.xml and then rebuild,
 unless you change the jars directly in nutch-1.3/runtime/local/lib
 (assuming that you're running Nutch locally only).

 On 19 April 2011 07:09, Haspadar haspa...@gmail.com wrote:

  Yes, it occurred after removing the SolrJ 1.4 jar and copying the 4.0
  version. Before that I upgraded Nutch for Solr 3.1 the same way and all
  worked fine.
 
  Thanks
 
  2011/4/19 Markus Jelsma markus.jel...@openindex.io
 
   Hi,
  
Hello.
I'm using Nutch 1.3. I decided to upgrade Solr to version 4.0 and I
replaced the Nutch libs (snapshot and SolrJ) from the Solr dist. After that
I got the error at SolrIndexer on the Reduce stage:
   
11/04/19 01:47:19 INFO mapred.JobClient:  map 100% reduce 27%
11/04/19 01:47:21 INFO mapred.JobClient: Task Id :
attempt_201104190142_0009_r_00_0, Status : FAILED
org.apache.solr.common.SolrException: ERROR: [doc=http://www.site.net/]
Error adding field 'tstamp'='2011-04-18T22:45:17.404Z'

ERROR: [doc=http://www.site.net/] Error adding field
'tstamp'='2011-04-18T22:45:17.404Z'

request: http://127.0.0.1:8983/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:50)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:75)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
  
If you are using Solr > 1.4.x then you must upgrade the SolrJ jars in
Nutch. Solr 1.4.x and higher are not compatible. Just remove the 1.4.x jars
and copy over the new ones.
  
   
I tried to remove tstamp from solrindex-mapping.xml and Solr's schema.xml.
But this field is required in schema.xml, and I got the error:

11/04/19 01:58:03 INFO mapred.JobClient: Task Id :
attempt_201104190142_0010_r_00_0, Status : FAILED
org.apache.solr.common.SolrException: ERROR: [doc=http://www.site.net/]
unknown field 'tstamp'

ERROR: [doc=http://www.site.net/] unknown field 'tstamp'
  
Removing a mapping doesn't mean the field isn't copied over. All unmapped
fields are copied as is. The example mapping seems rather useless, as it
copies exact field names. It's only useful if your source fields and
destination fields are actually different, which is usually not the case
if you dedicate a Solr core to a Nutch crawl.

You must either not create the field in some plugin, or add the field to
your Solr index.

I'm surprised this error actually showed up, considering the incompatible
Javabin versions. Perhaps you already upgraded the SolrJ API?
  
   
How can I upgrade Solr to version 4?
   
Thank you.
  
 



 --
 Open Source Solutions for Text Engineering

 http://digitalpebble.blogspot.com/
 http://www.digitalpebble.com




 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
 time(x)  Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the email
 does not contain a valid code then the email is not received. A valid code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).




-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
 Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).

RE: Synonym Filter disable at query time

2011-05-09 Thread Robert Petersen
Just make another field, using copyField, to which synonyms are not applied,
and then search either the field with synonyms or the one without from the
front end... that will be your selector. :)
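
The schema side of that could look something like this (field and type
names here are assumptions):

<field name="text"       type="text_with_synonyms" indexed="true" stored="true"/>
<field name="text_nosyn" type="text_no_synonyms"   indexed="true" stored="false"/>
<copyField source="text" dest="text_nosyn"/>

The front end then queries text or text_nosyn depending on its synonyms
parameter.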

-Original Message-
From: mtraynham [mailto:mtrayn...@digitalsmiths.com] 
Sent: Monday, May 09, 2011 11:17 AM
To: solr-user@lucene.apache.org
Subject: Synonym Filter disable at query time

[...]


RE: Synonym Filter disable at query time

2011-05-09 Thread mtraynham
Awesome, thanks! Also, you wouldn't happen to have any insight on boosting
synonyms lower than the original query after they were stemmed, would you?

Say I had synonyms turned on:

The TokenStream is set up to do Synonyms -> StopFilter -> LowerCaseFilter ->
SnowballPorter.

Say I search for "Thomas"; synonyms produce "Thomas", "Tom", "Tommy".
The SnowballPorter produces "Tom", "Tommi", "Thoma".

Is there a way to know that "Thoma" would match the original term, so it
could be boosted higher?





RE: Synonym Filter disable at query time

2011-05-09 Thread mtraynham
Actually now that I think about it, with copy fields I can just single out
the Synonym reader and boost from an earlier processor.

Thanks again though, that solved a lot of headache!



Re: Solr security

2011-05-09 Thread Jan Høydahl
Hi,

You can simply configure a firewall on your Solr server to allow access only
from your frontend server. Whether you use the built-in software firewall of
Linux/Windows/whatever or some other firewall utility is a choice you need to
make. This is by design: you should never ever expose your backend services,
whether it's a search server or a database server, to the public.

Read more about Solr security on the WIKI: 
http://wiki.apache.org/solr/SolrSecurity
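
As a minimal sketch of the idea with iptables (the frontend IP and port are
assumptions; use whatever firewall tooling fits your platform):

iptables -A INPUT -p tcp --dport 8983 -s 10.0.0.5 -j ACCEPT   # frontend only
iptables -A INPUT -p tcp --dport 8983 -j DROP                 # everyone else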

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 9. mai 2011, at 20.57, Brian Lamb wrote:

 [...]



RE: Synonym Filter disable at query time

2011-05-09 Thread Robert Petersen
I was thinking: search both, and boost the non-synonym field, perhaps?

-Original Message-
From: mtraynham [mailto:mtrayn...@digitalsmiths.com] 
Sent: Monday, May 09, 2011 1:20 PM
To: solr-user@lucene.apache.org
Subject: RE: Synonym Filter disable at query time

[...]


RE: Synonym Filter disable at query time

2011-05-09 Thread Robert Petersen
Yay!   :)

-Original Message-
From: mtraynham [mailto:mtrayn...@digitalsmiths.com] 
Sent: Monday, May 09, 2011 1:59 PM
To: solr-user@lucene.apache.org
Subject: RE: Synonym Filter disable at query time

[...]


Slow, CPU-bound commit

2011-05-09 Thread Santiago Bazerque
Hello,

I am using the new Solr 3.1 for a 2.6 GB, 1MM-document index. Reading the
forums and the archive, I learned that Solr and Lucene now manage commits and
transactions a bit differently than in previous versions, and indeed I feel
the behavior has changed.

Here's the thing: committing a few hundred documents consistently takes
about 12 minutes of pure CPU fury on a 6168 AMD Opteron processor.

Here is a log of the commit (taken from INFOSTREAM.txt):

http://pastebin.com/1rFK3Fs1

These numbers improved significantly after we increased ramBufferSizeMB to
1024; here is the full solrconfig.xml:

http://pastebin.com/M1Tw0ATe

Do these numbers look normal to you? The index is being used for searching
while the commit takes place (about 1 search per second).

Thanks in advance,
Santiago


Re: edismax available in solr 3.1?

2011-05-09 Thread cyang2010
Is it a formal feature that Solr 3.1 supports, or still an experimental
feature? If it is experimental, I would hesitate to use it.



Re: aliasing?

2011-05-09 Thread deniz
Well... if I knew what to do about aliasing, I wouldn't post my question
here, Grijesh :) My idea is this: for some search queries, I need to provide
some synonyms...
But it is just an idea...



A DB dataSource and a URL Data source for Solr

2011-05-09 Thread deniz
I am trying to use two different data sources, as in the title. The problem
is that it fails each time I try... I tried the suggestions on the Solr
wiki, but they failed.

Does anyone know how to configure Solr to use two different types of sources,
DB and URL?
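
For what it's worth, a sketch of a data-config.xml that declares both source
types (the driver, URLs, queries and field names are assumptions, not a
tested configuration):

<dataConfig>
  <dataSource name="db"  type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="solr" password="secret"/>
  <dataSource name="web" type="URLDataSource" encoding="UTF-8"/>
  <document>
    <entity name="item" dataSource="db" query="SELECT id, title FROM item">
      <entity name="page" dataSource="web" processor="XPathEntityProcessor"
              url="http://example.com/item/${item.id}" forEach="/doc">
        <field column="content" xpath="/doc/text"/>
      </entity>
    </entity>
  </document>
</dataConfig>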



Re: A DB dataSource and a URL Data source for Solr

2011-05-09 Thread deniz
Well, for the second time I have fixed the issue on my own after posting here.

But I don't understand why the indexing time increased to 16 minutes, while
it was only 2 minutes with only the DB source... confused.



Solr approaches to re-indexing large document corpus

2011-05-09 Thread George P. Stathis
We are looking for recommendations on systematically re-indexing in
Solr an ever-growing corpus of documents (tens of millions now, hundreds of
millions in less than a year) without taking the currently running index
down. Re-indexing is needed on a periodic basis because:

   - New features are introduced around searching the existing corpus that
   require additional schema fields which we can't always anticipate in advance
   - The corpus is indexed across multiple shards. When it grows past a
   certain threshold, we need to create more shards and re-balance documents
   evenly across all of them (which SolrCloud does not yet seem to support).

The current index receives very frequent updates and additions, which need
to be available for search within minutes. Therefore, approaches where the
corpus is re-indexed in batch offline don't really work as by the time the
batch is finished, new documents will have been made available.

The approaches we are looking into at the moment are:

   - Create a new cluster of shards and batch re-index there while the old
   cluster is still available for searching. New documents that are not part of
   the re-indexed batch are sent to both the old cluster and the new cluster.
   When ready to switch, point the load balancer to the new cluster.
   - Use CoreAdmin: spawn a new core per shard and send the re-indexed batch
   to the new cores. New documents that are not part of the re-indexed batch
   are sent to both the old cores and the new cores. When ready to switch, use
   CoreAdmin to dynamically swap cores (see the example call below).
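
For reference, the core swap in the second approach is a single CoreAdmin
call along these lines (core names are assumptions):

http://localhost:8983/solr/admin/cores?action=SWAP&core=shard1_new&other=shard1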

We'd appreciate it if folks could either confirm or poke holes in any or all
of these approaches. Is one more appropriate than the other? Or are we
completely off? Thank you in advance.


SolrQuery API for adding group filter

2011-05-09 Thread arian487
There doesn't seem to be an API to add a group (like group.field or group=true).
I'm very new to this, so I'm wondering how I'd go about adding a group query,
much like how I use 'addFilterQuery' to add an fq. Thanks.
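
In the meantime, a sketch using SolrQuery's generic setters (the field name
is an assumption, and grouping itself needs a Solr build that supports it):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class GroupQueryDemo {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        query.set("group", true);              // parameter names per the field-collapsing docs
        query.set("group.field", "category");  // assumed field name
        System.out.println(server.query(query));
    }
}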



Re: aliasing?

2011-05-09 Thread Rob Casson
A lot of this stuff is covered in the tutorial and expanded in the
wiki; they are still the best places to start in figuring out the fundamentals:

 http://lucene.apache.org/solr/tutorial.html
 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

hth,
rc

On Mon, May 9, 2011 at 9:09 PM, deniz denizdurmu...@gmail.com wrote:
 [...]



Re: A DB dataSource and a URL Data source for Solr

2011-05-09 Thread Gora Mohanty
On Tue, May 10, 2011 at 7:24 AM, deniz denizdurmu...@gmail.com wrote:
 Well, for the second time I have fixed the issue on my own after posting here.

 But I don't understand why the indexing time increased to 16 minutes, while
 it was only 2 minutes with only the DB source... confused.
[...]

Try timing just the URLDataSource. We had the same experience,
and for us it had to do with the need to read many small files from
the filesystem, something which was slower than a SELECT from a
database.

Regards,
Gora


Re: Solr is not working for a few documents

2011-05-09 Thread Gora Mohanty
On Tue, May 10, 2011 at 12:38 AM, Misra, Anil extern.anil.mi...@vw.com wrote:
[...]
 I am using Solr with Liferay. Until Friday everything was good, but today I
 added a few documents and I am unable to search for some of them :(
[...]

We are not mind-readers, so it is hard to tell what went wrong
without any details from your side. Try going through this
document: http://wiki.apache.org/solr/UsingMailingLists

Looking into my crystal ball, one possibility is that you did not
do a commit after indexing. However, please go through the
above document before haring off after this solution.

Regards