Documentation request

2012-03-07 Thread Lance Norskog
Would someone please document that the PathHierarchyTokenizer has a
matching ReversePathHierarchyTokenizer, which the PathHierarchyTokenizer
factory creates if you say reverse="true"? This took some doing to
discover: I wanted to add a factory for the reverse tokenizer and
discovered it was not needed. (My wiki privileges are somehow broken.)
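
For reference, a minimal fieldType sketch (the field type name and
delimiter shown are illustrative):

  <fieldType name="descendent_path" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.PathHierarchyTokenizerFactory"
                 delimiter="/" reverse="true"/>
    </analyzer>
  </fieldType>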

Thanks!

-- 
Lance Norskog
goks...@gmail.com


two solr instances using one index

2012-03-07 Thread C.Yunqin
Hi, everyone


Two Solr server nodes point to the same data directory (same index). Do
the two Solr instances work independently?
I found something strange: one node (node0) can do a complex search (for
example, q="disease"&sort=dateCreated), but the other (node1) running the
same search reported out of memory (the java -Xmx4G should be enough).

And when I start node1 after we kill node0 (if I keep node0 running, I can
never start node1 without a heap-size error, which will impact node1's
ability to perform complex searches), any complex search completes fine.

Did anybody ever meet this problem, and any idea about it? PS: my Solr
version is 1.3.

Re: schema design help

2012-03-07 Thread Gora Mohanty
On 8 March 2012 11:05, Abhishek tiwari  wrote:
> Gora,
> We are not having the related search
> you mentioned... "will a search on an Establishment
> also require results from Movie, such as what movies are showing
> at the establishment"
>
> Establishment does not require Movie results... each entity has its
> own separate search.
[...]

In that case, multiple cores should be OK.

Regards,
Gora


Re: schema design help

2012-03-07 Thread Abhishek tiwari
Gora,
We are not having the related search
you mentioned... "will a search on an Establishment
also require results from Movie, such as what movies are showing
at the establishment"

Establishment does not require Movie results... each entity has its own
separate search.

On Thu, Mar 8, 2012 at 10:49 AM, Gora Mohanty  wrote:

> On 8 March 2012 10:40, Abhishek tiwari 
> wrote:
> > My page has a layout in the following manner:
> > *All tab*: will contain all entities (Establishment/Event/Movie)
> > Establishment tab: will contain Establishment search results
> > Event tab: will contain Event search results
> > Movie tab: will contain Movie search results
> >
> > Please suggest how I should design my schema.
> [...]
>
> You will need to think more about your search requirements, and
> provide more details. E.g., will a search on an Establishment
> also require results from Movie, such as what movies are showing
> at the establishment? Similarly, will results from an Event search
> require a list of Movies showing at the events? As Solr is not an
> RDBMS, if you need such correlated data, you should typically use
> a single, flat index, rather than multiple cores.
>
> IMHO, a multi-core setup would be unusual for what you are
> trying to do. However, this is difficult to say for sure without an
> insight into your search requirements.
>
> Regards,
> Gora
>


Re: in solr how to support Document.SetBoost as lucene?

2012-03-07 Thread Tommaso Teofili
When indexing a Solr document by sending XML files via HTTP POST, you can
set it by adding a boost attribute to the doc element; see
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_on_.22doc.22
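
For example (boost value illustrative):

  <add>
    <doc boost="2.5">
      <field name="id">doc1</field>
      <field name="name">some name</field>
    </doc>
  </add>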

If you plan to index using the Java APIs (SolrJ, see
http://wiki.apache.org/solr/Solrj) you can do it with:

  // index-time boost applied to the whole document
  SolrInputDocument doc = new SolrInputDocument();
  doc.setDocumentBoost(floatValue);

Hope this helps,
Tommaso

2012/3/8 James 

>  Hi gurus,
> In lucene, there is an API (Document.SetBoost) that can increase some
> specific doc's score. Is there some way we can do it in solr?
> Thanks.
>
>
>


Re: schema design help

2012-03-07 Thread Gora Mohanty
On 8 March 2012 10:40, Abhishek tiwari  wrote:
> My page has a layout in the following manner:
> *All tab*: will contain all entities (Establishment/Event/Movie)
> Establishment tab: will contain Establishment search results
> Event tab: will contain Event search results
> Movie tab: will contain Movie search results
>
> Please suggest how I should design my schema.
[...]

You will need to think more about your search requirements, and
provide more details. E.g., will a search on an Establishment
also require results from Movie, such as what movies are showing
at the establishment? Similarly, will results from an Event search
require a list of Movies showing at the events? As Solr is not an
RDBMS, if you need such correlated data, you should typically use
a single, flat index, rather than multiple cores.

IMHO, a multi-core setup would be unusual for what you are
trying to do. However, this is difficult to say for sure without an
insight into your search requirements.

Regards,
Gora


Re: schema design help

2012-03-07 Thread Abhishek tiwari
My page has a layout in the following manner:
*All tab*: will contain all entities (Establishment/Event/Movie)
Establishment tab: will contain Establishment search results
Event tab: will contain Event search results
Movie tab: will contain Movie search results

Please suggest how I should design my schema.


On Thu, Mar 8, 2012 at 10:21 AM, Walter Underwood wrote:

> You should create multiple cores when each core is an independent search.
> If you have three separate search pages, you may want three separate cores.
>
> wunder
> Search Guy, Chegg.com
>
> On Mar 7, 2012, at 8:48 PM, Abhishek tiwari wrote:
>
> > Please suggest when one should create multiple cores?
> >
> > On Thu, Mar 8, 2012 at 12:12 AM, Walter Underwood wrote:
> >
> >> Solr is not relational, so you will probably need to take a fresh look
> at
> >> your data.
> >>
> >> Here is one method.
> >>
> >> 1. Sketch your search results page.
> >> 2. Each result is a document in Solr.
> >> 3. Each displayed item is a stored field in Solr.
> >> 4. Each searched item is an indexed field in Solr.
> >>
> >> It may help to think of this as a big flat materialized view in your
> DBMS.
> >>
> >> wunder
> >> Search Guy, Chegg.com
> >>
> >> On Mar 6, 2012, at 10:56 PM, Abhishek tiwari wrote:
> >>
> >>> Thanks for replying.
> >>>
> >>> In our RDBMS schema we have Establishment/Event/Movie master relations.
> >>> Establishment has fields like title, description, ratings, tags,
> >>> cuisines (multivalued), services (multivalued) and features
> >>> (multivalued); similarly, Event has title, description, category
> >>> (multivalued) and venue (multivalued); and Movie has fields like name,
> >>> start date, end date, genre, theater, rating and review.
> >>>
> >>> We have nearly 1M records in each entity; movies and events expire
> >>> frequently, and we have to update them on expiry.
> >>> We also keep data additional to the indexed data (stored data) to
> >>> reduce RDBMS queries.
> >>>
> >>> Please suggest how to proceed with the schema design: a single core,
> >>> or multiple cores, one for each entity?
> >>>
> >>>
> >>> On Tue, Mar 6, 2012 at 7:40 PM, Gora Mohanty 
> wrote:
> >>>
>  On 6 March 2012 18:01, Abhishek tiwari wrote:
> > I am new to Solr and want help with schema design. I have multiple
> > entities like Event, Establishment and Movie, each with different
> > types of relations. Should I make a different core for each entity?
> 
>  It depends on your use case, i.e., what would your typical searches
>  be on. Normally, using a separate core for each entity would be
>  unusual, and instead one would flatten out typical RDBMS data for
>  Solr.
> 
>  Please describe what you want to achieve, and people might be
>  better able to help you.
> 
>  Regards,
>  Gora
> 


Re: schema design help

2012-03-07 Thread Walter Underwood
You should create multiple cores when each core is an independent search. If 
you have three separate search pages, you may want three separate cores.

wunder
Search Guy, Chegg.com

On Mar 7, 2012, at 8:48 PM, Abhishek tiwari wrote:

> Please suggest when one should create multiple cores?
> 
> On Thu, Mar 8, 2012 at 12:12 AM, Walter Underwood 
> wrote:
> 
>> Solr is not relational, so you will probably need to take a fresh look at
>> your data.
>> 
>> Here is one method.
>> 
>> 1. Sketch your search results page.
>> 2. Each result is a document in Solr.
>> 3. Each displayed item is a stored field in Solr.
>> 4. Each searched item is an indexed field in Solr.
>> 
>> It may help to think of this as a big flat materialized view in your DBMS.
>> 
>> wunder
>> Search Guy, Chegg.com
>> 
>> On Mar 6, 2012, at 10:56 PM, Abhishek tiwari wrote:
>> 
>>> Thanks for replying.
>>>
>>> In our RDBMS schema we have Establishment/Event/Movie master relations.
>>> Establishment has fields like title, description, ratings, tags,
>>> cuisines (multivalued), services (multivalued) and features
>>> (multivalued); similarly, Event has title, description, category
>>> (multivalued) and venue (multivalued); and Movie has fields like name,
>>> start date, end date, genre, theater, rating and review.
>>>
>>> We have nearly 1M records in each entity; movies and events expire
>>> frequently, and we have to update them on expiry.
>>> We also keep data additional to the indexed data (stored data) to
>>> reduce RDBMS queries.
>>>
>>> Please suggest how to proceed with the schema design: a single core,
>>> or multiple cores, one for each entity?
>>> 
>>> 
>>> On Tue, Mar 6, 2012 at 7:40 PM, Gora Mohanty  wrote:
>>> 
 On 6 March 2012 18:01, Abhishek tiwari 
 wrote:
> I am new to Solr and want help with schema design. I have multiple
> entities like Event, Establishment and Movie, each with different
> types of relations. Should I make a different core for each entity?
 
 It depends on your use case, i.e., what would your typical searches
 be on. Normally, using a separate core for each entity would be
 unusual, and instead one would flatten out typical RDBMS data for
 Solr.
 
 Please describe what you want to achieve, and people might be
 better able to help you.
 
 Regards,
 Gora
 

Re: schema design help

2012-03-07 Thread Abhishek tiwari
Please suggest when one should create multiple cores?

On Thu, Mar 8, 2012 at 12:12 AM, Walter Underwood wrote:

> Solr is not relational, so you will probably need to take a fresh look at
> your data.
>
> Here is one method.
>
> 1. Sketch your search results page.
> 2. Each result is a document in Solr.
> 3. Each displayed item is a stored field in Solr.
> 4. Each searched item is an indexed field in Solr.
>
> It may help to think of this as a big flat materialized view in your DBMS.
>
> wunder
> Search Guy, Chegg.com
>
> On Mar 6, 2012, at 10:56 PM, Abhishek tiwari wrote:
>
> > Thanks for replying.
> >
> > In our RDBMS schema we have Establishment/Event/Movie master relations.
> > Establishment has fields like title, description, ratings, tags,
> > cuisines (multivalued), services (multivalued) and features
> > (multivalued); similarly, Event has title, description, category
> > (multivalued) and venue (multivalued); and Movie has fields like name,
> > start date, end date, genre, theater, rating and review.
> >
> > We have nearly 1M records in each entity; movies and events expire
> > frequently, and we have to update them on expiry.
> > We also keep data additional to the indexed data (stored data) to
> > reduce RDBMS queries.
> >
> > Please suggest how to proceed with the schema design: a single core,
> > or multiple cores, one for each entity?
> >
> >
> > On Tue, Mar 6, 2012 at 7:40 PM, Gora Mohanty  wrote:
> >
> >> On 6 March 2012 18:01, Abhishek tiwari 
> >> wrote:
> >>> I am new to Solr and want help with schema design. I have multiple
> >>> entities like Event, Establishment and Movie, each with different
> >>> types of relations. Should I make a different core for each entity?
> >>
> >> It depends on your use case, i.e., what would your typical searches
> >> be on. Normally, using a separate core for each entity would be
> >> unusual, and instead one would flatten out typical RDBMS data for
> >> Solr.
> >>
> >> Please describe what you want to achieve, and people might be
> >> better able to help you.
> >>
> >> Regards,
> >> Gora
> >>


bbox and geofilt question

2012-03-07 Thread William Bell
Here is a performance question for you...

I want to be able to return results < 160 km from Denver, CO. We have
run some performance numbers, and we know that doing bbox is MUCH faster
than geofilt.

However, we want to order the queries: run bbox first AND then run geofilt
on those results, OR we can do a frange.

The question is: how do you force the results to come back with bbox first?

http://localhost/solr/select?q=*:*&pt=45,-89&d=160&facet=true&facet.query={!bbox}
AND {!frange l=0 u=160}geodist()

That does not work.

http://localhost/solr/select?q=*:*&pt=45,-89&d=160&facet=true&facet.query={!bbox}
AND {!geofilt}

What else can I try? This seems to apply only one of the queries (the
first one):

http://localhost/solr/select?q=*:*&pt=45,-89&d=160&facet=true&facet.query=_query_:"{!bbox}"
AND _query_:"{!geofilt}"

I was also thinking query() might work?

Can I use cache=true and cost=10,100 to order this? HELP!!
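
For example, would something like this run the cheap bbox filter first and
the geodist() frange as a post filter (cost values made up)?

http://localhost/solr/select?q=*:*&pt=45,-89&d=160&fq={!bbox cache=false cost=50}&fq={!frange l=0 u=160 cache=false cost=150}geodist()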



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Resilient clusters / Jira tickets

2012-03-07 Thread Mark Miller
Just go here: https://issues.apache.org/jira/browse/SOLR

On Mar 7, 2012, at 11:57 AM, Ranjan Bagchi wrote:

> Hi --
> 
> Hi, I totally appreciate the guidance you've been giving me.  And yes, my use 
> case is having a sharded index where pieces can go in and out of service.
> 
> How do I file a jira ticket?  Happy to do it.
> 
> Thanks,
> 
> Ranjan 
> 
> -- 
> Ranjan Bagchi
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> 

- Mark Miller
lucidimagination.com


Re: Custom Sharding on solrcloud

2012-03-07 Thread Mark Miller
Hi Phil - 

The default update chain now includes the distributed update processor by 
default - and in SolrCloud mode it will be active.

Probably what you want to do is define your own update chain (see the wiki). 
Then you can add that update chain as the default for your JSON update handler 
in solrconfig.xml, along these lines:

  <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">mychain</str>
    </lst>
  </requestHandler>

The default chain is: 

  new LogUpdateProcessorFactory(),
  new DistributedUpdateProcessorFactory(),
  new RunUpdateProcessorFactory()

So just use Log and Run instead to get your old behavior.
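
In solrconfig.xml that might look something like this (untested sketch):

  <updateRequestProcessorChain name="mychain" default="true">
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>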

- Mark

On Mar 7, 2012, at 1:37 PM, Phil Hoy wrote:

> Hi,
> 
> We have a large index and would like to shard by a particular field value, in 
> our case surname. This way we can scale out to multiple machines, yet as most 
> queries filter on surname we can use some application logic to hit just the 
> one core to get the results we need.
> 
> Furthermore, as we anticipate the index will grow over time, it makes sense 
> (to us) to host a number of shards on a single machine until they get too big, 
> at which point we can move them to another machine.
> 
> We are using solrcloud and it is set up using a solrcore per shard, that way 
> we can direct both queries and updates to the appropriate core/shard. To do 
> this our solr.xml looks a bit like this:
> 
> <cores ... zkClientTimeout="1" hostPort="8983">
>   <core name="aaa-ava" instanceDir="/data/recordsets/shards/aaa-ava" collection="recordsets" />
>   <core name="avb-bel" instanceDir="/data/recordsets/shards/avb-bel" collection="recordsets" />
>   ...
> 
> Directed updates via:
> http://server/solr/aaa-ava/update/json  [{surname:"adams"}]
> 
> Directed queries via:
> http://server/solr/select?q=surname:adams&shards=aaa-ava
> 
> This setup used to work in version apache-solr-4.0-2011-12-12_09-14-13  
> before the more recent solrcloud changes but now the update is not directed 
> to the appropriate core. Is there a better way to achieve our needs?
> 
> Phil
> 

- Mark Miller
lucidimagination.com

Re: Apache Lucene Eurocon 2012

2012-03-07 Thread Chris Hostetter

: where and when is the next Eurocon scheduled?
: I read something about denmark and autumn 2012(i don't know where *g*).

I do not know where, but sometime in the fall is probably the correct time 
frame.  I believe the details will be announced at Lucene Revolution...

http://lucenerevolution.org/

(that's what happened last year)

-Hoss


Suggester configuration question

2012-03-07 Thread Julio Castillo

New to Solr (3.5.0). I have one simple question.

How do I extend the configuration to perform suggestions on more than 
one field?


I'm using the following solrconfig.xml (taken from the online Wiki 
documentation).

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">firstName</str>

    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

It all works fine for the firstName field specified above, but I'm at a 
loss as to how to expand it to cover additional fields (e.g. lastName in 
my schema).
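
I wondered whether something like a copyField into a combined field would
be the way to go, e.g. (names and type are illustrative):

  <field name="suggestText" type="textSpell" indexed="true" stored="false" multiValued="true"/>
  <copyField source="firstName" dest="suggestText"/>
  <copyField source="lastName" dest="suggestText"/>

and then pointing the spellchecker's field at suggestText, but I'm not
sure that's the intended approach.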


thanks

** julio


Re: Auto-warming for new indexes

2012-03-07 Thread Russell Black
I answered my own question after some digging in the code.  The caches are 
structured as maps.  In the cases I looked at, auto-warming ignores the values 
in the map.  Instead, it uses the map's keys (usually a query) to re-execute 
the search and cache the result in the new searcher.
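
For reference, the autowarmCount setting on each cache in solrconfig.xml 
controls how many of the old searcher's keys get replayed against the new 
searcher, e.g. (sizes illustrative):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>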


On Mar 7, 2012, at 2:51 PM, Russell Black wrote:

> As I understand it, auto-warming is the process of populating a new 
> searcher's caches from cached objects in the old searcher's caches.  Let's 
> say that a new searcher is created to service a new index that came from 
> replication.  Because the new searcher is operating on a new index, how is it 
> possible that caches from the old index can be valid for the new index?  
> Records could have been added, removed or changed in the new index.  It seems 
> like this would invalidate the old caches.  What am I missing?
> 
> 
> 



Auto-warming for new indexes

2012-03-07 Thread Russell Black
As I understand it, auto-warming is the process of populating a new searcher's 
caches from cached objects in the old searcher's caches.  Let's say that a new 
searcher is created to service a new index that came from replication.  Because 
the new searcher is operating on a new index, how is it possible that caches 
from the old index can be valid for the new index?  Records could have been 
added, removed or changed in the new index.  It seems like this would 
invalidate the old caches.  What am I missing?





Re: Index all possible facets values even if there is no document in relation

2012-03-07 Thread Chris Hostetter

: I don't have all my facet values used by my documents, but I would like to
: index these facet values even if they return 0 documents.

field faceting builds the list of constraints based on the indexed values 
found in the field -- if you don't index it, it doesn't know about it.

if you want to have field facet constraints returned even though they 
aren't in any "real" docs, you need to index at least one "fake" doc 
containing all of those values ... then just make sure you exclude that doc 
at query time using an appended "fq" (you could do it by id, or some 
special field, whatever is easiest for you).  

as long as you use "facet.mincount=0" the values will still be returned as 
constraints even if no documents in your results match them
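
for example (field name and excluded id are illustrative):

http://localhost:8983/solr/select?q=*:*&fq=-id:facet-placeholder&facet=true&facet.field=category&facet.mincount=0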


-Hoss


RE: DIH Delta index takes much time

2012-03-07 Thread Dyer, James
As a sanity check, you might want to take the query that is being executed for 
delta updates and run it manually through a SQL tool, or do an explain plan or 
something.  It almost sounds like there could be a silly error in the query 
you're using and it's doing a Cartesian join or something like that.

You might also want to try to put your delta data in a text file and use CSV 
Request Handler to try and update the data.  Is it still taking a long time?  
If so, you've eliminated both your database and DIH as the problems, pointing 
to possible resource constraints with your index.  (see 
http://wiki.apache.org/solr/UpdateCSV for detailed instructions how to do this).
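
For example, something along these lines (file name illustrative):

curl 'http://localhost:8983/solr/update/csv?commit=true' --data-binary @delta.csv -H 'Content-type:text/plain; charset=utf-8'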

If the query runs just fine when run manually, AND if the CSV loader test is 
fast too, then maybe you've stumbled on a new DIH bug nobody has reported 
before?

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ramo Karahasan [mailto:ramo.karaha...@googlemail.com] 
Sent: Wednesday, March 07, 2012 1:55 PM
To: solr-user@lucene.apache.org
Subject: AW: DIH Delta index takes much time

Hi,

thank you for the help. I've tried:

dataimport?command=full-import&clean=false&optimize=false

and this takes only 19 minutes; the first run with optimize=true takes
about 3 hours... the tomcat logs don't show any errors

and 19 minutes is too long too, isn't it?

Thanks,
Ramo

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Wednesday, 7 March 2012 12:41
To: solr-user@lucene.apache.org
Subject: Re: DIH Delta index takes much time

> I've indexed my 2 million documents with DIH on Solr. It uses a simple 
> select without joins where it fetches the distinct of title, and 
> furthermore ids, descriptions, urls. The first time I indexed 
> this, it took about 1 hour. Every 1-2 days I get new entries which I 
> want to index. I'm doing a delta index as described here:
> http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport   
> with the command: .dataimport?command=full-import&clean=false
> Now I've added 2
> more documents to the database, and run the command again.
> Solr now indexes
> for over an hour. The last time I indexed was two weeks ago, but in these 
> two weeks, nothing has changed.

By default, both full and delta imports issue an optimize at the end. What
happens if you disable it?

.dataimport?command=full-import&clean=false&optimize=false
.dataimport?command=delta-import&optimize=false




AW: DIH Delta index takes much time

2012-03-07 Thread Ramo Karahasan
Hi,

thank you for the help. I've tried:

dataimport?command=full-import&clean=false&optimize=false

and this takes only 19 minutes; the first run with optimize=true takes
about 3 hours... the tomcat logs don't show any errors

and 19 minutes is too long too, isn't it?

Thanks,
Ramo

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Wednesday, 7 March 2012 12:41
To: solr-user@lucene.apache.org
Subject: Re: DIH Delta index takes much time

> I've indexed my 2 million documents with DIH on Solr. It uses a simple 
> select without joins where it fetches the distinct of title, and 
> furthermore ids, descriptions, urls. The first time I indexed 
> this, it took about 1 hour. Every 1-2 days I get new entries which I 
> want to index. I'm doing a delta index as described here:
> http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport   
> with the command: .dataimport?command=full-import&clean=false
> Now I've added 2
> more documents to the database, and run the command again.
> Solr now indexes
> for over an hour. The last time I indexed was two weeks ago, but in these 
> two weeks, nothing has changed.

By default, both full and delta imports issue an optimize at the end. What
happens if you disable it?

.dataimport?command=full-import&clean=false&optimize=false
.dataimport?command=delta-import&optimize=false




Re: Solr 4.0 and production environments

2012-03-07 Thread Dirceu Vieira
Hey guys,

Great stuff! Thanks a lot for replying.

To be honest, from the beginning I already felt pretty inclined to
work with the trunk.
Of course, I also have to convince people (at work) that doing so is safe,
and test, and test again...

Thank you very much for your replies; they just made me more confident
about that!

Regards,

Dirceu

On Wed, Mar 7, 2012 at 7:58 PM, Robert Muir  wrote:

> Thanks :)
>
> We often disagree on many low-level details but thanks for the
> confirmation: I felt this was long overdue to express: we take
> releases very seriously but that doesn't mean you should immediately
> discard the possibility of using a snapshot release:
>
> In fact you can even manage your own level of risk:
>
> * If you are looking for a more stable upgrade, consider
> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/. This
> is our 'stable branch' and disruptive changes are not backported. In
> general we try to provide the best backwards compat possible, while at
> the same time backporting safe features and bugfixes.
> * For newer more exciting features, take a look at
> http://svn.apache.org/repos/asf/lucene/dev/trunk/: this is our next
> major release and contains the best we have to offer. At the same
> time, reliability is not sacrificed, people work to improve the tests
> every day to make them nastier.
>
> In both cases the major concerns are actually just about the degree of
> apis 'changing' and not actually reliability so much, really our trunk
> is very stable, but you don't have to take my word for it: look at our
> tests.
>
> Of course, it's possible during testing you might discover bugs; please
> report them in that case!
>
> Thanks again.
>
> On Wed, Mar 7, 2012 at 1:00 PM, eks dev  wrote:
> > I am here on lucene as a user since the project started, even before
> > solr came to life, many many years. And I was always using trunk
> > version for pretty big customers, and *never* experienced some serious
> > problems. The worst thing that can happen is to notice bug somewhere,
> > and if you have some reasonable testing for your product, you will see
> > it quickly.
> > But, with this community, *you will definitely not have to wait long to
> > get it fixed*. Not only will they fix it, they will thank you for
> > bringing it up!
> >
> > I can, as an old user, 100 % vouch what Robert said below.
> >
> > Simply, just go for it, test your application a bit and make your users
> > happy.
> >
> >
> >
> >
> > On Wed, Mar 7, 2012 at 5:55 PM, Robert Muir  wrote:
> >> On Wed, Mar 7, 2012 at 11:47 AM, Dirceu Vieira 
> wrote:
> >>> Hi All,
> >>>
> >>> Has anybody started using Solr 4.0 in production environments? Is it
> stable
> >>> enough?
> >>> I'm planning to create a proof of concept using solr 4.0, we have some
> >>> projects that will gain a lot with features such as near real time
> search,
> >>> joins and others, that are available only on version 4.
> >>>
> >>> Is it too risky to think of using it right now?
> >>> What are your thoughts and experiences with that?
> >>>
> >>
> >> In general, we try to keep our 'trunk' (slated to be 4.0) in very
> >> stable condition.
> >>
> >> Really, it should be 'ready-to-release' at any time, of course 4.0 has
> >> had many drastic changes: both at the Lucene and Solr level.
> >>
> >> Before deciding what is stable, you should define stability: is it:
> >> * api stability: will i be able to upgrade to a more recent snapshot
> >> of 4.0 without drastic changes to my app?
> >> * index format stability: will i be able to upgrade to a more recent
> >> snapshot of 4.0 without re-indexing?
> >> * correctness: is 4.0 dangerous in some way that it has many bugs
> >> since much of the code is new?
> >>
> >> I think you should limit your concerns to only the first 2 items, as
> >> far as correctness, just look at the tests. For any open source
> >> project, you can easily judge its quality by its tests: this is a
> >> fact.
> >>
> >> For lucene/solr the testing strategy, in my opinion, goes above and
> >> beyond many other projects: for example random testing:
> >>
> http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011_presentations#dawid_weiss
> >>
> >> and the new solr cloud functionality also adds the similar chaosmonkey
> >> concept on top of this already.
> >>
> >> If you are worried about bugs, is a lucene/solr trunk snapshot less
> >> reliable than even a released version of alternative software? It's an
> >> interesting question: look at their tests.
> >>
> >> --
> >> lucidimagination.com
>
>
>
> --
> lucidimagination.com
>



-- 
Dirceu Vieira Júnior
---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr


foo

2012-03-07 Thread Phillip Farber

unsubscribe


RE: How to limit the number of open searchers?

2012-03-07 Thread Michael Ryan
> Unless you have warming happening, there should
> only be a single searcher open at any given time.
> So it seems to me that maxWarmingSearchers
> should give you what you need.

What I'm seeing is that if a query takes a very long time to run, and runs 
across the duration of multiple commits (I know, that itself sounds bad!), I 
can get into a situation where I have 2 searchers in use and 1 searcher 
warming, rather than 1 searcher in use and 1 searcher warming. Due to all the 
memory-intensive features I use, having 3 or more searchers open can cause an 
OutOfMemoryError.

I'm not using master/slave for this application, so can't go that route.

I'd like a way to see how many searchers are currently open that is external to 
Solr. This would allow me to block my commits until I see that there is only 1 
searcher currently open. I could use JMX, but that feels like overkill - 
wondering if there is something simpler.

-Michael


Re: Solr 4.0 and production environments

2012-03-07 Thread Robert Muir
Thanks :)

We often disagree on many low-level details but thanks for the
confirmation: I felt this was long overdue to express: we take
releases very seriously but that doesn't mean you should immediately
discard the possibility of using a snapshot release:

In fact you can even manage your own level of risk:

* If you are looking for a more stable upgrade, consider
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/. This
is our 'stable branch' and disruptive changes are not backported. In
general we try to provide the best backwards compat possible, while at
the same time backporting safe features and bugfixes.
* For newer more exciting features, take a look at
http://svn.apache.org/repos/asf/lucene/dev/trunk/: this is our next
major release and contains the best we have to offer. At the same
time, reliability is not sacrificed, people work to improve the tests
every day to make them nastier.

In both cases the major concerns are actually just about the degree of
apis 'changing' and not actually reliability so much, really our trunk
is very stable, but you don't have to take my word for it: look at our
tests.

Of course, it's possible during testing you might discover bugs; please
report them in that case!

Thanks again.

On Wed, Mar 7, 2012 at 1:00 PM, eks dev  wrote:
> I am here on lucene as a user since the project started, even before
> solr came to life, many many years. And I was always using trunk
> version for pretty big customers, and *never* experienced some serious
> problems. The worst thing that can happen is to notice bug somewhere,
> and if you have some reasonable testing for your product, you will see
> it quickly.
> But, with this community, *you will definitely not have to wait long to
> get it fixed*. Not only will they fix it, they will thank you for
> bringing it up!
>
> I can, as an old user, 100 % vouch what Robert said below.
>
> Simply, just go for it, test your application a bit and make your users happy.
>
>
>
>
> On Wed, Mar 7, 2012 at 5:55 PM, Robert Muir  wrote:
>> On Wed, Mar 7, 2012 at 11:47 AM, Dirceu Vieira  wrote:
>>> Hi All,
>>>
>>> Has anybody started using Solr 4.0 in production environments? Is it stable
>>> enough?
>>> I'm planning to create a proof of concept using solr 4.0, we have some
>>> projects that will gain a lot with features such as near real time search,
>>> joins and others, that are available only on version 4.
>>>
>>> Is it too risky to think of using it right now?
>>> What are your thoughts and experiences with that?
>>>
>>
>> In general, we try to keep our 'trunk' (slated to be 4.0) in very
>> stable condition.
>>
>> Really, it should be 'ready-to-release' at any time, of course 4.0 has
>> had many drastic changes: both at the Lucene and Solr level.
>>
>> Before deciding what is stable, you should define stability: is it:
>> * api stability: will i be able to upgrade to a more recent snapshot
>> of 4.0 without drastic changes to my app?
>> * index format stability: will i be able to upgrade to a more recent
>> snapshot of 4.0 without re-indexing?
>> * correctness: is 4.0 dangerous in some way that it has many bugs
>> since much of the code is new?
>>
>> I think you should limit your concerns to only the first 2 items, as
>> far as correctness, just look at the tests. For any open source
>> project, you can easily judge its quality by its tests: this is a
>> fact.
>>
>> For lucene/solr the testing strategy, in my opinion, goes above and
>> beyond many other projects: for example random testing:
>> http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011_presentations#dawid_weiss
>>
>> and the new solr cloud functionality also adds the similar chaosmonkey
>> concept on top of this already.
>>
>> If you are worried about bugs, is a lucene/solr trunk snapshot less
>> reliable than even a released version of alternative software? It's an
>> interesting question: look at their tests.
>>
>> --
>> lucidimagination.com



-- 
lucidimagination.com


Re: schema design help

2012-03-07 Thread Walter Underwood
Solr is not relational, so you will probably need to take a fresh look at your 
data.

Here is one method.

1. Sketch your search results page.
2. Each result is a document in Solr.
3. Each displayed item is a stored field in Solr.
4. Each searched item is an indexed field in Solr.

It may help to think of this as a big flat materialized view in your DBMS.
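
As an illustration only, such a flattened schema fragment might look like
this (field names made up):

  <field name="type"  type="string" indexed="true" stored="true"/>  <!-- establishment | event | movie -->
  <field name="title" type="text"   indexed="true" stored="true"/>
  <field name="tags"  type="string" indexed="true" stored="true" multiValued="true"/>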

wunder
Search Guy, Chegg.com

On Mar 6, 2012, at 10:56 PM, Abhishek tiwari wrote:

> Thanks for replying.
> 
> In our RDBMS schema we have Establishment/Event/Movie master relations.
> Establishment has fields like title, description, ratings, tags,
> cuisines (multivalued), services (multivalued) and features
> (multivalued); similarly, Event has title, description, category
> (multivalued) and venue (multivalued); and Movie has fields like name,
> start date, end date, genre, theater, rating and review.
> 
> We have nearly 1M records in each entity; movies and events expire
> frequently, and we have to update them on expiry.
> We also keep data additional to the indexed data (stored data) to reduce
> RDBMS queries.
> 
> Please suggest how to proceed with the schema design: a single core, or
> multiple cores, one for each entity?
> 
> 
> On Tue, Mar 6, 2012 at 7:40 PM, Gora Mohanty  wrote:
> 
>> On 6 March 2012 18:01, Abhishek tiwari 
>> wrote:
>>> I am new to Solr and want help with schema design. I have multiple
>>> entities like Event, Establishment and Movie, each with different
>>> types of relations. Should I make a different core for each entity?
>> 
>> It depends on your use case, i.e., what would your typical searches
>> be on. Normally, using a separate core for each entity would be
>> unusual, and instead one would flatten out typical RDBMS data for
>> Solr.
>> 
>> Please describe what you want to achieve, and people might be
>> better able to help you.
>> 
>> Regards,
>> Gora
>> 


Custom Sharding on solrcloud

2012-03-07 Thread Phil Hoy
Hi,

We have a large index and would like to shard by a particular field value, in 
our case surname. This way we can scale out to multiple machines, yet as most 
queries filter on surname we can use some application logic to hit just the one 
core to get the results we need.

Furthermore, as we anticipate the index will grow over time, it makes sense (to 
us) to host a number of shards on a single machine until they get too big, at 
which point we can move them to another machine.

We are using solrcloud and it is set up using a solrcore per shard, that way we 
can direct both queries and updates to the appropriate core/shard. To do this 
our solr.xml looks a bit like this:

<cores ... zkClientTimeout="1" hostPort="8983">
  <core name="aaa-ava" instanceDir="/data/recordsets/shards/aaa-ava" collection="recordsets" />
  <core name="avb-bel" instanceDir="/data/recordsets/shards/avb-bel" collection="recordsets" />
  ...

Directed updates via:
http://server/solr/aaa-ava/update/json  [{surname:"adams"}]

Directed queries via:
http://server/solr/select?q=surname:adams&shards=aaa-ava

This setup used to work in version apache-solr-4.0-2011-12-12_09-14-13  before 
the more recent solrcloud changes but now the update is not directed to the 
appropriate core. Is there a better way to achieve our needs?

Phil



Re: Solr 4.0 and production environments

2012-03-07 Thread eks dev
I am here on lucene as a user since the project started, even before
solr came to life, many many years. And I was always using trunk
version for pretty big customers, and *never* experienced some serious
problems. The worst thing that can happen is to notice bug somewhere,
and if you have some reasonable testing for your product, you will see
it quickly.
But, with this community, *you will definitely not have to wait long to
get it fixed*. Not only will they fix it, they will thank you for
bringing it up!

I can, as an old user, 100 % vouch what Robert said below.

Simply, just go for it, test your application a bit and make your users happy.




On Wed, Mar 7, 2012 at 5:55 PM, Robert Muir  wrote:
> On Wed, Mar 7, 2012 at 11:47 AM, Dirceu Vieira  wrote:
>> Hi All,
>>
>> Has anybody started using Solr 4.0 in production environments? Is it stable
>> enough?
>> I'm planning to create a proof of concept using solr 4.0, we have some
>> projects that will gain a lot with features such as near real time search,
>> joins and others, that are available only on version 4.
>>
>> Is it too risky to think of using it right now?
>> What are your thoughts and experiences with that?
>>
>
> In general, we try to keep our 'trunk' (slated to be 4.0) in very
> stable condition.
>
> Really, it should be 'ready-to-release' at any time, of course 4.0 has
> had many drastic changes: both at the Lucene and Solr level.
>
> Before deciding what is stable, you should define stability: is it:
> * api stability: will i be able to upgrade to a more recent snapshot
> of 4.0 without drastic changes to my app?
> * index format stability: will i be able to upgrade to a more recent
> snapshot of 4.0 without re-indexing?
> * correctness: is 4.0 dangerous in some way that it has many bugs
> since much of the code is new?
>
> I think you should limit your concerns to only the first 2 items, as
> far as correctness, just look at the tests. For any open source
> project, you can easily judge its quality by its tests: this is a
> fact.
>
> For lucene/solr the testing strategy, in my opinion, goes above and
> beyond many other projects: for example random testing:
> http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011_presentations#dawid_weiss
>
> and the new solr cloud functionality also adds the similar chaosmonkey
> concept on top of this already.
>
> If you are worried about bugs, is a lucene/solr trunk snapshot less
> reliable than even a released version of alternative software? its an
> interesting question. look at their tests.
>
> --
> lucidimagination.com


RE: Solr 4.0 and production environments

2012-03-07 Thread Darren Govoni

As a rule of thumb, many will say not to go to production with a pre-release 
baseline. So until Solr 4 goes "final" and "stable", it's best not to assume too 
much about it.

My second suggestion is to properly stage new technologies in your product so 
that they go through their own validation. To that end, jump right in and start 
using Solr 4 and see for yourself! It's a great technology.

--- Original Message ---
On 3/7/2012 11:47 AM Dirceu Vieira wrote:

Hi All,

Has anybody started using Solr 4.0 in production environments? Is it stable
enough?
I'm planning to create a proof of concept using solr 4.0, we have some
projects that will gain a lot with features such as near real time search,
joins and others, that are available only on version 4.

Is it too risky to think of using it right now?
What are your thoughts and experiences with that?

Best regards,

-- 
Dirceu Vieira Júnior

---
+47 9753 2473
dirceuvjr.blogspot.com
twitter.com/dirceuvjr



Re: Java6 End of Life, upgrading to 7

2012-03-07 Thread Shawn Heisey

On 2/28/2012 8:16 AM, Shawn Heisey wrote:
Due to the End of Life announcement for Java6, I am going to need to 
upgrade to Java 7 in the very near future.  I'm running Solr 3.5.0 
modified with a couple of JIRA patches.


https://blogs.oracle.com/henrik/entry/updated_java_6_eol_date

I saw the announcement that Java 7u1 had fixed all the known bugs 
relating to Solr.  Is there anything I need to be aware of when 
upgrading?  These are the commandline switches I am using that apply 
to Java itself:


-Xms8192M
-Xmx8192M
-XX:NewSize=6144M
-XX:SurvivorRatio=4
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled


I assume from the lack of response that either there are no foreseeable 
problems or the potential problems are so bad that nobody wants to 
mention them.


I would upgrade to Java 7 on my secondary index servers first.  My SolrJ 
build program is normally colocated on one of the primary index servers, 
though corosync/pacemaker can move it in the event of a machine failure.


Here's a more targeted question:

In simple terms, this means that for a testing period of several weeks, 
my SolrJ application will be running a different Java version than one 
of my indexes, and the same version as the other index.  The Java 7 
announcement on the Solr page says that there are unicode changes when 
upgrading to Java 7.  Will those changes cause problems with SolrJ on 
one java version and the index on another?


Thanks,
Shawn



Resilient clusters / Jira tickets

2012-03-07 Thread Ranjan Bagchi
Hi --

Hi, I totally appreciate the guidance you've been giving me.  And yes, my use 
case is having a sharded index where pieces can go in and out of service.

How do I file a jira ticket?  Happy to do it.

Thanks,

Ranjan 

-- 
Ranjan Bagchi
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)



Re: Solr 4.0 and production environments

2012-03-07 Thread Robert Muir
On Wed, Mar 7, 2012 at 11:47 AM, Dirceu Vieira  wrote:
> Hi All,
>
> Has anybody started using Solr 4.0 in production environments? Is it stable
> enough?
> I'm planning to create a proof of concept using solr 4.0, we have some
> projects that will gain a lot with features such as near real time search,
> joins and others, that are available only on version 4.
>
> Is it too risky to think of using it right now?
> What are your thoughts and experiences with that?
>

In general, we try to keep our 'trunk' (slated to be 4.0) in very
stable condition.

Really, it should be 'ready-to-release' at any time, of course 4.0 has
had many drastic changes: both at the Lucene and Solr level.

Before deciding what is stable, you should define stability: is it:
* api stability: will i be able to upgrade to a more recent snapshot
of 4.0 without drastic changes to my app?
* index format stability: will i be able to upgrade to a more recent
snapshot of 4.0 without re-indexing?
* correctness: is 4.0 dangerous in some way that it has many bugs
since much of the code is new?

I think you should limit your concerns to only the first 2 items, as
far as correctness, just look at the tests. For any open source
project, you can easily judge its quality by its tests: this is a
fact.

For lucene/solr the testing strategy, in my opinion, goes above and
beyond many other projects: for example random testing:
http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011_presentations#dawid_weiss

and the new solr cloud functionality also adds the similar chaosmonkey
concept on top of this already.

If you are worried about bugs, is a lucene/solr trunk snapshot less
reliable than even a released version of alternative software? It's an
interesting question: look at their tests.

-- 
lucidimagination.com


Re: XSLT Response Writer and content transformation

2012-03-07 Thread darul
I finally got it to work by upgrading the transformer to use Saxon. I will
give you details soon; it can be useful, and it is a nice feature for
producing a nice RSS feed.
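
(In short, a sketch until the full details arrive: put Saxon on the
classpath and select it via the standard JAXP system property, e.g.

java -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl -jar start.jar

but I will confirm the exact setup in the follow-up.)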



RE: How to stop processing of DataImportHandler in EventListener

2012-03-07 Thread Dyer, James
Wenca,

I have an app with requirements similar to yours.  We have maybe 40 caches that 
need to be built, then when they're done (and if they all succeed), the main 
indexing runs.  For this I wrote some quick-n-squirrelly code that executes a 
configurable # of cache-building handlers at a time.  When one finishes, 
another starts until they're all done.  When they all finish, the main indexing 
DIH starts.  I just run this in a separate JVM on the master solr node.  It 
keeps track of which ones are running and then polls the handlers w/ http every 
few seconds to see if they're done (scraping that 
"experimental/subject-to-change with typos" page to get the status). 

So this is similar to Mikhail's advice.  Possibly you can script this simply if 
you just have a 1 or a few caches that need to be built.  You might even be 
able to monitor your container's log output to know when the first one finishes 
and the next one starts, if you don't want to scrape the http output (I forget 
if DIHCacheWriter logs anything useful you could use).

My opinion is this is a real missing feature with DIH.  However, I would shy 
away from adding more stuff like this until we can clean up some of DIHs more 
fundamental shortcomings.  (DIH is great for many use cases, but the code has 
suffered neglect and needs a facelift in my opinion)

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Wednesday, March 07, 2012 3:24 AM
To: solr-user@lucene.apache.org
Subject: Re: How to stop processing of DataImportHandler in EventListener

Hello,

It seems you have some app which triggers these DIH requests. Can't you add
a precondition in that app? Before running the second DIH, check the status
of the first one, whether it is RUNNING or IDLE.
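
For example (handler path assumed to be /dataimport):

http://localhost:8983/solr/dataimport?command=status

The status element in the response reads "busy" while an import is running
and "idle" otherwise.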

Regards

2012/3/7 Wenca 

> Hi,
>
> I have 2 DataImportHandlers configured. The first one prepares data to
> berkeley backed cache (SOLR-2382, SOLR-2613) and the second one then
> indexes documents reading subentity data from the cache.
>
> I need some way to prevent the second handler from running if the first one is
> currently running, to prevent reading any inconsistent data. I haven't found
> any clear way to achieve this yet.
>
> I thought I could use an EventListener before the second handler that would
> check whether the cache dataimport is running and, if so, set some flag so
> that the processing does not continue.
>
> Or is there another way to block data import handler when another one is
> running?
>
> in solrconfig.xml I have:
>
> <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">db-data-config.xml</str>
>     ...
>   </lst>
> </requestHandler>
>
> <requestHandler name="/dataimport-cache" class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">cache-db-data-config.xml</str>
>     <str name="writerImpl">org.apache.solr.handler.dataimport.DIHCacheWriter</str>
>     <str name="cacheImpl">org.apache.solr.handler.dataimport.BerkleyBackedCache</str>
>     ...
>     <str name="cacheName">data_cache</str>
>     <str name="cachePk">id</str>
>   </lst>
> </requestHandler>
>
> Thank wenca
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


Re: Using multiple DirectSolrSpellcheckers for a query

2012-03-07 Thread Robert Muir
On Wed, Jan 25, 2012 at 12:55 PM, Nalini Kartha  wrote:
>
> Is there any reason why Solr doesn't support using multiple spellcheckers
> for a query? Is it because of performance overhead?
>

Thats not the case really, see https://issues.apache.org/jira/browse/SOLR-2926

I think the issue is that the spellchecker APIs need to be extended to
allow this to happen more easily; there is no real hard
performance/technical/algorithmic issue, it's just a matter of
refactoring the spellchecker APIs to allow this!

-- 
lucidimagination.com


RE: Using multiple DirectSolrSpellcheckers for a query

2012-03-07 Thread Dyer, James
Nalini,

You're at least the second person to mention a need to override "mm" in 
conjunction with "maxCollationTries".  I opened 
https://issues.apache.org/jira/browse/SOLR-3211 to see about getting this 
addressed.  (not sure if it will be done soon though).  The only workaround I 
can think of is to use the "spellcheck.q" parameter and insert "AND" between 
all your keywords.

I'm not sure I can think of an easy solution to your other problem.  The fact 
is, if a user enters "run", how do you know he meant "running" and not "sun" ?  
I mean, if either substitution results in hits, then the user could have meant 
either, right?  (This is a fake example though, because if "run" is in your 
dictionary, the spellchecker will not even try to correct it.  Maybe a better 
example is if the user entered "eun", which could correct to either "run" or 
"sun".).

If you really hate this behavior, maybe you could also solve this using 
"spellcheck.q".  What if you had something like this:

?q=eun jump
&mm=0
&defType=dismax
&qf=docStemmed
&spellcheck=true 
{lotsa spellcheck params here} 
&spellcheck.q=docUnstemmed:(eun AND jump)

...now it won't both correct and stem.  The corrections would need to match the 
raw keyword.  Is this closer to what you want?

One other note here...It looks like your "docUnstemmed" and "spellcheck" fields 
have pretty much the same or similar analysis.  You might not need both of 
them.  Possibly this would be a way to save some index-bloat?

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nalini Kartha [mailto:nalinikar...@gmail.com] 
Sent: Tuesday, March 06, 2012 6:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Using multiple DirectSolrSpellcheckers for a query

Hi James,

Thanks for the detailed reply and sorry for the delay getting back.

One issue for us with using the collate functionality is that some of our
query types are default OR (implemented using the mm param value). Since
the collate functionality reruns the query using all param values specified
in the original query, it'll effectively be issuing an OR query again,
right? Which means that again we could end up with corrections which aren't
the best for the current query?

Another issue we're running into is that we're using unstemmed fields as
the source for our spell correction field and so we could end up
unnecessarily correcting queries containing stemmed versions of words.

So, for example, if I have a document containing "running", my fields look
like this:

docUnstemmed: running
docStemmed: run, ...
spellcheck: running

If a user searches for "run OR jump", there are matching results (since we
search against both the stemmed and unstemmed fields) but the spellcheck
results will contain corrections for "run", let's say "sun". We don't want
to overcorrect queries which are returning valid results like this one. Any
suggestions for how to deal with this?

I was thinking that there might be value in having another dictionary which
is used for vetting words but not for finding corrections - the stemmed
fields could be used as a source for this dictionary. So before finding
corrections for a term if it doesn't exist in the primary dictionary, check
the secondary dictionary and make sure the term does not exist in it as
well. But then, this would require an extra copyfield (we could have
multiple unstemmed fields as a source for this secondary dictionary) and
bloat the index even more so I'm not sure if it's feasible.

Thanks,
Nalini

On Thu, Jan 26, 2012 at 10:23 AM, Dyer, James wrote:

> Nalini,
>
> Right now the best you can do is to use <copyField> to combine everything
> into a catch-all for spellchecking purposes.  While this seems wasteful,
> this often has to be done anyhow because typically you'll need
> less/different analysis for spellchecking than for searching.  But rather
> than having separate <copyField>s to create multiple dictionaries, put
> everything into one field to create a single "master" dictionary.
>
> From there, you need to set "spellcheck.collate" to true and also
> "spellcheck.maxCollationTries" greater than zero (5-10 usually works).  The
> first parameter tells it to generate re-written queries with spelling
> suggestions (collations).  The second parameter tells it to weed out any
> collations that won't generate hits if you re-query them.  This is
> important because having unrelated keywords in your master dictionary will
> increase the chances the spellchecker will pick the wrong words as
> corrections.
>
> There is a significant caveat to this:  The spellchecker typically only
> suggests for words in the dictionary.  So by creating a huge, master
> dictionary you might find that many misspelled words won't generate
> suggestions.  See this thread for some workarounds:
> http://lucene.472066.n3.nabble.com/Improving-Solr-Spell-Checker-Results-td3658411.html
>
> I think having multiple, per-field dictionaries as you suggest might be a
> good way to go.  While t

solr geospatial / spatial4j

2012-03-07 Thread Matt Mitchell
Hi,

I'm researching options for a better geospatial solution. I'm
currently using Solr 3.5 for a read-only "database", and the
point/radius searches work great. But I'd like to start doing point-in-
polygon searches as well. I've skimmed through some of the geospatial
jira issues, and read about spatial4j, which is very interesting. I
see on the github page that this will soon be part of Lucene; can
anyone confirm this?

I attempted to build the spatial4j demo but no luck. It had problems
finding lucene 4.0-SNAPSHOT, which I guess is because there are no
4.0-SNAPSHOT nightly builds? If anyone knows how I can get around
this, please let me know!

Other than spatial4j, is there a way to do point-in-polygon searches
with solr 3.5.0 right now? Is there some tricky indexing/querying
strategy that would allow this?

Thanks!

- Matt


Re: Need some quick help diagnosing query

2012-03-07 Thread Donald Organ
>
> > Would this also be affected if one of the fields that
> > contains that term is
> > defined as solr.StrField, whereas
> > most of the other fields are defined
> > as solr.TextField?
>
> It could be. String fields are not analyzed. For example, a single whitespace
> difference can prevent a match.  "Cards" and "cards" won't match either (lowercasing).
>

OK, looks like it's time to set up some copyFields.  I will point my query
at the solr.TextFields, try that, and see if it helps.
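
Something like this in schema.xml, I guess (names illustrative):

  <field name="brand_txt" type="text" indexed="true" stored="false"/>
  <copyField source="brand" dest="brand_txt"/>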


Re: Need some quick help diagnosing query

2012-03-07 Thread Ahmet Arslan


--- On Wed, 3/7/12, Donald Organ  wrote:

> From: Donald Organ 
> Subject: Re: Need some quick help diagnosing query
> To: solr-user@lucene.apache.org
> Date: Wednesday, March 7, 2012, 4:59 PM
> >
> > Simply put: does your collection contain a doc having all
> > three of these terms?
> > Try different mm values.
> >
> >
> > http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
> >
> 
> Would this also be affected if one of the fields that
> contains that term is
> defined as solr.StrField, whereas
> most of the other fields are defined
> as solr.TextField?

It could be. String fields are not analyzed. For example, a single whitespace 
difference can prevent a match.  "Cards" and "cards" won't match either (lowercasing).


Re: Need some quick help diagnosing query

2012-03-07 Thread Donald Organ
>
> Simply put: does your collection contain a doc having all three of these terms?
> Try different mm values.
>
>
> http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
>

Would this also be affected if one of the fields that contains that term is
defined as solr.StrField, whereas most of the other fields are defined
as solr.TextField?


Re: Need some quick help diagnosing query

2012-03-07 Thread Ahmet Arslan


--- On Wed, 3/7/12, Donald Organ  wrote:

> From: Donald Organ 
> Subject: Need some quick help diagnosing query
> To: "solr-user" 
> Date: Wednesday, March 7, 2012, 4:43 PM
> Right now i am doing the following:
> 
>     qf=name^1.75 codeTXT^1.75 cat_search^1.5
> description^0.8 brand^5.0
> cat_search^0.8
>     fl=code,score
>     defType=dismax
>     q=whitney brothers carts
> 
> 
>  if i change it to the following  then i get results:
> 
>     qf=name^1.75 codeTXT^1.75 cat_search^1.5
> description^0.8 brand^5.0
> cat_search^0.8
>     fl=code,score
>     defType=dismax
>     q=whitney brothers
> 
> 
> So why is the first query returning 0 results?

Simply put: does your collection contain a doc having all three of these terms? 
Try different mm values.

http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
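
For example:

mm=100%    (every term must match)
mm=2       (at least two terms must match)
mm=2<-25%  (all terms required for queries of up to 2 terms; above that,
            25% of the terms may be missing)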


Need some quick help diagnosing query

2012-03-07 Thread Donald Organ
Right now I am doing the following:

qf=name^1.75 codeTXT^1.75 cat_search^1.5 description^0.8 brand^5.0
cat_search^0.8
fl=code,score
defType=dismax
q=whitney brothers carts


 If I change it to the following, then I get results:

qf=name^1.75 codeTXT^1.75 cat_search^1.5 description^0.8 brand^5.0
cat_search^0.8
fl=code,score
defType=dismax
q=whitney brothers


So why is the first query returning 0 results?


Re: docBoost with "fq" search

2012-03-07 Thread Ahmet Arslan


--- On Wed, 3/7/12, Gian Marco Tagliani  wrote:

> From: Gian Marco Tagliani 
> Subject: docBoost with "fq" search
> To: solr-user@lucene.apache.org
> Date: Wednesday, March 7, 2012, 3:11 PM
> Hi All,
> I'm seeing strange behavior with my Solr (version 3.4).
> 
> For searching I'm using the "q" and the "fq" params.
> At index-time I'm adding a docBoost to each document.
> 
> When I perform a search with both "q" and "fq" params
> everything works.
> For the search with "q=*:*" and something in the "fq", it
> seems to me that the docBoost is not taken into
> consideration.
> 
> Is that possible?

Yes possible.

FilterQuery (fq) does not contribute to score. It is not used in score 
calculation. 

MatchAllDocsQuery (*:*) is a fast way to return all docs. Adding 
&fl=score&debugQuery=on will show that all docs will get constant score of 1.0.
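
For example (the fq field here is hypothetical):

    q=*:*&fq=category:books&fl=*,score&debugQuery=on

Every matching doc comes back with a score of 1.0, no matter what
index-time boosts were applied. If you want the boosts to matter, put a
real query (even one against a catch-all field) in q instead of *:*.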


Re: High disk space usage after replication

2012-03-07 Thread Erick Erickson
Well, I'd upgrade to a newer Solr ...

But seriously: there is an expected temporary spike during
replication. The size on disk can occasionally grow to up to double
while replication runs; that's just how replication is
designed to work...

But if the files are *staying*, then that is, indeed, odd. Do they remain
after you, say, bounce the slave? Under *nix operating systems, files
remain until all processes are done using them, so it's possible
that you're looking at them before, say, warmups are done.

If you bounce the slave and they disappear, then you might
be able to chalk it up to a one-time issue (hand waving here).
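
A quick way to check on *nix (a sketch; substitute the slave's actual
JVM pid):

    lsof -p <slave_pid> | grep -i deleted

That lists files that have been unlinked on disk but are still held
open by the process.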

Best
Erick

On Wed, Mar 7, 2012 at 6:20 AM, mechravi25  wrote:
> Hi,
>
> I'm using one master and slave in SOLR. When I try to replicate from master
> to slave, the data is getting replicated properly and the changes show
> up correctly in the Solr UI. But the index size on the slave is roughly
> double that of the master, e.g.:
> if the master's indexed data size is 270MB,
> the slave's indexed data size is 600MB.
>
> It seems like it's retaining the old data. We referred to the following
> links regarding this:
> http://wiki.apache.org/solr/CollectionDistribution#snapcleaner
> http://markmail.org/thread/yw5n4dk2t5zbt5z5#query:+page:1+mid:43cqxnjkecfnotiz+state:results
>
> Previously, it was working fine, but this kind of issue has only now
> started to appear. The version specifications that we are using are as follows:
>
> SOLR Master
>
> Solr Specification Version: 1.4.0.2010.01.13.08.09.44
> Solr Implementation Version: 1.5-dev exported - yonik - 2010-01-13 08:09:44
> Lucene Specification Version: 2.9.1-dev
> Lucene Implementation Version: 2.9.1-dev 888785 - 2009-12-09 18:03:31
>
> SOLR Slave
>
> Solr Specification Version: 1.4.0.2010.01.13.08.09.44
> Solr Implementation Version: 1.5-dev exported - yonik - 2010-01-13 08:09:44
> Lucene Specification Version: 2.9.1-dev
> Lucene Implementation Version: 2.9.1-dev 888785 - 2009-12-09 18:03:31
>
> Can someone please tell me where I'm going wrong and guide me on this?
>
> Thanks.
>
>


docBoost with "fq" search

2012-03-07 Thread Gian Marco Tagliani

Hi All,
I'm seeing strange behavior with my Solr (version 3.4).

For searching I'm using the "q" and the "fq" params.
At index-time I'm adding a docBoost to each document.

When I perform a search with both "q" and "fq" params everything works.
For the search with "q=*:*" and something in the "fq", it seems to me 
that the docBoost in not taken into consideration.


Is that possible?

Thanks


Re: solr out of memory

2012-03-07 Thread Erick Erickson
MaxPermSize probably isn't what you want, try -Xmx1G or similar.
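
For example (a sketch; size the heap to your index and hardware):

    java -Xmx1g -jar start.jar

MaxPermSize only controls the permanent generation (class metadata),
not the main heap that the index-related data structures live in.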

If that doesn't work, you need to post a lot more information about
your setup, it might help to review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

2012/3/7 C.Yunqin <345804...@qq.com>:
> Daniel,
> thanks very much:)
>
>
> However, I have allocated enough memory to Java as follows:
> java -XX:+AggressiveHeap -XX:MaxPermSize=1024m -jar start.jar
>
>
> Any other reason for that OutOfMemory error?
>
>
> PS: it is strange that, though my search like "id: chenm" caused "SEVERE:
> java.lang.OutOfMemoryError: Java heap space", Solr still works, and I
> can continue searching for other words.
>
>
>
> -- Original Message --
> From: "Daniel Brügge";
> Sent: Tuesday, March 6, 2012, 6:35 PM
> To: "solr-user";
>
> Subject: Re: solr out of memory
>
>
> Maybe the index is to big and you need to add more memory to the JVM via
> the -Xmx parameter. See also
> http://wiki.apache.org/solr/SolrPerformanceFactors#OutOfMemoryErrors
>
> Daniel
>
> On Tue, Mar 6, 2012 at 10:01 AM, C.Yunqin <345804...@qq.com> wrote:
>
>> Sometimes when I search a simple word, like "id: chenm",
>> Solr reports an error:
>> SEVERE: java.lang.OutOfMemoryError: Java heap space
>>
>>
>> I do not know why.
>> Sometimes the query goes well.
>> Anyone have an idea about that?
>>
>>
>> thanks a lot


Re: How to Index Custom XML structure

2012-03-07 Thread Erick Erickson
Well, I'm ManifoldCF ignorant, so I'll have to defer on this one

Best
Erick

On Tue, Mar 6, 2012 at 12:24 PM, Anupam Bhattacharya
 wrote:
> Thanks Erick, for the prompt response,
>
> Both suggestions would be useful for a one-time indexing activity. Since
> DIH would be a one-time process of indexing the repository, it is of no
> use in my case. Writing a standalone Java program utilizing SolrJ would
> again be a one-time indexing process.
>
> I want to write a separate handler which will be called by a ManifoldCF job
> to create indexes in Solr. In my case the repository is Documentum Content
> Server. I found a relevant link at this URL:
> https://community.emc.com/docs/DOC-6520 which is quite similar to my
> requirement.
>
> I modified the code to parse the XML and added that into the document
> properties. Although this works fine when I test it with my curl
> program with parameters, when the same handler is called from a ManifoldCF
> job, the job gets terminated within a few minutes. I'm not sure of the
> reason for that. The handler is written similarly to /update/extract,
> which is the ExtractingRequestHandler.
>
> Is ExtractingRequestHandler capable of extracting tag names and values, using
> some of its defined attributes like capture, captureAttr, extractOnly, etc.,
> which can then be added to the document indexes?
>
>
> On Tue, Feb 28, 2012 at 8:26 AM, Erick Erickson 
> wrote:
>
>> You might be able to do something with the XSL Transformer step in DIH.
>>
>> It might also be easier to just write a SolrJ program to parse the XML and
>> construct a SolrInputDocument to send to Solr. It's really pretty
>> straightforward.
>>
>> Best
>> Erick
>>
>> On Sun, Feb 26, 2012 at 11:31 PM, Anupam Bhattacharya
>>  wrote:
>> > Hi,
>> >
>> > I am using ManifoldCF to crawl data from the Documentum repository. I am
>> > able to successfully read the metadata/properties for the defined document
>> > types in Documentum using the out-of-the-box Documentum Connector in
>> > ManifoldCF.
>> > Unfortunately, there is one XML file also present which consists of a
>> > custom XML structure which I need to read and fetch the element values
>> > and add them for indexing in Lucene through Solr.
>> >
>> > Is there any mechanism to index any XML structure document in SOLR ?
>> >
>> > I checked the SOLR CELL framework, which supports the below structure:
>> >
>> > <add>
>> >   <doc>
>> >     <field name="id">9885A004</field>
>> >     <field name="name">Canon PowerShot SD500</field>
>> >     <field name="cat">camera</field>
>> >     <field name="features">3x optical zoom</field>
>> >     <field name="features">aluminum case</field>
>> >     <field name="weight">6.4</field>
>> >     <field name="price">329.95</field>
>> >   </doc>
>> >   <doc>
>> >     <field name="id">9885A003</field>
>> >     <field name="name">Canon PowerShot SD504</field>
>> >     <field name="cat">camera1</field>
>> >     <field name="features">3x optical zoom1</field>
>> >     <field name="features">aluminum case1</field>
>> >     <field name="weight">6.41</field>
>> >     <field name="price">329.956</field>
>> >   </doc>
>> > </add>
>> >
>> > And my custom XML structure is of the following format, from which I
>> > need to read the subject and abstract fields for indexing. I checked
>> > the TIKA project but couldn't find any useful stuff.
>> >
>> >
>> > 1
>> > This is an abstract.
>> > Text Subject
>> > ... (the rest of the sample's XML element tags were stripped by the
>> > mail archive; only these values survive)
>> >
>> > Appreciate any help on this.
>> >
>> > Regards
>> > Anupam
>>
>
>
>
> --
> Thanks & Regards
> Anupam Bhattacharya
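
For what it's worth, a minimal SolrJ 3.x sketch of the standalone
approach suggested above: parse the custom XML with plain DOM and build
a SolrInputDocument. The Solr URL, the file path, the "id" value, and
the assumption that the elements are literally named "abstract" and
"subject" are placeholders, not confirmed details:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.w3c.dom.Document;

    public class CustomXmlIndexer {
        public static void main(String[] args) throws Exception {
            // parse the custom XML file (path is a placeholder)
            Document xml = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(new File("custom.xml"));

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1"); // uniqueKey value; placeholder
            doc.addField("abstract", xml.getElementsByTagName("abstract")
                    .item(0).getTextContent());
            doc.addField("subject", xml.getElementsByTagName("subject")
                    .item(0).getTextContent());

            // CommonsHttpSolrServer is the SolrJ 3.x HTTP client
            SolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            server.add(doc);
            server.commit();
        }
    }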


Re: How to limit the number of open searchers?

2012-03-07 Thread Erick Erickson
Unless you have warming happening, there should
only be a single searcher open at any given time.
So it seems to me that maxWarmingSearchers
should give you what you need.

And you can pretty easily insure this by making your
poll interval (assuming master/slave) longer
than your warmup time.
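
In solrconfig.xml (the <query> section in the example configs) that is:

    <maxWarmingSearchers>1</maxWarmingSearchers>

which caps things at two searchers total (the registered one plus at
most one warming); a commit that would open a second warming searcher
fails instead of piling searchers up.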

Best
Erick

On Mon, Mar 5, 2012 at 2:18 PM, Michael Ryan  wrote:
> Is there a way to limit the number of searchers that can be open at a given 
> time?  I know there is a maxWarmingSearchers configuration that limits the 
> number of warming searchers, but that's not quite what I'm looking for...
>
> Ideally, when I commit, I want there to only be one searcher open before the 
> commit, so that during the commit and warming, there is a max of two 
> searchers open.  I'd be okay with delaying the commit until there is only one 
> searcher open.  Is there a way to programmatically determine how many 
> searchers are currently open?
>
> -Michael


Re: '500' : Internal server error

2012-03-07 Thread Gora Mohanty
On 7 March 2012 17:59, aditya jatnika martin  wrote:
> Dear Developer,
>
> I have a problem with Solr: every time I add a document, the result message
> is always "'500' Status : Internal server error",
[...]

Have you looked in the Solr logs for further details on the
exception?

Regards,
Gora


How to index doc file in solr?

2012-03-07 Thread Rohan Ashok Kumbhar
Hi,

I would like to know how to index documents other than XML in Solr.
Any comments would be appreciated!
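
I have seen mentions of Solr Cell (the ExtractingRequestHandler); is
something like the following the right approach? (A sketch, assuming
the /update/extract handler from the example solrconfig.xml is enabled;
the file name and id are placeholders.)

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
         -F "myfile=@report.doc"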


Thanks,
Rohan




Re: schema design help

2012-03-07 Thread Willem Basson
I would say do both: if you have the capacity, create a core for each
entity plus one that combines them, and run some tests.
There are pros and cons to both approaches. If you ever need joins in RDBMS
terms, then you probably want one index.
If not, then one index might still be a lot easier.
The only real reason to start splitting is if you hit resource constraints,
memory etc.,
and then multiple cores won't necessarily solve the problems either.
Build one big index and see how it performs.


On Wed, Mar 7, 2012 at 6:56 AM, Abhishek tiwari <
abhishek.tiwari@gmail.com> wrote:

> Thanks for replying.
>
> In our RDBMS schema we have Establishment/Event/Movie master relations.
> Establishment has title, description, ratings, tags, cuisines
> (multivalued), services (multivalued) and features (multivalued) fields;
> similarly, Event has title, description, category (multivalued) and
> venue (multivalued) fields; and Movie has name, start date, end date,
> genre, theater, rating and review fields.
>
> We have nearly 1M records for each entity; movies and events expire
> frequently, and we have to update them on expiry.
> We also keep stored data alongside the indexed data to reduce
> RDBMS queries.
>
> Please suggest how to proceed with the schema design: a single core, or
> a separate core for each entity?
>
>
> On Tue, Mar 6, 2012 at 7:40 PM, Gora Mohanty  wrote:
>
> > On 6 March 2012 18:01, Abhishek tiwari 
> > wrote:
> > > i am new in solr  want help in shema design .  i have multiple entities
> > > like Event , Establishments and Movies ..each have different types of
> > > relations.. should i make diffrent core for each entities ?
> >
> > It depends on your use case, i.e., what would your typical searches
> > be on. Normally, using a separate core for each entity would be
> > unusual, and instead one would flatten out typical RDBMS data for
> > Solr.
> >
> > Please describe what you want to achieve, and people might be
> > better able to help you.
> >
> > Regards,
> > Gora
> >
>



-- 
Willem Basson


Re: DIH Delta index takes much time

2012-03-07 Thread Ahmet Arslan
> I've indexed my 2 million documents with DIH on Solr. It uses a simple
> select without joins which fetches the distinct titles, plus ids,
> descriptions and urls. The first time I indexed this, it took about
> 1 hour. Every 1-2 days I get new entries which I want to index. I'm
> doing a delta index as described here:
> http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
> with the command: .dataimport?command=full-import&clean=false
> Now I've added 2 more documents to the database and run the command
> again; Solr again indexes for over an hour. The last time I indexed
> was two weeks ago, and in those two weeks nothing has changed.

By default, both full and delta imports issue an optimize at the end. What
happens if you disable it?

.dataimport?command=full-import&clean=false&optimize=false
.dataimport?command=delta-import&optimize=false



Index all possible facets values even if there is no document in relation

2012-03-07 Thread Xavier
Hi everyone,

My question is a little weird, but I need to have all my facet values in the
Solr index:

I have a database with all possible values of my facets for my Solr
documents.

Not all of these facet values are used by my documents, but I would like to
index them even if they return 0 documents.

I need this for SEO management, and because I want to test these facet
values (with 0 documents) without querying my database.


Best Regards,
Xavier



High disk space usage after replication

2012-03-07 Thread mechravi25
Hi,
 
I'm using one master and slave in Solr. When I try to replicate from master
to slave, the data is getting replicated properly and the changes show
up correctly in the Solr UI. But the index size on the slave is roughly
double that of the master, e.g.:
if the master's indexed data size is 270MB,
the slave's indexed data size is 600MB.
 
It seems like it's retaining the old data. We referred to the following
links regarding this:
http://wiki.apache.org/solr/CollectionDistribution#snapcleaner
http://markmail.org/thread/yw5n4dk2t5zbt5z5#query:+page:1+mid:43cqxnjkecfnotiz+state:results
 
Previously, it was working fine, but this kind of issue has only now started
to appear. The version specifications that we are using are as follows:
 
SOLR Master

Solr Specification Version: 1.4.0.2010.01.13.08.09.44 
Solr Implementation Version: 1.5-dev exported - yonik - 2010-01-13 08:09:44 
Lucene Specification Version: 2.9.1-dev 
Lucene Implementation Version: 2.9.1-dev 888785 - 2009-12-09 18:03:31 
 
SOLR Slave

Solr Specification Version: 1.4.0.2010.01.13.08.09.44 
Solr Implementation Version: 1.5-dev exported - yonik - 2010-01-13 08:09:44 
Lucene Specification Version: 2.9.1-dev 
Lucene Implementation Version: 2.9.1-dev 888785 - 2009-12-09 18:03:31 

Can someone please tell me where I'm going wrong and guide me on this?

Thanks.




DIH Delta index takes much time

2012-03-07 Thread Ramo Karahasan
Hi,

 

I've indexed my 2 million documents with DIH on Solr. It uses a simple
select without joins which fetches the distinct titles, plus ids,
descriptions and urls. The first time I indexed this, it took about 1
hour. Every 1-2 days I get new entries which I want to index. I'm doing a
delta index as described here:
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport   with
the command: .dataimport?command=full-import&clean=false. Now I've added 2
more documents to the database and run the command again; Solr again indexes
for over an hour. The last time I indexed was two weeks ago, and in those
two weeks nothing has changed.

 

Any ideas how I can speed that up?


Thanks,

Ramo



Re: How to stop processing of DataImportHandler in EventListener

2012-03-07 Thread Mikhail Khludnev
Hello,

It seems you have some app which triggers these DIH requests. Can't you add
a precondition in that app? Before running the second DIH, check the status
of the first one to see whether it is RUNNING or IDLE.
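
For example, assuming the cache-building handler is registered at
/dataimport-cache as in the config quoted below:

    http://localhost:8983/solr/dataimport-cache?command=status

The status element in the response reads "busy" while an import is
running and "idle" otherwise, so the app can poll it before kicking off
the second handler.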

Regards

2012/3/7 Wenca 

> Hi,
>
> I have 2 DataImportHandlers configured. The first one prepares data in a
> Berkeley-backed cache (SOLR-2382, SOLR-2613) and the second one then
> indexes documents, reading subentity data from the cache.
>
> I need some way to prevent the second handler from running if the first one
> is currently running, to avoid reading any inconsistent data. I haven't
> found any clear way to achieve this yet.
>
> I thought I could use an EventListener before the second handler that would
> check whether the cache dataimport is running and, if so, set some flag so
> that the processing does not continue.
>
> Or is there another way to block a data import handler while another one is
> running?
>
> in solrconfig.xml I have:
>
> <requestHandler name="/dataimport"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">db-data-config.xml</str>
>     ...
>   </lst>
> </requestHandler>
>
> <requestHandler name="/dataimport-cache"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">cache-db-data-config.xml</str>
>     <str name="writerImpl">org.apache.solr.handler.dataimport.DIHCacheWriter</str>
>     <str name="cacheImpl">org.apache.solr.handler.dataimport.BerkleyBackedCache</str>
>     ...
>     <str name="cacheName">data_cache</str>
>     <str name="cachePk">id</str>
>   </lst>
> </requestHandler>
>
> Thanks, wenca
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


 


How to stop processing of DataImportHandler in EventListener

2012-03-07 Thread Wenca

Hi,

I have 2 DataImportHandlers configured. The first one prepares data in a
Berkeley-backed cache (SOLR-2382, SOLR-2613) and the second one then
indexes documents, reading subentity data from the cache.


I need some way to prevent the second handler from running if the first one
is currently running, to avoid reading any inconsistent data. I haven't
found any clear way to achieve this yet.


I thought I could use an EventListener before the second handler that would
check whether the cache dataimport is running and, if so, set some flag so
that the processing does not continue.


Or is there another way to block a data import handler while another one is
running?


in solrconfig.xml I have:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
    ...
  </lst>
</requestHandler>

<requestHandler name="/dataimport-cache"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">cache-db-data-config.xml</str>
    <str name="writerImpl">org.apache.solr.handler.dataimport.DIHCacheWriter</str>
    <str name="cacheImpl">org.apache.solr.handler.dataimport.BerkleyBackedCache</str>
    ...
    <str name="cacheName">data_cache</str>
    <str name="cachePk">id</str>
  </lst>
</requestHandler>

Thanks, wenca