Re:solr out of memory

2012-03-06 Thread C.Yunqin
Daniel,
thanks very much:)


However, I have already allocated plenty of memory to the JVM, as follows:
java -XX:+AggressiveHeap -XX:MaxPermSize=1024m -jar start.jar


Is there any other possible reason for that OutOfMemoryError?


PS: It is strange that although my search like "id:chenm" caused "SEVERE:
java.lang.OutOfMemoryError: Java heap space", Solr still works, and I can
continue searching for other words.



-- Original Message --
From: "Daniel Brügge";
Sent: Tuesday, March 6, 2012, 6:35 PM
To: "solr-user";

Subject: Re: solr out of memory

 
Maybe the index is too big and you need to add more memory to the JVM via
the -Xmx parameter. See also
http://wiki.apache.org/solr/SolrPerformanceFactors#OutOfMemoryErrors
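
For example (a hypothetical sizing, the right value depends on your index and
hardware), you could set an explicit maximum heap instead of
-XX:+AggressiveHeap:

java -Xmx2g -XX:MaxPermSize=1024m -jar start.jar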

Daniel

On Tue, Mar 6, 2012 at 10:01 AM, C.Yunqin <345804...@qq.com> wrote:

> sometimes when i search a simple word, like "id:chenm",
> the solr reports an error:
> SEVERE: java.lang.OutOfMemoryError: Java heap space
>
>
> i do not know why?
> sometimes the query goes on well.
> anyone have an idea of that?
>
>
> thanks a lot

Re: schema design help

2012-03-06 Thread Abhishek tiwari
thanks for replying ..

In our RDBMS schema we have Establishment/Event/Movie master relations.
Establishment has fields like title, description, ratings, tags, cuisines
(multivalued), services (multivalued) and features (multivalued).
Similarly, Event has title, description, category (multivalued) and
venue (multivalued) fields, and Movie has name, start date, end date,
genre, theater, rating and review fields.

We have nearly 1M records for each entity; movies and events expire
frequently, and we have to update them on expiry.
We also keep stored data in addition to the indexed data, to reduce
RDBMS queries.

Please suggest how I should proceed with the schema design: a single core,
or a separate core for each entity?


On Tue, Mar 6, 2012 at 7:40 PM, Gora Mohanty  wrote:

> On 6 March 2012 18:01, Abhishek tiwari 
> wrote:
> > I am new to Solr and want help with schema design. I have multiple entities
> > like Event, Establishments and Movies; each has different types of
> > relations. Should I make a different core for each entity?
>
> It depends on your use case, i.e., what would your typical searches
> be on. Normally, using a separate core for each entity would be
> unusual, and instead one would flatten out typical RDBMS data for
> Solr.
>
> Please describe what you want to achieve, and people might be
> better able to help you.
>
> Regards,
> Gora
>
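
If you go the single-core route Gora describes, a rough SolrJ sketch of the
flattened design could look like this (the field names, id scheme, and "type"
discriminator are illustrative assumptions, not something prescribed in this
thread):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FlattenedIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument movie = new SolrInputDocument();
    movie.addField("id", "movie-42");                    // hypothetical id scheme
    movie.addField("type", "movie");                     // entity discriminator
    movie.addField("title", "Example Movie");
    movie.addField("genre", "drama");
    movie.addField("end_date", "2012-04-01T00:00:00Z");  // for expiry cleanup
    solr.add(movie);
    solr.commit();
    // Query one entity with fq=type:movie; expire documents with a
    // delete-by-query on end_date, e.g. end_date:[* TO NOW].
  }
}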


Re: A sorting question.

2012-03-06 Thread Chris Hostetter

: problem comes from MoreLikeThis behaviour. As you probably know, that Solr
: feature only suggests similar components by the first - and only - document
: returned from the original query. That is if you have a query that returns
: 5 documents (a query with five IDs with OR boolean clauses, like before)
: MoreLikeThis only returns similar documents for the first one.

that's how the MLT *Handler* works, but if you use the MLT *component* it 
will give you N docs similar to *each* doc in the response...

  http://wiki.apache.org/solr/MoreLikeThis

So using the 3.5 example configs/data...

http://localhost:8983/solr/select?q=memory&mlt=true&mlt.count=2&rows=5&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score

...that gives me the first 5 results matching my query, and for each of 
those 5 results, I get the top 2 results "like" each of the individual 
documents from my main result.

: etc. So now I have to merge the results but, hey! Imagine that you receive
: a sort by Date. You have to compose the final response with the merged
: similar documents and sort it by Date. Thats a problem, right? So I do the
: following:

So it sounds like you want a "more like these" type search ... if query Q 
matches some set of docs, you want to take the first N docs, and then 
generate a list of M docs similar to those N as a whole?

If I'm understanding correctly, then you might find it easier/better to 
tweak your algorithm so that you continue to use the MLT *handler* for 
each of your N main docs but instead of looking at the MLT response docs, 
you use mlt.interestingTerms=true and gather up all of the interesting 
terms, then issue one single query where you search for all of those 
interesting terms and see what final set of documents you get back.
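
A rough SolrJ sketch of that interestingTerms approach (the /mlt handler path,
field choices, seed ids, and sort field are assumptions based on the 3.5
example setup, not a prescribed recipe):

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.util.NamedList;

public class MoreLikeThese {
  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    String[] seedIds = {"SP2514N", "6H500F0"};   // hypothetical seed docs
    Set<String> terms = new LinkedHashSet<String>();
    for (String id : seedIds) {
      SolrQuery mlt = new SolrQuery("id:" + id);
      mlt.set("qt", "/mlt");                     // MoreLikeThisHandler
      mlt.set("mlt.fl", "manu,cat");
      mlt.set("mlt.interestingTerms", "list");
      mlt.set("mlt.match.include", "false");
      NamedList<Object> rsp = solr.query(mlt).getResponse();
      for (Object t : (List<Object>) rsp.get("interestingTerms")) {
        terms.add(String.valueOf(t));
      }
    }
    // One final query over all interesting terms; the normal sort param
    // then applies to the whole merged result set.
    StringBuilder q = new StringBuilder();
    for (String t : terms) {
      if (q.length() > 0) q.append(" OR ");
      q.append(t);
    }
    SolrQuery combined = new SolrQuery(q.toString());
    combined.setSortField("date", SolrQuery.ORDER.desc);  // hypothetical field
    System.out.println(solr.query(combined).getResults());
  }
}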

As for your original problem (assuming neither of those previous 
comments are helpful)

: The number of documents is not important. Imagine that you have a rows=20,
: so N=20 and you have and array of 20 similar components ordered correctly
: from most important to less important. Returning to the sorting problem, if
: you launch another and final query to Solr with q=(all the similar document
: IDs ordered) you can append the original sorting by Date, so the results
: can be sorted by Date, or by other field, or just without order... and
: that's the problem!

...the easiest way I can think of to deal with this is to ignore the sort 
completely.  You're asking for all the docs you want by id and setting 
rows big enough to get them all at once, so you know they will all be 
returned on page one, and you know they have a uniqueKey field (you are 
querying on it). So just make sure "id" is in your "fl" param; when you get 
them all back, look at the "id" field and order them the way you want in 
your client code.
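
A minimal sketch of that client-side reordering with SolrJ (assuming the
client already knows the id order it asked for):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class ReorderById {
  public static List<SolrDocument> reorder(List<String> desiredIds,
                                           SolrDocumentList results) {
    Map<String, SolrDocument> byId = new HashMap<String, SolrDocument>();
    for (SolrDocument doc : results) {
      byId.put(String.valueOf(doc.getFieldValue("id")), doc);
    }
    List<SolrDocument> ordered = new ArrayList<SolrDocument>();
    for (String id : desiredIds) {
      SolrDocument doc = byId.get(id);
      if (doc != null) ordered.add(doc);   // skip ids that matched nothing
    }
    return ordered;
  }
}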



-Hoss

Re: performance between ExternalFileField and Join

2012-03-06 Thread Chris Hostetter

: unique terms) but I agree with Erik on the ExternalFileField as you can use
: it just inside a function query, for example, for boosting.

with {!frange} it would be trivial to filter based on values in an 
ExternalFileField ... whether that would be *faster* than a custom
plugin that worked similarly to ExternalFileField but only provided boolean 
logic for set membership would require some testing.
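
For example, a filter keeping only documents whose external-file value equals
1 might look like this (the field name is hypothetical):

fq={!frange l=1 u=1}my_external_field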

: > > I was also intrigued by the Join feature in 4.0 trunk (SOLR-2272). In
: > this
: > > case, I would keep my access data in a separate core, and do cross-core
: > > join queries. The two cores would have about the same number of documents

watch out with this approach, cross-core joins have remained undocumented 
because they are fairly broken...

https://issues.apache.org/jira/browse/SOLR-2824



-Hoss


Re: Building a resilient cluster

2012-03-06 Thread Mark Miller

On Mar 5, 2012, at 11:49 PM, Ranjan Bagchi wrote:

> it didn't kick the second shard out of the cluster.
> 
> Any way to do this?

If you unload a core rather than just shut down the instance, that core will 
remove its info from ZooKeeper.

Currently, though, I think that won't make it forget about a logical shard (just 
the physical shard). I think you then have to manually edit the ZooKeeper 
layout and remove the logical shard from under the /collections node.
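
For reference, unloading a core through the CoreAdmin HTTP API looks something
like this (core name hypothetical):

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=mycore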

Feel free to file a JIRA issue around improving this if your use case requires 
it.

- Mark Miller
lucidimagination.com

Re: Building a resilient cluster

2012-03-06 Thread Mark Miller

On Mar 6, 2012, at 7:50 AM, Darren Govoni wrote:

> What I think was mentioned on this a bit ago is that the index stops
> working if one of the "nodes" goes down unless it's a replica.
> 
> You have 2 "nodes" running with numShards=2? Thus if one goes down the
> entire index is inoperable. In the future I'm hoping this changes such
> that the index cluster continues to operate but will lack results from
> the downed node. Maybe this has changed in recent trunk updates though.
> Not sure.

No, no support yet for partial results at this level. It's pretty easy to add, 
we just have not gotten to it.

Currently, at least one node must be serving each shard to get results. I don't 
think there is a JIRA for this yet (but there might be).


> 
> On Mon, 2012-03-05 at 20:49 -0800, Ranjan Bagchi wrote:
>> Hi Mark,
>> 
>> So I tried this: started up one instance w/ zookeeper, and started a second
>> instance defining a shard name in solr.xml -- it worked, searching would
>> search both indices, and looking at the zookeeper ui, I'd see the second
>> shard.  However, when I brought the second server down -- the first one
>> stopped working:  it didn't kick the second shard out of the cluster.
>> 
>> Any way to do this?
>> 
>> Thanks,
>> 
>> Ranjan
>> 
>> 
>>> From: Mark Miller 
>>> To: solr-user@lucene.apache.org
>>> Cc:
>>> Date: Wed, 29 Feb 2012 22:57:26 -0500
>>> Subject: Re: Building a resilient cluster
>>> Doh! Sorry - this was broken - I need to fix the doc or add it back.
>>> 
>>> The shard id is actually set in solr.xml since it's per core - the sys prop
>>> was a sugar option we had setup. So either add 'shard' to the core in
>>> solr.xml, or to make it work like it does in the doc, do:
>>> 
>>> <core ... shard="${shard:}" />
>>> 
>>> That sets shard to the 'shard' system property if it's set, or, as a default,
>>> acts as if it wasn't set.
>>> 
>>> I've been working with custom shard ids mainly through solrj, so I hadn't
>>> noticed this.
>>> 
>>> - Mark
>>> 
>>> On Wed, Feb 29, 2012 at 10:36 AM, Ranjan Bagchi wrote:
>>> 
 Hi,
 
 At this point I'm ok with one zk instance being a point of failure, I
>>> just
 want to create sharded solr instances, bring them into the cluster, and
>>> be
 able to shut them down without bringing down the whole cluster.
 
 According to the wiki page, I should be able to bring up new shard by
>>> using
 shardId [-D shardId], but when I did that, the logs showed it replicating
 an existing shard.
 
 Ranjan
 Andre Bois-Crettez wrote:
 
> You have to run ZK on a at least 3 different machines for fault
> tolerance (a ZK ensemble).
> 
> 
 
>>> http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
> 
> Ranjan Bagchi wrote:
>> Hi,
>> 
>> I'm interested in setting up a solr cluster where each machine [at
 least
>> initially] hosts a separate shard of a big index [too big to sit on
>>> the
>> machine].  I'm able to put a cloud together by telling it that I have
 (to
>> start out with) 4 nodes, and then starting up nodes on 3 machines
>> pointing
>> at the zkInstance.  I'm able to load my sharded data onto each
>>> machine
>> individually and it seems to work.
>> 
>> My concern is that it's not fault tolerant:  if one of the
 non-zookeeper
>> machines falls over, the whole cluster won't work.  Also, I can't
 create
>> a
>> shard with more data, and have it work within the existing cloud.
>> 
>> I tried using -DshardId=shard5 [on an existing 4-shard cluster],
>>> but
 it
>> just started replicating, which doesn't seem right.
>> 
>> Are there ways around this?
>> 
>> Thanks,
>> Ranjan Bagchi
>> 
>> 
 
>>> 
>>> 
>>> 
>>> --
>>> - Mark
>>> 
>>> http://www.lucidimagination.com
>>> 
>>> 
> 
> 

- Mark Miller
lucidimagination.com

Re: about solrj giving facet=true instead of facet=on

2012-03-06 Thread Yuhan Zhang
Never mind, they are identical. The problem was that I was missing rows=0.

The query string that produces identical field counts is:
solr/select/?q=*%3A*&rows=0&facet=true&facet.field=domain
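
For reference, the equivalent SolrJ calls would look something like this
(SolrJ always serializes facet=true, which the server treats the same as
facet=on):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetCounts {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0);              // counts only, no documents
    q.setFacet(true);          // sent on the wire as facet=true
    q.addFacetField("domain");
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getFacetField("domain").getValues());
  }
}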

On Tue, Mar 6, 2012 at 4:26 PM, Yuhan Zhang  wrote:

> hi all,
>
> I'm using the solrj Java client to query solr server with facet. However,
> it ends up that it is giving the wrong query string,
> as I'm expecting:
> /solr/select/?q=*%3A*&facet=on&facet.field=domain
> it gave
> /solr/select/?q=*%3A*&facet=true&facet.field=domain
>
> it set facet=true instead of facet=on.
>
> I'm using solr 3.1 on the server, and also solrj 3.1 on the client.
>
> Is there some setting to change the query string, either from
> the client or on the server? or do I have to checkout the solrj code
> and recompile it?
>
> Thank you.
>
> Yuhan
>



-- 
Yuhan Zhang
Application Developer
OneScreen Inc.
yzh...@onescreen.com 
www.onescreen.com

The information contained in this e-mail is for the exclusive use of the
intended recipient(s) and may be confidential, proprietary, and/or legally
privileged. Inadvertent disclosure of this message does not constitute a
waiver of any privilege.  If you receive this message in error, please do
not directly or indirectly print, copy, retransmit, disseminate, or
otherwise use the information. In addition, please delete this e-mail and
all copies and notify the sender.


Re: Using multiple DirectSolrSpellcheckers for a query

2012-03-06 Thread Nalini Kartha
Hi James,

Thanks for the detailed reply and sorry for the delay getting back.

One issue for us with using the collate functionality is that some of our
query types  are default OR (implemented using the mm param value). Since
the collate functionality reruns the query using all param values specified
in the original query, it'll effectively be issuing an OR query again
right? Which means that again we could end up with corrections which aren't
the best for the current query?

Another issue we're running into is that we're using unstemmed fields as
the source for our spell correction field and so we could end up
unnecessarily correcting queries containing stemmed versions of words.

So for eg. if I have a document containing "running" my fields look like
this -

docUnstemmed: running
docStemmed: run, ...
spellcheck: running

If a user searches for "run OR jump", there are matching results (since we
search against both the stemmed and unstemmed fields) but the spellcheck
results will contain corrections for "run", let's say "sun". We don't want
to overcorrect queries which are returning valid results like this one. Any
suggestions for how to deal with this?

I was thinking that there might be value in having another dictionary which
is used for vetting words but not for finding corrections - the stemmed
fields could be used as a source for this dictionary. So before finding
corrections for a term if it doesn't exist in the primary dictionary, check
the secondary dictionary and make sure the term does not exist in it as
well. But then, this would require an extra copyfield (we could have
multiple unstemmed fields as a source for this secondary dictionary) and
bloat the index even more so I'm not sure if it's feasible.

Thanks,
Nalini

On Thu, Jan 26, 2012 at 10:23 AM, Dyer, James wrote:

> Nalini,
>
> Right now the best you can do is to use <copyField> to combine everything
> into a catch-all for spellchecking purposes.  While this seems wasteful,
> this often has to be done anyhow because typically you'll need
> less/different analysis for spellchecking than for searching.  But rather
> than having separate <copyField>s to create multiple dictionaries, put
> everything into one field to create a single "master" dictionary.
>
> From there, you need to set "spellcheck.collate" to true and also
> "spellcheck.maxCollationTries" greater than zero (5-10 usually works).  The
> first parameter tells it to generate re-written queries with spelling
> suggestions (collations).  The second parameter tells it to weed out any
> collations that won't generate hits if you re-query them.  This is
> important because having unrelated keywords in your master dictionary will
> increase the chances the spellchecker will pick the wrong words as
> corrections.
>
> There is a significant caveat to this:  The spellchecker typically only
> suggests for words in the dictionary.  So by creating a huge, master
> dictionary you might find that many misspelled words won't generate
> suggestions.  See this thread for some workarounds:
> http://lucene.472066.n3.nabble.com/Improving-Solr-Spell-Checker-Results-td3658411.html
>
> I think having multiple, per-field dictionaries as you suggest might be a
> good way to go.  While this is not supported, I don't think its because of
> performance concerns.  (There would be an overhead cost to this but I think
> it would still be practical).  It just hasn't been implemented yet.  But we
> might be getting to a possible start to this type of functionality.  In
> https://issues.apache.org/jira/browse/SOLR-2585 a separate spellchecker
> is added that just corrects wordbreak (or is it "word break"?) problems,
> then a "ConjunctionSolrSpellChecker" combines the results from the main
> spellchecker and the wordbreak spellchecker.  I could see a next step beyond
> this being to support per-field dictionaries, checking them separately,
> then combining the results.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -Original Message-
> From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> Sent: Wednesday, January 25, 2012 11:56 AM
> To: solr-user@lucene.apache.org
> Subject: Using multiple DirectSolrSpellcheckers for a query
>
> Hi,
>
> We are trying to use the DirectSolrSpellChecker to get corrections for
> mis-spelled query terms directly from fields in the Solr index.
>
> However, we need to use multiple fields for spellchecking a query. It looks
> looks like you can only use one spellchecker for a request and so the
> workaround for this it to create a copy field from the fields required for
> spell correction?
>
> We'd like to avoid this because we allow users to perform different kinds
> of queries on different sets of fields and so to provide meaningful
> corrections we'd have to create multiple copy fields - one for each query
> type.
>
> Is there any reason why Solr doesn't support using multiple spellcheckers
> for a query? Is it because of performance overhead?
>
> Thanks,
> Nalin
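
For illustration, a request using the collate settings James describes might
look like this (the query and parameter values are only an example):

/select?q=runing+shoes&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=5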

about solrj giving facet=true instead of facet=on

2012-03-06 Thread Yuhan Zhang
hi all,

I'm using the solrj Java client to query solr server with facet. However,
it ends up that it is giving the wrong query string,
as I'm expecting:
/solr/select/?q=*%3A*&facet=on&facet.field=domain
it gave
/solr/select/?q=*%3A*&facet=true&facet.field=domain

it set facet=true instead of facet=on.

I'm using solr 3.1 on the server, and also solrj 3.1 on the client.

Is there some setting to change the query string, either from
the client or on the server? or do I have to checkout the solrj code
and recompile it?

Thank you.

Yuhan


Re: Need tokenization that finds part of stringvalue

2012-03-06 Thread PeterKerk
edismax did the trick! Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-tokenization-that-finds-part-of-stringvalue-tp3785366p3805045.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need tokenization that finds part of stringvalue

2012-03-06 Thread Ahmet Arslan
> @iorixxx: Sorry it took so long, had
> some difficulties upgrading to 3.5.0
> 
> It still doesn't work. Here's what I have now:
> 
> I copied text_general_rev from
> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/schema.xml
> to my schema.xml:
>    <fieldType name="text_general_rev" class="solr.TextField"
> positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
>           maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
> 
> To be complete: this is the definition of the title fieldtype:
>    <fieldType name="text_ws" class="solr.TextField"
> positionIncrementGap="100">
>      <analyzer>
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>      </analyzer>
>    </fieldType>
>
> <field name="title" type="text_ws" indexed="true" stored="true"/>
> <field name="title_search" type="text_general_rev" indexed="true"
> stored="true"/>
> <copyField source="title" dest="title_search"/>
> 
> 
> The title field value is "Smartphone".
> 
> With this search query I don't get any results:
> http://localhost:8983/solr/zz/select/?indent=on&facet=true&q=*smart*&defType=dismax&qf=title_search^20.0&start=0&rows=30&fl=id,title&facet.mincount=1
> 
> What more can I do?
> Thanks!

Dismax query parser does not support wildcard queries. defType=edismax would 
work. Also defType=lucene&df=title_search&q=*smart* should work too.




Re: Need tokenization that finds part of stringvalue

2012-03-06 Thread PeterKerk
@iorixxx: Sorry it took so long, had some difficulties upgrading to 3.5.0

It still doesn't work. Here's what I have now:

I copied text_general_rev from
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/schema.xml
to my schema.xml:
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

To be complete: this is the definition of the title fieldtype:

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="text_ws" indexed="true" stored="true"/>
<field name="title_search" type="text_general_rev" indexed="true" stored="true"/>
<copyField source="title" dest="title_search"/>

The title field value is "Smartphone".

With this search query I don't get any results:
http://localhost:8983/solr/zz/select/?indent=on&facet=true&q=*smart*&defType=dismax&qf=title_search^20.0&start=0&rows=30&fl=id,title&facet.mincount=1

What more can I do?
Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-tokenization-that-finds-part-of-stringvalue-tp3785366p3804979.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: XSLT Response Writer and content transformation

2012-03-06 Thread darul
also tried :

  

  

to get my description content processed, but no success until now ;)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/XSLT-Response-Writer-and-content-transformation-tp3800251p3804528.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: XSLT Response Writer and content transformation

2012-03-06 Thread darul
Well, by default the Solr distribution is using Xalan?

- I have created my custom class mypackage.XsltCustomFunctions, with my
custom method processContent(), and put the jar in the jetty/lib root directory.
- updated my rss.xsl file by adding
xmlns:ev="xalan://mypackage.XsltCustomFunctions" in the header
- tried this syntax: <xsl:value-of select="ev:processContent(description)"/>

I get this nice exception: java.lang.RuntimeException: getTransformer fails in
getContentType

I don't have a Solr dev environment to see where it comes from; have you any
idea?

I do not think it is a classpath problem but a syntax one.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/XSLT-Response-Writer-and-content-transformation-tp3800251p3804473.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to Index Custom XML structure

2012-03-06 Thread Anupam Bhattacharya
Thanks, Erick, for the prompt response.

Both suggestions would be useful for a one-time indexing activity. Since
DIH would be a one-time process of indexing the repository, it is of no
use in my case. Writing a standalone Java program using SolrJ would again
be a one-time indexing process.

I want to write a separate handler which will be called by the ManifoldCF job
to create indexes in Solr. In my case the repository is Documentum Content
Server. I found a relevant link at this URL,
https://community.emc.com/docs/DOC-6520, which is quite similar to my
requirement.

I modified the code to parse the XML and add the values to the document
properties. Although this works fine when I test it with my curl program
with parameters, when the same handler is called from the ManifoldCF job,
the job gets terminated within a few minutes. I am not sure of the reason for
that. The handler is written similar to /update/extract, which is
ExtractingRequestHandler.

Is ExtractingRequestHandler capable of extracting tag names and values using
some of its defined attributes like capture, captureAttr, extractOnly, etc.,
which can then be added to the document index?


On Tue, Feb 28, 2012 at 8:26 AM, Erick Erickson wrote:

> You might be able to do something with the XSL Transformer step in DIH.
>
> It might also be easier to just write a SolrJ program to parse the XML and
> construct a SolrInputDocument to send to Solr. It's really pretty
> straightforward.
>
> Best
> Erick
>
> On Sun, Feb 26, 2012 at 11:31 PM, Anupam Bhattacharya wrote:
> > Hi,
> >
> > I am using ManifoldCF to Crawl data from Documentum repository. I am able
> > to successfully read the metadata/properties for the defined document
> types
> > in Documentum using the out-of-the box Documentum Connector in
> ManifoldCF.
> > Unfortunately, there is one XML file also present which consists of a
> > custom XML structure which I need to read and fetch the element values
> and
> > add it for indexing in lucene through SOLR.
> >
> > Is there any mechanism to index any XML structure document in SOLR ?
> >
> > I checked the SOLR CELL framework, which supports the structure below:
> >
> > <add>
> >   <doc>
> >     <field name="id">9885A004</field>
> >     <field name="name">Canon PowerShot SD500</field>
> >     <field name="cat">camera</field>
> >     <field name="features">3x optical zoom</field>
> >     <field name="features">aluminum case</field>
> >     <field name="weight">6.4</field>
> >     <field name="price">329.95</field>
> >   </doc>
> >   <doc>
> >     <field name="id">9885A003</field>
> >     <field name="name">Canon PowerShot SD504</field>
> >     <field name="cat">camera1</field>
> >     <field name="features">3x optical zoom1</field>
> >     <field name="features">aluminum case1</field>
> >     <field name="weight">6.41</field>
> >     <field name="price">329.956</field>
> >   </doc>
> > </add>
> >
> > & my Custom XML structure is of the following format, from which I need
> > to read the *subject* & *abstract* fields for indexing. I checked the
> > TIKA project but I couldn't find any useful stuff.
> >
> > <document>
> >   <id>1</id>
> >   <abstract>This is an abstract.</abstract>
> >   <subject>Text Subject</subject>
> > </document>
> >
> > Appreciate any help on this.
> >
> > Regards
> > Anupam
>
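
For illustration, a rough sketch of the SolrJ approach Erick suggests above
(the tag names "id", "subject", and "abstract", and the Solr URL, are
assumptions based on the XML described in this thread):

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexCustomXml {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    DocumentBuilder db =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
    org.w3c.dom.Document xml = db.parse(new File(args[0]));
    SolrInputDocument doc = new SolrInputDocument();
    // Pull the element values out of the custom XML and map them to fields.
    doc.addField("id",
        xml.getElementsByTagName("id").item(0).getTextContent());
    doc.addField("subject",
        xml.getElementsByTagName("subject").item(0).getTextContent());
    doc.addField("abstract",
        xml.getElementsByTagName("abstract").item(0).getTextContent());
    solr.add(doc);
    solr.commit();
  }
}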



-- 
Thanks & Regards
Anupam Bhattacharya


Re: Highlighting Multivalued Field question

2012-03-06 Thread Jamie Johnson
So, my mistake on this: I was not setting hl.snippets, so the default
value of 1 was being used.  If I change it to 2, I get the expected
result.
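
That is, based on the query in the quoted message below, something like:

/select?hl.fl=clothing&hl.snippets=2&rows=5&q=clothing:black clothing:shirt&hl=on&indent=true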

On Tue, Mar 6, 2012 at 9:10 AM, Jamie Johnson  wrote:
> as an FYI I tried this with the standard highlighter and got the same
> result.  Additionally if it matters this is using the following text
> field definition
>
>    <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory"
>         synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
> On Mon, Mar 5, 2012 at 5:42 PM, Jamie Johnson  wrote:
>> If I have a multivalued field with values as follows
>>
>> <field name="clothing">black pants</field>
>> <field name="clothing">white shirt</field>
>>
>> and I do a query against that field with highlighting enabled as follows
>>
>> /select?hl.fl=clothing&rows=5&q=clothing:black 
>> clothing:shirt&hl=on&indent=true
>>
>> I thought I would see the following in the highlights
>>
>> <em>black</em> pants
>> white <em>shirt</em>
>>
>> but instead I'm seeing the following
>>
>> <em>black</em> pants
>>
>> is this expected?
>>
>> Also I'm using a custom highlighter which extends SolrHighlighter but
>> 99.9% of it is a straight copy of DefaultSolrHighlighter with support
>> from pulling unstored fields from an external data base, so I expect
>> that this works the same way as DefaultSolrHighlighter, but if this is
>> not the expected case I will try with DefaultSolrHighlighter.


Re: schema design help

2012-03-06 Thread Gora Mohanty
On 6 March 2012 18:01, Abhishek tiwari  wrote:
> I am new to Solr and want help with schema design. I have multiple entities
> like Event, Establishments and Movies; each has different types of
> relations. Should I make a different core for each entity?

It depends on your use case, i.e., what would your typical searches
be on. Normally, using a separate core for each entity would be
unusual, and instead one would flatten out typical RDBMS data for
Solr.

Please describe what you want to achieve, and people might be
better able to help you.

Regards,
Gora


Re: Highlighting Multivalued Field question

2012-03-06 Thread Jamie Johnson
as an FYI I tried this with the standard highlighter and got the same
result.  Additionally if it matters this is using the following text
field definition

<fieldType name="text" class="solr.TextField"
  positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>


On Mon, Mar 5, 2012 at 5:42 PM, Jamie Johnson  wrote:
> If I have a multivalued field with values as follows
>
> <field name="clothing">black pants</field>
> <field name="clothing">white shirt</field>
>
> and I do a query against that field with highlighting enabled as follows
>
> /select?hl.fl=clothing&rows=5&q=clothing:black 
> clothing:shirt&hl=on&indent=true
>
> I thought I would see the following in the highlights
>
> <em>black</em> pants
> white <em>shirt</em>
>
> but instead I'm seeing the following
>
> <em>black</em> pants
>
> is this expected?
>
> Also I'm using a custom highlighter which extends SolrHighlighter but
> 99.9% of it is a straight copy of DefaultSolrHighlighter with support
> from pulling unstored fields from an external data base, so I expect
> that this works the same way as DefaultSolrHighlighter, but if this is
> not the expected case I will try with DefaultSolrHighlighter.


Re: need input - lessons learned or best practices for data imports

2012-03-06 Thread Erick Erickson
At this level of control consider using SolrJ instead, especially for
alerts and such. See:

http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
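
For illustration, the minimal shape of a SolrJ load wrapped for monitoring
(the notifyOps() alert hook is hypothetical; wire it to whatever notification
system you use):

import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MonitoredImport {
  // Placeholder for your alerting (email, Nagios, etc.)
  static void notifyOps(String msg) { System.err.println("ALERT: " + msg); }

  static void load(SolrServer solr, List<SolrInputDocument> batch) {
    try {
      solr.add(batch);
      solr.commit();
    } catch (Exception e) {
      notifyOps("Solr import failed: " + e.getMessage());
    }
  }
}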

Best
Erick

On Mon, Mar 5, 2012 at 12:29 PM, geeky2  wrote:
> hello all,
>
> we are approaching the time when we will move our first solr core into a
> more "production like" environment.  as a precursor to this, i am attempting
> to write some documents on impact assessment and batch load / data import
> strategies.
>
> does anyone have processes or lessons learned - that they can share?
>
> maybe a good place to start - but not limited to - would be how do people
> monitor data imports (we are using a very simple DIH hooked to an informix
> schema) and send out appropriate notifications?
>
> thank you for any help or suggestions,
> mark
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/need-input-lessons-learned-or-best-practices-for-data-imports-tp3801327p3801327.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: ngram synonyms & dismax together

2012-03-06 Thread Erick Erickson
Where are synonyms in your ngrammed-field filter chain? It would help
if you showed the two field definitions.

Try using the admin/analysis page to see each step in the transformation,
be sure to check the verbose checkbox.

And examples of the input, and results (&debugQuery=on) would also be
useful.

Best
Erick

On Mon, Mar 5, 2012 at 12:19 PM, Husain, Yavar  wrote:
>
>
> I have ngram-indexed 2 fields (columns in the database) and the third one is
> my full-text field. My default text field is the full-text field, and
> while querying I use the dismax handler, specifying both the ngrammed fields
> with certain boost values and the full-text field with a certain boost value.
>
> The problem: if I don't use dismax and just search the full-text field (i.e.
> the default field specified in the schema), synonyms work correctly, i.e. "ca"
> returns all results where "california" occurs. Whereas if I use dismax, "ca" is
> also searched in the ngrammed fields and returns partial matches of the word
> "ca", and never goes into the synonym part at all.
>
> I want to use synonyms in every case, so how should I go about it?
> **
> This message may contain confidential or proprietary information intended 
> only for the use of the
> addressee(s) named above or may contain information that is legally 
> privileged. If you are
> not the intended addressee, or the person responsible for delivering it to 
> the intended addressee,
> you are hereby notified that reading, disseminating, distributing or copying 
> this message is strictly
> prohibited. If you have received this message by mistake, please immediately 
> notify us by
> replying to the message and delete the original message and any copies 
> immediately thereafter.
>
> Thank you.-
> **
> FAFLD
>


Re: JoinQuery and document score problem

2012-03-06 Thread Stefan Moises

Hi Martijn,

thanks for your answer - unfortunately we need the JoinQuery for the 
main query, not only for the filters...
Looks like we have to get the scoring for "query-time joins" working on 
our own then, maybe inspired by the "BlockJoin" Query (index-time join) 
functionality (which we cannot use because we can't submit the docs in 
"blocks" with the CSV or DIH importers) - where the scoring seems to 
work exactly as we'd need it.


Cheers,
Stefan

On 06.03.2012 10:30, Martijn v Groningen wrote:

Hi Stefan,

The score isn't "moved" from the "from" side to the "to" side and as far as
I know there isn't a way to configure the scoring of the joined documents.
The Solr join query isn't a real join (like in SQL) and should be used as a
filtering mechanism. The best way to achieve that is to put the join
query inside an fq parameter.

Martijn

On 5 March 2012 14:01, Stefan Moises  wrote:


Hi list,

we are using the kinda new JoinQuery feature in Solr 4.x Trunk and are
facing a problem (and also Solr 3.5. with the JoinQuery patch applied) ...
We have documents with a parent - child relationship where a parent can
have any number of children, parents being identified by the field "parentid".

Now after a join (from the field "parentid" to "id") to get the parent
documents only (and to filter out the "variants"/childs of the parent
documents), the document score gets "lost" - all the returned documents
have a score of "1.0" - if we remove the join from the query, the scores
are fine again. Here is an example call:

http://localhost:8983/solr4/select?qt=dismax&q={!join%20from=parentid%20to=id}foo&fl=id,title,score

All the results now have a score of "1.0", which makes the order of
results pretty much random and the scoring therefore useless... :(
(the same applies for the "standard" query type, so it's not the dismax
parser)

I can't imagine this is "expected" behaviour...? Is there an easy way to
get the "right" scores for the joined documents (e.g. using the max. score
of the childs)? Can the scoring of "joined" documents be configured
somewhere / somehow?

Thanks a lot in advance,
best regards,
Stefan

--
With best regards from Nürnberg,
Stefan Moises

*
Stefan Moises
Senior Softwareentwickler
Leiter Modulentwicklung

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
*







--
With best regards from Nürnberg,
Stefan Moises

***
Stefan Moises
Senior Softwareentwickler
Leiter Modulentwicklung

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***




Re: How to limit the number of open searchers?

2012-03-06 Thread Li Li
What do you mean by "programmatically"? Modify Solr's code? Because Solr is
not like Lucene: it only provides HTTP interfaces for its users rather than a
Java API.

If you want to modify Solr, you can find this code in SolrCore:

private final LinkedList<RefCounted<SolrIndexSearcher>> _searchers =
    new LinkedList<RefCounted<SolrIndexSearcher>>();

and _searcher is the current searcher.
Be careful to use searcherLock to synchronize your code.
Maybe you can write your code like:

synchronized (searcherLock) {
    if (_searchers.size() == 1) {
        // only the current searcher is open, so it is safe to proceed
        ...
    }
}




On Tue, Mar 6, 2012 at 3:18 AM, Michael Ryan  wrote:

> Is there a way to limit the number of searchers that can be open at a
> given time?  I know there is a maxWarmingSearchers configuration that
> limits the number of warming searchers, but that's not quite what I'm
> looking for...
>
> Ideally, when I commit, I want there to only be one searcher open before
> the commit, so that during the commit and warming, there is a max of two
> searchers open.  I'd be okay with delaying the commit until there is only
> one searcher open.  Is there a way to programmatically determine how many
> searchers are currently open?
>
> -Michael
>


Re: Building a resilient cluster

2012-03-06 Thread Darren Govoni
What I think was mentioned on this a bit ago is that the index stops
working if one of the "nodes" goes down unless it's a replica.

You have 2 "nodes" running with numShards=2? Thus if one goes down the
entire index is inoperable. In the future I'm hoping this changes such
that the index cluster continues to operate but will lack results from
the downed node. Maybe this has changed in recent trunk updates though.
Not sure.

On Mon, 2012-03-05 at 20:49 -0800, Ranjan Bagchi wrote:
> Hi Mark,
> 
> So I tried this: started up one instance w/ zookeeper, and started a second
> instance defining a shard name in solr.xml -- it worked, searching would
> search both indices, and looking at the zookeeper ui, I'd see the second
> shard.  However, when I brought the second server down -- the first one
> stopped working:  it didn't kick the second shard out of the cluster.
> 
> Any way to do this?
> 
> Thanks,
> 
> Ranjan
> 
> 
> > From: Mark Miller 
> > To: solr-user@lucene.apache.org
> > Cc:
> > Date: Wed, 29 Feb 2012 22:57:26 -0500
> > Subject: Re: Building a resilient cluster
> > Doh! Sorry - this was broken - I need to fix the doc or add it back.
> >
> > The shard id is actually set in solr.xml since it's per core - the sys prop
> > was a sugar option we had setup. So either add 'shard' to the core in
> > solr.xml, or to make it work like it does in the doc, do:
> >
> > <core ... shard="${shard:}" />
> >
> > That sets shard to the 'shard' system property if it's set, or, as a default,
> > acts as if it wasn't set.
> >
> > I've been working with custom shard ids mainly through solrj, so I hadn't
> > noticed this.
> >
> > - Mark
> >
> > On Wed, Feb 29, 2012 at 10:36 AM, Ranjan Bagchi wrote:
> >
> > > Hi,
> > >
> > > At this point I'm ok with one zk instance being a point of failure, I
> > just
> > > want to create sharded solr instances, bring them into the cluster, and
> > be
> > > able to shut them down without bringing down the whole cluster.
> > >
> > > According to the wiki page, I should be able to bring up new shard by
> > using
> > > shardId [-D shardId], but when I did that, the logs showed it replicating
> > > an existing shard.
> > >
> > > Ranjan
> > > Andre Bois-Crettez wrote:
> > >
> > > > You have to run ZK on a at least 3 different machines for fault
> > > > tolerance (a ZK ensemble).
> > > >
> > > >
> > >
> > http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble
> > > >
> > > > Ranjan Bagchi wrote:
> > > > > Hi,
> > > > >
> > > > > I'm interested in setting up a solr cluster where each machine [at
> > > least
> > > > > initially] hosts a separate shard of a big index [too big to sit on
> > the
> > > > > machine].  I'm able to put a cloud together by telling it that I have
> > > (to
> > > > > start out with) 4 nodes, and then starting up nodes on 3 machines
> > > > > pointing
> > > > > at the zkInstance.  I'm able to load my sharded data onto each
> > machine
> > > > > individually and it seems to work.
> > > > >
> > > > > My concern is that it's not fault tolerant:  if one of the
> > > non-zookeeper
> > > > > machines falls over, the whole cluster won't work.  Also, I can't
> > > create
> > > > > a
> > > > > shard with more data, and have it work within the existing cloud.
> > > > >
> > > > > I tried using -DshardId=shard5 [on an existing 4-shard cluster],
> > but
> > > it
> > > > > just started replicating, which doesn't seem right.
> > > > >
> > > > > Are there ways around this?
> > > > >
> > > > > Thanks,
> > > > > Ranjan Bagchi
> > > > >
> > > > >
> > >
> >
> >
> >
> > --
> > - Mark
> >
> > http://www.lucidimagination.com
> >
> >




Re: Filter facet_fields with Solr similar to stopwords

2012-03-06 Thread Daniel Brügge
OK, I've found this posting from 2009:

http://lucene.472066.n3.nabble.com/excluding-certain-terms-from-facet-counts-when-faceting-based-on-indexed-terms-of-a-field-td501104.html

But this

facet.field={!terms=WORDTOEXCLUDE}content

approach also only shows me the counting of the word I want to exclude.

On Tue, Mar 6, 2012 at 11:33 AM, Daniel Brügge <
daniel.brue...@googlemail.com> wrote:

> Hi,
>
> I am using a solr.StopFilterFactory in a query filter for a text_general
> field (here: content). It works fine, when I query the field for the
> stopword, then I am getting no results.
>
> But I am also doing a facet.field=content call to get the words which are
> used in the text. What I am trying to achieve is, to also filter the
> stopwords from the facet_fields, but it's not working. It would only work
> if the stopwords are also used during the indexing of the text_general
> field, right?
>
> The problem here is that it's too much data to re-index every time I add
> a new stopword.
>
> My current solution is to 'filter' with code after retrieving the
> facet_fields from Solr. But is there a niftier, Solr-based way to do this?
>
> Thanks & regards
>
> Daniel
>


Re: solr out of memory

2012-03-06 Thread Daniel Brügge
Maybe the index is too big and you need to add more memory to the JVM via
the -Xmx parameter. See also
http://wiki.apache.org/solr/SolrPerformanceFactors#OutOfMemoryErrors

Daniel

On Tue, Mar 6, 2012 at 10:01 AM, C.Yunqin <345804...@qq.com> wrote:

> sometimes when i search a simple word, like "id:chenm",
> the solr reports an error:
> SEVERE: java.lang.OutOfMemoryError: Java heap space
>
>
> i do not know why?
> sometimes the query goes on well.
> anyone have an idea of that?
>
>
> thanks a lot


Filter facet_fields with Solr similar to stopwords

2012-03-06 Thread Daniel Brügge
Hi,

I am using a solr.StopFilterFactory in a query filter for a text_general
field (here: content). It works fine, when I query the field for the
stopword, then I am getting no results.

But I am also doing a facet.field=content call to get the words which are
used in the text. What I am trying to achieve is, to also filter the
stopwords from the facet_fields, but it's not working. It would only work
if the stopwords are also used during the indexing of the text_general
field, right?

The problem here is that it's too much data to re-index every time I add a
new stopword.

My current solution is to 'filter' with code after retrieving the
facet_fields from Solr. But is there a niftier, Solr-based way to do this?

Thanks & regards

Daniel


Re: JoinQuery and document score problem

2012-03-06 Thread Martijn v Groningen
Hi Stefan,

The score isn't "moved" from the "from" side to the "to" side and as far as
I know there isn't a way to configure the scoring of the joined documents.
The Solr join query isn't a real join (like in SQL) and should be used as a
filtering mechanism. The best way to achieve that is to put the join
query inside an fq parameter.
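
For example (queries purely illustrative), instead of

q={!join from=parentid to=id}foo

use something like

q=foo&fq={!join from=parentid to=id}bar

so that q keeps its normal relevance scoring and the join only filters.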

Martijn

On 5 March 2012 14:01, Stefan Moises  wrote:

> Hi list,
>
> we are using the kinda new JoinQuery feature in Solr 4.x Trunk and are
> facing a problem (and also Solr 3.5. with the JoinQuery patch applied) ...
> We have documents with a parent - child relationship where a parent can
> have any number of children, parents being identified by the field "parentid".
>
> Now after a join (from the field "parentid" to "id") to get the parent
> documents only (and to filter out the "variants"/childs of the parent
> documents), the document score gets "lost" - all the returned documents
> have a score of "1.0" - if we remove the join from the query, the scores
> are fine again. Here is an example call:
>
> http://localhost:8983/solr4/select?qt=dismax&q={!join%20from=parentid%20to=id}foo&fl=id,title,score
>
> All the results now have a score of "1.0", which makes the order of
> results pretty much random and the scoring therefore useless... :(
> (the same applies for the "standard" query type, so it's not the dismax
> parser)
>
> I can't imagine this is "expected" behaviour...? Is there an easy way to
> get the "right" scores for the joined documents (e.g. using the max. score
> of the childs)? Can the scoring of "joined" documents be configured
> somewhere / somehow?
>
> Thanks a lot in advance,
> best regards,
> Stefan
>
> --
> With best regards from Nürnberg,
> Stefan Moises
>
> *
> Stefan Moises
> Senior Softwareentwickler
> Leiter Modulentwicklung
>
> shoptimax GmbH
> Guntherstraße 45 a
> 90461 Nürnberg
> Amtsgericht Nürnberg HRB 21703
> GF Friedrich Schreieck
>
> Tel.: 0911/25566-25
> Fax:  0911/25566-29
> moi...@shoptimax.de
> http://www.shoptimax.de
> *
>
>
>


-- 
With kind regards,

Martijn van Groningen


Apache Lucene Eurocon 2012

2012-03-06 Thread Vadim Kisselmann
Hi folks,

Where and when is the next Eurocon scheduled?
I read something about Denmark and autumn 2012 (I don't know where *g*).

Best regards and thanks
Vadim