Re: Fault tolerant Solr replication architecture

2012-05-21 Thread findbestopensource
Hi Parvin,

Fault tolerance is something you need to decide based on your requirements.
At some point, recovering from a crash may require manual intervention. You
need to decide what degree of fault tolerance you can support; it will almost
certainly not be 100%. We can handle network failures, but crashes are much
harder to handle.

Consider one master and two slaves. You could put a load balancer in front of
the slaves, so that you can do round-robin or fail-over between them. If you
are not using a load balancer, then you should handle this in your
application.

If the master crashes, you may need to rebuild the index, but that is
relatively unlikely.
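To make the repeater idea from Parvin's question concrete: a repeater is simply a node configured as both a slave of the master and a master for the downstream slaves. A hedged sketch of what such a node's solrconfig.xml might contain in Solr 3.x follows; the hostname and poll interval are placeholders, not values from this thread:

```xml
<!-- Hypothetical repeater config: this node pulls the index from the
     real master (slave section) and serves it to downstream slaves
     (master section). Hostname and pollInterval are placeholders. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```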

Regards
Aditya
www.findbestopensource.com



On Mon, May 21, 2012 at 12:55 PM, Parvin Gasimzade 
parvin.gasimz...@gmail.com wrote:

 Hi,

 I am using solr with replication. I have one master that indexes data and
 two slaves which pull the index from the master and respond to queries.

 My question is: how can I create a fault-tolerant architecture? I mean, what
 should I do when the master server crashes? I heard that a repeater is used
 for this type of architecture. Do I then have to create one master, one
 slave acting as a repeater, and one plain slave?

 Another question: if the master crashes, does the slave with the repeater
 start indexing automatically, or should I configure it manually?

 I asked similar question on the stackoverflow :

 http://stackoverflow.com/questions/10597053/fault-tolerant-solr-replication-architecture

 Any help will be appreciated.

 Regards,
 Parvin



Re: using Carrot2 custom ITokenizerFactory

2012-05-21 Thread Stanislaw Osinski
Hi Koji,

Dawid came up with a simple fix for this, it's committed to trunk and 3.6
branch.

Staszek

On Sun, May 20, 2012 at 5:15 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 Hi Staszek,

 Thank you for the fix so quickly!

 As a trial, I set:

 <str name="PreprocessingPipeline.tokenizerFactory">org.apache.solr.handler.clustering.carrot2.LuceneCarrot2TokenizerFactory</str>

 then I could start Solr without error. But when I make a request:

 http://localhost:8983/solr/clustering?q=*%3A*&version=2.2&start=0&rows=10&indent=on&wt=json&fl=id&carrot.produceSummary=false

 I got an exception:

 org.apache.solr.common.SolrException: Carrot2 clustering failed
        at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:224)
        at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:91)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: org.carrot2.core.ComponentInitializationException: org.carrot2.util.attribute.AttributeBindingException: Could not assign field org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline#tokenizerFactory with value org.apache.solr.handler.clustering.carrot2.LuceneCarrot2TokenizerFactory
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.carrot2.util.ExceptionUtils.wrapAs(ExceptionUtils.java:63)
        at org.carrot2.core.PoolingProcessingComponentManager$ComponentInstantiationListener.objectInstantiated(PoolingProcessingComponentManager.java:234)
        at org.carrot2.core.PoolingProcessingComponentManager$ComponentInstantiationListener.objectInstantiated(PoolingProcessingComponentManager.java:169)
        at org.carrot2.util.pool.SoftUnboundedPool.borrowObject(SoftUnboundedPool.java:83)
        at org.carrot2.core.PoolingProcessingComponentManager.prepare(PoolingProcessingComponentManager.java:128)
        at org.carrot2.core.Controller.process(Controller.java:333)
        at org.carrot2.core.Controller.process(Controller.java:240)
        at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:220)
        ... 24 more
 Caused by: org.carrot2.util.attribute.AttributeBindingException: Could not assign field org.carrot2.text.preprocessing.pipeline.
 

Re: No Effect of omitNorms and omitTermFreqAndPositions when using MLT handler?

2012-05-21 Thread Ravish Bhagdev
Ahh, this is because I have to override DefaultSimilarity to turn off
tf/idf scoring?  But this will apply to all the fields and general search
on text fields as well?  Is there a way to apply custom similarity to
specific field types or fields only?  Is there no way of turning TF/IDF off
without this?

Thanks,
Ravish

On Mon, May 21, 2012 at 10:24 AM, Ravish Bhagdev
ravish.bhag...@gmail.com wrote:

 Hi All,

 I was wondering if omitNorms will have any effect on MLT handler at all?

 I'm using schema version 1.2 with Solr 1.4 and have defined a couple of
 fields which I want to use for MLT lookup, and I don't want factors like
 field length or TF/IDF to affect the scores.  The definitions are below:

  <fieldType name="lowercase" class="solr.TextField"
   positionIncrementGap="100" omitNorms="true" omitTermFreqAndPositions="true">
   <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
  </fieldType>

  <fieldType name="text_nonorms" class="solr.TextField"
   positionIncrementGap="100" omitNorms="true" omitTermFreqAndPositions="true">
   <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
     generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
   </analyzer>
   <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
     ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
     generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
   </analyzer>
  </fieldType>

  <!-- and the fields that use the above field types are -->
  <field name="PROFILE_TAGS" type="lowercase" indexed="true" stored="true"
   multiValued="true" termVectors="true"/>
  <field name="PROFILE_TAGS_TXT" type="text_nonorms" indexed="true"
   stored="true" multiValued="true" termVectors="true"/>

 In My solrconfig.xml I have defined following for my MLT request handler:

   <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults">
     <str name="mlt.fl">PROFILE_TAGS,PROFILE_TAGS_TXT</str>
     <str name="mlt.qf">PROFILE_TAGS^10.0 PROFILE_TAGS_TXT^2.0</str>
     <int name="mlt.mindf">1</int>
     <int name="mlt.mintf">1</int>
     <str name="fl">id,score</str>
     <str name="mlt.fl">PROFILE_TAGS,PROFILE_TAGS_TXT</str>
    </lst>
   </requestHandler>


 However, when I run my query as follows:

 http://localhost:9090/solr/mlt?fl=*,score&start=0&q=id:4417454.matchRecord&qt=/mlt&fq=targetDB:ConnectMeDB&rows=1000&debugQuery=on

 the debug scoring info shows following:

 <str name="5042172.matchRecord">
 0.17156276 = (MATCH) product of:
   1.4296896 = (MATCH) sum of:
 0.24737607 = (MATCH) weight(PROFILE_TAGS_TXT:system^5.0 in 1472),
 product of:
   0.06376338 = queryWeight(PROFILE_TAGS_TXT:system^5.0), product of:
 5.0 = boost
 3.8795946 = idf(docFreq=538, maxDocs=9598)
 0.0032871156 = queryNorm
   3.8795946 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:system in 1472),
 product of:
 1.0 = tf(termFreq(PROFILE_TAGS_TXT:system)=1)
 3.8795946 = idf(docFreq=538, maxDocs=9598)
 1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
 0.65193653 = (MATCH) weight(PROFILE_TAGS_TXT:adapt^5.0 in 1472),
 product of:
   0.10351306 = queryWeight(PROFILE_TAGS_TXT:adapt^5.0), product of:
 5.0 = boost
 6.298109 = idf(docFreq=47, maxDocs=9598)
 0.0032871156 = queryNorm
   6.298109 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:adapt in 1472),
 product of:
 1.0 = tf(termFreq(PROFILE_TAGS_TXT:adapt)=1)
 6.298109 = idf(docFreq=47, maxDocs=9598)
 1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
 0.530377 = (MATCH) weight(PROFILE_TAGS_TXT:optic^5.0 in 1472), product
 of:
   0.093365155 = queryWeight(PROFILE_TAGS_TXT:optic^5.0), product of:
 5.0 = boost
 5.6806736 = idf(docFreq=88, maxDocs=9598)
 0.0032871156 = queryNorm
   5.6806736 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:optic in 1472),
 product of:
 1.0 = tf(termFreq(PROFILE_TAGS_TXT:optic)=1)
 5.6806736 = idf(docFreq=88, maxDocs=9598)
 1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
   0.12 = coord(3/25)
 </str>

 Which seems to suggest that the TF/IDF is being performed on these fields!
  Also, does it make any difference if I specify omitNorms in field
 definition vs specifying in fieldType definition?

 I will appreciate any help with this.

 Thanks,
 Ravish



org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Tolga

Hi,

I am getting this error:

[doc=null] missing required field: id

request: http://localhost:8983/solr/update?wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)

at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
at 
org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)

at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2012-05-21 11:44:29,953 ERROR solr.SolrIndexer - java.io.IOException: 
Job failed!


I've got this entry in schema.xml: <field name="id" type="string" stored="true" indexed="true"/>

What to do?

Regards,


Re: No Effect of omitNorms and omitTermFreqAndPositions when using MLT handler?

2012-05-21 Thread Ravish Bhagdev
I found this:

https://issues.apache.org/jira/browse/LUCENE-2236

So, it seems this feature is not supported in Solr 1.4 at all.  Is there
any possible work around?  If not, I'll have to consider splitting my
schema into two which will be quite a big change :(

- Ravish

On Mon, May 21, 2012 at 11:03 AM, Ravish Bhagdev
ravish.bhag...@gmail.com wrote:

 Ahh, this is because I have to override DefaultSimilarity to turn off
 tf/idf scoring?  But this will apply to all the fields and general search
 on text fields as well?  Is there a way to apply custom similarity to
 specific field types or fields only?  Is there no way of turning TF/IDF off
 without this?

 Thanks,
 Ravish


 On Mon, May 21, 2012 at 10:24 AM, Ravish Bhagdev ravish.bhag...@gmail.com
  wrote:

 [...]

Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Michael Kuhlmann

On 21.05.2012 12:07, Tolga wrote:

Hi,

I am getting this error:

[doc=null] missing required field: id


[...]


I've got this entry in schema.xml: <field name="id" type="string" stored="true" indexed="true"/>
What to do?


Simply make sure that every document you're sending to Solr contains 
this id field.


I assume it's declared as your unique id field, so it's mandatory.

Greetings,
Kuli



Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Tolga
How do I verify it exists? I've been crawling the same site and it 
wasn't giving an error on Thursday.


Regards,

On 5/21/12 1:20 PM, Michael Kuhlmann wrote:

On 21.05.2012 12:07, Tolga wrote:

Hi,

I am getting this error:

[doc=null] missing required field: id


[...]


I've got this entry in schema.xml: <field name="id" type="string" stored="true" indexed="true"/>
What to do?


Simply make sure that every document you're sending to Solr contains 
this id field.


I assume it's declared as your unique id field, so it's mandatory.

Greetings,
Kuli



Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Michael Kuhlmann

On 21.05.2012 12:40, Tolga wrote:

How do I verify it exists? I've been crawling the same site and it
wasn't giving an error on Thursday.


It depends on what you're doing.

Are you using nutch?

-Kuli


Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Tolga

Yes.

On 5/21/12 1:49 PM, Michael Kuhlmann wrote:

On 21.05.2012 12:40, Tolga wrote:

How do I verify it exists? I've been crawling the same site and it
wasn't giving an error on Thursday.


It depends on what you're doing.

Are you using nutch?

-Kuli


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-21 Thread Ahmet Arslan
 text:abstract&hl=true&hl.fl=text&f.text.hl.snippets=2&f.text.hl.fragsize=200&debugQuery=true

Three things to check:

1) Check that your text field is declared as suitable for highlighting:
http://wiki.apache.org/solr/FieldOptionsByUseCase

2) Increase hl.maxAnalyzedChars (e.g. to Integer.MAX_VALUE).

3) Increase <maxFieldLength>Integer.MAX_VALUE</maxFieldLength> in solrconfig.xml.

For some reason (complex analysis etc.) snippets cannot always be generated. For
such cases, consider using hl.alternateField and hl.maxAlternateFieldLength.


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-21 Thread Jack Krupansky
Take a look at the /browse request handler in the example solrconfig.xml 
and compare how it does highlighting to what you are doing. There are a lot 
of little details, so maybe even one might be missing.


Also, you can only highlight stored fields, so make sure that text is 
stored. In the Solr example it is not stored and not intended to be stored, 
and highlighting should be performed using some other field containing the 
text as a stored field.
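A common pattern for what Jack describes, sketched here with hypothetical field names, is to keep the searchable catch-all field index-only and copy from a stored field that the highlighter can read:

```xml
<!-- Hypothetical schema.xml fragment: "content" is stored so it can be
     highlighted (hl.fl=content), while the catch-all "text" field stays
     index-only for searching. -->
<field name="content" type="text" indexed="false" stored="true"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="content" dest="text"/>
```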


-- Jack Krupansky

-Original Message- 
From: 12rad

Sent: Sunday, May 20, 2012 11:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Not able to use the highlighting feature! Want to return 
snippets of text


My query parameters are this:

text:abstract&hl=true&hl.fl=text&f.text.hl.snippets=2&f.text.hl.fragsize=200&debugQuery=true

I still get the entire string as the result in the
<lst name="highlighting"> tag.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985022.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Facing problem to integrate UIMA in SOLR

2012-05-21 Thread dsy99
Hello all,
I am facing a problem integrating UIMA in Solr.

I followed the following steps, provided in README file shipped along with
Uima to integrate it in Solr

Step 1.
I set the <lib/> tags in solrconfig.xml to point to the jar files:

   <lib dir="../../contrib/uima/lib" />
   <lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />

Step 2.
Modified my schema.xml, adding the fields I wanted to hold the metadata,
specifying proper values for the type, indexed, stored and multiValued
options, as follows:

<field name="language" type="string" indexed="true" stored="true" required="false"/>
<field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
<field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />

Step 3.
Modified my solrconfig.xml, adding the following snippet:

  <updateRequestProcessorChain name="uima" default="true">
    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
      <lst name="uimaConfig">
        <lst name="runtimeParameters">
          <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
          <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
        </lst>
        <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
        <bool name="ignoreErrors">true</bool>
        <lst name="analyzeFields">
          <bool name="merge">false</bool>
          <arr name="fields">
            <str>text</str>
          </arr>
        </lst>
        <lst name="fieldMappings">
          <lst name="type">
            <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
            <lst name="mapping">
              <str name="feature">text</str>
              <str name="field">concept</str>
            </lst>
          </lst>
          <lst name="type">
            <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
            <lst name="mapping">
              <str name="feature">language</str>
              <str name="field">language</str>
            </lst>
          </lst>
          <lst name="type">
            <str name="name">org.apache.uima.SentenceAnnotation</str>
            <lst name="mapping">
              <str name="feature">coveredText</str>
              <str name="field">sentence</str>
            </lst>
          </lst>
        </lst>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

Step 4.
And finally created a new UpdateRequestHandler with the following:

  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.processor">uima</str>
    </lst>
  </requestHandler>

Further I  indexed a word file called text.docx using the following command:

curl "http://localhost:8983/solr/update/extract?fmap.content=content&literal.id=doc47&commit=true" -F "file=@test.docx"

When I searched for the same document with
http://localhost:8983/solr/select?q=id:doc47 I got the following
result, i.e. the additional UIMA fields are missing from the response:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="author">divakar</str>
    <arr name="content_type">
      <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
    </arr>
    <str name="id">doc47</str>
    <date name="last_modified">2012-04-18T14:19:00Z</date>
  </doc>
</result>


Can anyone tell me how to solve this problem?

With Regards & Thanks
Divakar

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facing-problem-to-integrate-UIMA-in-SOLR-tp3985089.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-21 Thread KP Sanjailal
Hi,

Thank you so much for replying.

The MySQL database server is running on a Fedora Core 12 Machine with Hindi
Language Support enabled.  Details of the database are - ENGINE=MyISAM and
 DEFAULT CHARSET=utf8

Data is imported using the Solr DataImportHandler (mysql jdbc driver).
In the schema.xml file the title field is defined as:
field name=title type=text_general indexed=true stored=true/

I tried saving the query results directly to a text file from the MySQL
command prompt but it is not storing the results correctly.  The file
contains the following characters.


à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥8dà ¤Åà ¤¾ Saur oorja

Please suggest what I have to do to solve this issue.

Regards,

Sanjailal KP
--



On Sun, May 20, 2012 at 6:59 AM, Lance Norskog goks...@gmail.com wrote:

 Also, try saving data from a query into a file and verify that it is
 UTF-8 and the characters are correct.

 On Fri, May 18, 2012 at 7:54 AM, Jack Krupansky j...@basetechnology.com
 wrote:
  Check the analyzers for the field types containing Hindi text to be sure
  that they are not using a character mapping or folding filter that
 might
  mangle the Hindi characters. Post the field type, say for the title
 field.
 
  Also, try manually (using curl or the post jar) adding a single document
  that has Hindi data and see if that works.
 
  -- Jack Krupansky
 
  -Original Message- From: KP Sanjailal
  Sent: Thursday, May 17, 2012 5:55 AM
  To: solr-user@lucene.apache.org
  Subject: Indexing & Searching MySQL table with Hindi and English data
 
 
  Hi,
 
  I tried to setup indexing of MySQL tables in Apache Solr 3.6.
 
  Everything works fine but text in Hindi script (only some 10% of total
  records) not getting indexed properly.
 
  A search with a keyword in Hindi retrieves an empty result set.  Also, a
  retrieved Hindi record displays junk characters.
 
  The database tables contains bibliographical details of books such as
  title, author, publisher, isbn, publishing place, series etc. and out of
  the total records about 10% of records contains text in Hindi in title,
  author, publisher fields.
 
  Example:
 
  *Search Results from MySQL using PHP*

   1. http://192.168.0.132/shared/biblio_view.php?bibid=26913&tab=opac
   *Title:* सौर ऊर्जा Saur oorja
   *Author(s):* विनोद कुमार मिश्र MISHRA (VK) *Material:* Books

  *Search Results from Apache Solr (searched using keyword in English)*

   1. http://192.168.0.132/test/biblio_view.php?bibid=26913&tab=opac
   *Title:* सौर ऊरॠजा Saur oorja
   *Author(s):* विनोद कॠमार मिशॠर MISHRA (VK) *Material:* Books
 
 
  How do I go about solving this language problem.
 
  Thanks in advance.
 
  K. P. Sanjailal
  --
 



 --
 Lance Norskog
 goks...@gmail.com



Re: org.apache.solr.common.SolrException: ERROR: [doc=null] missing required field: id

2012-05-21 Thread Jack Krupansky
Solr appears to force your UniqueKey field to be required even though you 
don't have an explicit required="true" attribute.


As a debugging aid, try adding default="missing" to your id field 
definition; then you can query on id:missing and see what data is being 
indexed without an id. But it would be better to examine the input data and 
see why it is missing the id field, since that is the real problem that 
needs to be resolved.
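A sketch of the debugging aid Jack describes (a temporary change to schema.xml, to be removed once the source of the id-less documents is found):

```xml
<!-- Temporary debugging aid: documents arriving without an id are
     indexed with the literal value "missing", so they can then be
     found with the query id:missing -->
<field name="id" type="string" stored="true" indexed="true" default="missing"/>
```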


-- Jack Krupansky

-Original Message- 
From: Tolga

Sent: Monday, May 21, 2012 6:07 AM
To: solr-user@lucene.apache.org
Subject: org.apache.solr.common.SolrException: ERROR: [doc=null] missing 
required field: id


Hi,

I am getting this error:

[doc=null] missing required field: id

request: http://localhost:8983/solr/update?wt=javabin&version=2
[...]

I've got this entry in schema.xml: <field name="id" type="string" stored="true" indexed="true"/>
What to do?

Regards, 



Re: Indexing & Searching MySQL table with Hindi and English data

2012-05-21 Thread Jack Krupansky
Is it possible that your text editor/display does not support UTF-8 
encoding?


Assuming the data is properly encoded, do you have the encoding="UTF-8" 
attribute in your DIH dataSource tag?
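For a MySQL JdbcDataSource specifically, the encoding is usually set through the JDBC connection URL rather than (or in addition to) a dataSource attribute. A hedged sketch of a data-config.xml fragment; the host, database name, and credentials are placeholders:

```xml
<!-- data-config.xml fragment (placeholders for host/db/credentials).
     The useUnicode/characterEncoding URL parameters tell the MySQL
     JDBC driver to transfer data as UTF-8. -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/library?useUnicode=true&amp;characterEncoding=UTF-8"
            user="db_user"
            password="db_pass"/>
```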


-- Jack Krupansky

-Original Message- 
From: KP Sanjailal

Sent: Monday, May 21, 2012 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing & Searching MySQL table with Hindi and English data

Hi,

Thank you so much for replying.

The MySQL database server is running on a Fedora Core 12 Machine with Hindi
Language Support enabled.  Details of the database are - ENGINE=MyISAM and
DEFAULT CHARSET=utf8

Data is imported using the Solr DataImportHandler (mysql jdbc driver).
In the schema.xml file the title field is defined as:
field name=title type=text_general indexed=true stored=true/

I tried saving the query results directly to a text file from the MySQL
command prompt but it is not storing the results correctly.  The file
contains the following characters.


à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥8dà ¤Åà ¤¾ Saur oorja

Please suggest what I have to do to solve this issue.

Regards,

Sanjailal KP
--



On Sun, May 20, 2012 at 6:59 AM, Lance Norskog goks...@gmail.com wrote:


Also, try saving data from a query into a file and verify that it is
UTF-8 and the characters are correct.

On Fri, May 18, 2012 at 7:54 AM, Jack Krupansky j...@basetechnology.com
wrote:
 Check the analyzers for the field types containing Hindi text to be sure
 that they are not using a character mapping or folding filter that
might
 mangle the Hindi characters. Post the field type, say for the title
field.

 Also, try manually (using curl or the post jar) adding a single document
 that has Hindi data and see if that works.

 -- Jack Krupansky

 -Original Message- From: KP Sanjailal
 Sent: Thursday, May 17, 2012 5:55 AM
 To: solr-user@lucene.apache.org
 Subject: Indexing  Searching MySQL table with Hindi and English data


 Hi,

 I tried to setup indexing of MySQL tables in Apache Solr 3.6.

 Everything works fine but text in Hindi script (only some 10% of total
 records) not getting indexed properly.

 A search with keyword in Hindi retrieve emptly result set.  Also a
 retrieved hindi record displays junk characters.

 The database tables contains bibliographical details of books such as
 title, author, publisher, isbn, publishing place, series etc. and out of
 the total records about 10% of records contains text in Hindi in title,
 author, publisher fields.

 Example:

 *Search Results from MySQL using PHP*

  1. http://192.168.0.132/shared/biblio_view.php?bibid=26913&tab=opac
  *Title:* सौर ऊर्जा Saur oorja
  *Author(s):* विनोद कुमार मिश्र MISHRA (VK)  *Material:* Books

 *Search Results from Apache Solr (searched using keyword in English)*

  1. http://192.168.0.132/test/biblio_view.php?bibid=26913&tab=opac
  *Title:* सौर ऊरॠजा Saur oorja
  *Author(s):* विनोद कॠमार मिशॠर MISHRA (VK)  *Material:* Books


 How do I go about solving this language problem?

 Thanks in advance.

 K. P. Sanjailal
 --




--
Lance Norskog
goks...@gmail.com





Re: problem in replication

2012-05-21 Thread shinkanze
hi Tomas,

My queries are complex: I am faceting on many fields, and using highlighting
and boosts etc. in the same query.

Auto-warming takes a hell of a lot of time, hence I have removed it.
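
For reference, auto-warming is configured per cache in solrconfig.xml; rather than removing it entirely, lowering autowarmCount is a common middle ground. The sizes and counts below are illustrative, not recommendations:

```xml
<!-- warm only a small slice of each cache on commit instead of disabling warming -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="64"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
```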





--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-in-replication-tp3984654p3985098.html
Sent from the Solr - User mailing list archive at Nabble.com.


no css on browse UI when multicore

2012-05-21 Thread Aleksander Akerø
Hi

 

The CSS files from the browse GUI in Solr 3.6 do not seem to work properly
when Solr is deployed with multiple cores, and I can’t figure out how to
solve this. I know this has been an issue in Solr, but I thought it was
fixed in the newer versions.

 

Any answers or pointers on how to get this fixed are much appreciated :)

 

Regards,

Aleksander Akerø



boost function parameter (bf) ignores character escaping

2012-05-21 Thread mail

Hey,

I'm running solr (3.5.0.2011.11.30.16.37.06) and have encountered what  
I think is a bug with the boost function (bf) parameter.


I've used Sunspot (for use of Solr with Rails), which allows managing  
dynamic fields and by default creates fields like  
dynamicfield:value1, dynamicfield:value2, thus using the ':'  
character in the field name, which needs to be escaped.


If I use a query which includes q=dynamicfield\:value1:6, everything  
works fine and matches are found.


However, if I use the bf field with bf=dynamicfield\:value1, I get an  
error message "undefined field dynamicfield", the same as without  
escaping the ':'.


Should I file a bug report?

Best,

Nils
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int><lst name="params"><str name="start">0</str><str name="q">expertise\:solution_database_i:6</str><str name="defType">edismax</str><str name="rows">10</str></lst></lst><result name="response" numFound="1" start="0"><doc><arr name="full_name_texts"><str>Nils Kaiser</str></arr><str name="id">User 4f32081ccd112e65d36c</str></doc></result>
</response>

Re: no css on browse UI when multicore

2012-05-21 Thread Erik Hatcher

On May 21, 2012, at 08:11 , Aleksander Akerø wrote:
 The css files from the browse GUI in solr 3.6 does not seem to work properly
 when solr is deployed with multiple cores and I can’t figure out how to
 solve this. I know this have been an issue in solr but I thought it was
 fixed in the newer versions.
 
 
 
 Any answers or pointers on how to get this fixed is much appreciatedJ

Each core has its own templates, and thus it'll be core dependent.  There is a 
conf/velocity/VM_global_library.vm that defines the base path the other 
templates can use, and it should look like this:

  #macro(url_for_solr)/solr#if($request.core.name != "")/$request.core.name#end#end

And the stylesheet is referenced in head.vm like this:

  <link rel="stylesheet" type="text/css" href="#{url_for_solr}/admin/file?file=/velocity/main.css&contentType=text/css"/>

This requires that the file serving handler (/admin/file) be enabled and that 
conf/velocity/main.css exists.

Does that help?   If not, what does the HTML rendered from /browse give as the CSS 
URL?  What error does hitting that URL directly give?

Erik
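
For reference, the file-serving handler Erik mentions is registered per core in solrconfig.xml, roughly as below. This is a sketch; check your Solr version's example solrconfig.xml for the exact form, and the hidden entries here are illustrative:

```xml
<!-- serves files from this core's conf/ directory,
     e.g. /admin/file?file=/velocity/main.css -->
<requestHandler name="/admin/file" class="solr.admin.ShowFileRequestHandler">
  <lst name="invariants">
    <!-- keep selected config files out of reach -->
    <str name="hidden">synonyms.txt</str>
  </lst>
</requestHandler>
```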



UI

2012-05-21 Thread Tolga

Hi,

Can you recommend a good PHP UI to search? Is SolrPHPClient good?


Re: boost function parameter (bf) ignores character escaping

2012-05-21 Thread Erik Hatcher
Yeah, a bug report would be good.  But really this is a Sunspot bug report.  
Field names should NOT have :'s in them.  Field names should stick to standard 
Java identifier rules, otherwise it's escaping madness.  

You could try something like this as a workaround:

   bq=_val_:"dynamicfield\:value1"

I don't know if that'll do better than the bf issue you've hit, but it's 
another way of doing the same sort of thing.

Erik

On May 21, 2012, at 08:01 , m...@nils-kaiser.de wrote:

 Hey,
 
 I'm running solr (3.5.0.2011.11.30.16.37.06) and have encountered what I 
 think is a bug with the boost function (bf) parameter.
 
 I've used sunspot (for use of solr with rails) which allows managing dynamic 
 fields, which by default creates fields like 
 dynamicfield:value1,dynamicfield:value2, though using the : character in 
 the field name, which needs to be escaped.
 
 If I use a query which includes q=dynamicfield\:value1:6, everything works 
 fines and matches are found.
 
 However, if I use the bf field with bf=dynamicfield\:value1, I get an error 
 message undefined field dynamicfield, the same without escaping the :
 
 Should I file a bug report?
 
 Best,
 
 Nils
 solr_debug_normalquery.xml



Re: boost function parameter (bf) ignores character escaping

2012-05-21 Thread Jack Krupansky

Quoting from the new trunk example schema:

"field names should consist of alphanumeric or underscore characters only and 
not start with a digit.  This is not currently strictly enforced, but other 
field names will not have first class support from all components and back 
compatibility is not guaranteed."

In other words, don't do it. Replace the colon with an underscore in your 
field names.
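
With Sunspot-style dynamic fields, following that advice usually means relying on an underscore-suffixed dynamic field declared in schema.xml, along these lines (the pattern and type here are illustrative):

```xml
<!-- any field ending in _s is treated as an untokenized string field -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

A field named, say, value1_s then needs no escaping in q, bf, or bq parameters.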


-- Jack Krupansky

-Original Message- 
From: m...@nils-kaiser.de

Sent: Monday, May 21, 2012 8:01 AM
To: solr-user@lucene.apache.org
Subject: boost function parameter (bf) ignores character escaping

Hey,

I'm running solr (3.5.0.2011.11.30.16.37.06) and have encountered what
I think is a bug with the boost function (bf) parameter.

I've used sunspot (for use of solr with rails) which allows managing
dynamic fields, which by default creates fields like
dynamicfield:value1,dynamicfield:value2, though using the :
character in the field name, which needs to be escaped.

If I use a query which includes q=dynamicfield\:value1:6, everything
works fines and matches are found.

However, if I use the bf field with bf=dynamicfield\:value1, I get an
error message undefined field dynamicfield, the same without
escaping the :

Should I file a bug report?

Best,

Nils 



RE: Solr Single Core vs Multiple Cores installation for localization

2012-05-21 Thread Ivan Hrytsyuk
We intend to have a separate, language-specific search UI. 
At the moment we like the solution with separate cores more because it is more 
flexible. But as a rule flexibility has a cost in terms of performance, and we would 
like to know that price.

Jack, what did you mean by 'Managing a bunch of small and tiny cores could be a 
pain'? Could you please provide more details?

Thank you for your help, Ivan

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Thursday, May 17, 2012 3:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Single Core vs Multiple Cores installation for localization

First you have to answer the twin questions of what you want the user 
experience to be and what expectations users may have independent of your 
intentions.

Do you intend to have separate, language specific search UI? That would match 
up with separate cores, but can be done with a language type field as well.

Sometimes, users want only documents in a specific language, but sometimes they 
want a globalized search for technical terms or names across all languages, 
such as searching for Lucene OR Solr and then faceting by language to get 
an idea of use by language.

From a practical perspective, maybe most docs would be English, so that would 
be one big core anyway. And the main secondary languages would be modest 
sized, and then you may have a large number of tiny cores. Managing a bunch of 
small and tiny cores could be a pain.

Maybe three cores: English-only, all non-English, and all language - if 
globalized search is desired. The all non-English could have a filter query 
on the specific language desired, or using different field sets for query and 
returned fields in a edismax query request. This is just one technical 
approach, but it still all depends on intended user experience and user 
expectations.

-- Jack Krupansky

-Original Message-
From: Ivan Hrytsyuk
Sent: Wednesday, May 16, 2012 6:31 AM
To: solr-user@lucene.apache.org
Subject: Solr Single Core vs Multiple Cores installation for localization

Hello,

We are going to add multi-language support for our Solr-based project.

We consider next Solr installation types:

1.   Single core - all fields for all languages reside in a single core. 
I.e. title_en, description_en, title_de, description_de, title_fr, 
description_fr

2.   Multiple cores - one core for one language

Looks like Multiple cores installation is more appropriate for multi-language, 
but we would like to see expert comments on this.
What we have found so far for Multiple cores are:

* Pros

o   Searching is faster, because query response time grows with index size and 
each per-language index is smaller

o   More flexible. We can shut-down any core at any time

o   Easier to maintain

* Cons

o   Startup time is longer in comparison with Single core

Could anyone suggest:

1.   Will indexing for Multiple cores be faster than for a Single core 
installation because each index is smaller? Is there any relationship between 
index size and indexing time?

2.   How much longer is startup time for Solr with 30 cores in comparison to a 
Single core when cache warming is disabled? This option is really important 
for us.

3.   What processes are executed during Solr startup?

Thank you in advance, Ivan 



RE: no css on browse UI when multicore

2012-05-21 Thread Aleksander Akerø
Ok, thanks a bunch!

I think the URLs are set up properly, but we have sort of made our own
solrconfig files, so it's probably the file handler then.
I will look into that, but I'm 99.999% sure that this was my problem.

Again, thank you for the quick reply!

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: 21. mai 2012 14:33
To: solr-user@lucene.apache.org
Subject: Re: no css on browse UI when multicore


On May 21, 2012, at 08:11 , Aleksander Akerø wrote:
 The css files from the browse GUI in solr 3.6 does not seem to work 
 properly when solr is deployed with multiple cores and I can’t figure 
 out how to solve this. I know this have been an issue in solr but I 
 thought it was fixed in the newer versions.
 
 
 
 Any answers or pointers on how to get this fixed is much appreciatedJ

Each core has it's own templates, and thus it'll be core dependent.  There
is a conf/velocity/VM_global_library.vm that has the base path that the
other templates can use for a base path, and it should look like this:

  #macro(url_for_solr)/solr#if($request.core.name != "")/$request.core.name#end#end

And the stylesheet is referenced in head.vm like this:

  <link rel="stylesheet" type="text/css"
href="#{url_for_solr}/admin/file?file=/velocity/main.css&contentType=text/css"/>

This requires that the file serving handler (/admin/file) be enabled and
that conf/velocity/main.css exist.

Does that help?   If not, what is the HTML rendered from /browse say as the
CSS URL?  What error does hitting that URL directly give?

Erik




Re: Fault tolerant Solr replication architecture

2012-05-21 Thread Jeremy Taylor
Have you looked at DataStax Enterprise?
On May 21, 2012 12:25 AM, Parvin Gasimzade parvin.gasimz...@gmail.com
wrote:

 Hi,

 I am using solr with replication. I have one master that indexes data and
 two slaves which pulls index from master and responds to the queries.

 My question is, how can i create fault tolerant architecture? I mean what
 should i do when master server crashes? I heard that repeater is used for
 this type of architecture. Then, do I have to create one master, one slave
 with repeater and one slave?

 Another question is, if the master crashes, does the slave with repeater start
 indexing automatically, or should I configure it manually?

 I asked similar question on the stackoverflow :

 http://stackoverflow.com/questions/10597053/fault-tolerant-solr-replication-architecture

 Any help will be appreciated.

 Regards,
 Parvin
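
For the repeater setup Parvin asks about: a repeater is simply a core whose ReplicationHandler is configured as both slave (polling the real master) and master (serving the other slaves). A hedged solrconfig.xml sketch, where host names, the poll interval, and the conf file list are placeholders:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- master section: lets downstream slaves pull from this node -->
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <!-- slave section: this node itself polls the primary master -->
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Note that promotion after a master crash is not automatic with this setup: the repeater already holds a near-current index, but redirecting indexing (and the slaves' masterUrl) to it is a manual or scripted step.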



Re: using Carrot2 custom ITokenizerFactory

2012-05-21 Thread Koji Sekiguchi

My problem was gone. Thanks Staszek and Dawid!

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/


(12/05/21 18:11), Stanislaw Osinski wrote:

Hi Koji,

Dawid came up with a simple fix for this, it's committed to trunk and 3.6
branch.

Staszek


RE: SolrCloud deduplication

2012-05-21 Thread Markus Jelsma
Hi,

SOLR-2822 seems to work just fine as long as the SignatureProcessor precedes 
the DistributedProcessor in the update chain. 

Thanks,
Markus
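
For context, a typical signature-based dedup chain in solrconfig.xml looks roughly like this; the signatureField, fields, and chain name are illustrative, and the point of the thread is that the SignatureUpdateProcessorFactory must run before the distributed processing step:

```xml
<updateRequestProcessorChain name="dedupe">
  <!-- computes a hash over the listed fields and stores it in signatureField;
       with overwriteDupes=true, documents with an equal signature are replaced -->
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">digest</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">url,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```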

 
 
-Original message-
 From:Mark Miller markrmil...@gmail.com
 Sent: Fri 18-May-2012 16:05
 To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io
 Subject: Re: SolrCloud deduplication
 
 Hey Markus -
 
 When I ran into a similar issue with another update proc, I created 
 https://issues.apache.org/jira/browse/SOLR-3215 so that I could order things 
 to avoid this. I have not committed this yet though, in favor of waiting for 
 https://issues.apache.org/jira/browse/SOLR-2822
 
 Go vote? :)
 
 On May 18, 2012, at 7:49 AM, Markus Jelsma wrote:
 
  Hi,
  
  Deduplication on SolrCloud through the SignatureUpdateRequestProcessor is 
  not 
  functional anymore. The problem is that documents are passed multiple times 
  through the URP and the digest field is added as if it is an multi valued 
  field. 
  If the field is not multi valued you'll get this typical error. Changing 
  the 
  order or URP's in the chain does not solve the problem.
  
  Any hints on how to resolve the issue? Is this a problem in the 
  SignatureUpdateRequestProcessor and does it need to be updated to work with 
  SolrCloud? 
  
  Thanks,
  Markus
 
 - Mark Miller
 lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 
 


Re: Duplicate documents being added even with unique key

2012-05-21 Thread Parmeley, Michael
Changing my field type to string for my uniquekey field solved the problem. 
Thanks to Jack and Erik for the fix!
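
The resulting schema.xml then looks roughly like this (a sketch of the fix described in the thread; string is an untokenized type, so overwrites match the key value exactly):

```xml
<field name="uniquekey" type="string" indexed="true" stored="true" required="true"/>

<uniqueKey>uniquekey</uniqueKey>
```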

On May 18, 2012, at 5:33 PM, Jack Krupansky wrote:

 Typically the uniqueKey field is a string field type (your schema uses 
 text_general), although I don't think it is supposed to be a requirement. 
 Still, it is one thing that stands out.
 
 Actually, you may be running into some variation of SOLR-1401:
 
 https://issues.apache.org/jira/browse/SOLR-1401
 
 In other words, stick with string and stay away from a tokenized (text) 
 key.
 
 You could also get duplicates by merging cores or if your add has 
 allowDups = true or overwrite=false.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Parmeley, Michael
 Sent: Friday, May 18, 2012 5:50 PM
 To: solr-user@lucene.apache.org
 Subject: Duplicate documents being added even with unique key
 
 I have a uniquekey set in my schema; however, I am still getting duplicated 
 documents added. Can anyone provide any insight into why this may be 
 happening?
 
 This is in my schema.xml:
 
 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a 
 required field
   -->
 <uniqueKey>uniquekey</uniqueKey>

   <field name="uniquekey" type="text_general" indexed="true" stored="true" 
 required="true" />
 
 On startup I get this message in catalina.out:
 
 INFO: unique key field: uniquekey
 
 However, you can see I get multiple documents:
 
 <result name="response" numFound="7" start="0">
 <doc>
 <str name="abbreviation">PSR3</str>
 <int name="clientid">1</int>
 <str name="entitytype">Skill</str>
 <int name="id">510</int>
 <str name="name">Body and Soul</str>
 <int name="projectid">1</int>
 <int name="skillnumber">281</int>
 <str name="uniquekey">Skill510</str>
 </doc>
 <doc>
 <str name="abbreviation">PSR3</str>
 <int name="clientid">1</int>
 <str name="entitytype">Skill</str>
 <int name="id">510</int>
 <str name="name">Body and Soul</str>
 <int name="projectid">1</int>
 <int name="skillnumber">281</int>
 <str name="uniquekey">Skill510</str>
 </doc>
 <doc>
 <str name="abbreviation">PSR3</str>
 <int name="clientid">1</int>
 <str name="entitytype">Skill</str>
 <int name="id">510</int>
 <str name="name">Body and Soul</str>
 <int name="projectid">1</int>
 <int name="skillnumber">281</int>
 <str name="uniquekey">Skill510</str>
 </doc>
 <doc>
 <str name="abbreviation">PSR3</str>
 <int name="clientid">1</int>
 <str name="entitytype">Skill</str>
 <int name="id">510</int>
 <str name="name">Body and Soul</str>
 <int name="projectid">1</int>
 <int name="skillnumber">281</int>
 <str name="uniquekey">Skill510</str>
 </doc>
 <doc>
 <str name="abbreviation">PSR3</str>
 <int name="clientid">1</int>
 <str name="entitytype">Skill</str>
 <int name="id">510</int>
 <str name="name">Body and Soul</str>
 <int name="projectid">1</int>
 <int name="skillnumber">281</int>
 <str name="uniquekey">Skill510</str>
 </doc>
 <doc>
 <str name="abbreviation">PSR3</str>
 <int name="clientid">1</int>
 <str name="entitytype">Skill</str>
 <int name="id">510</int>
 <str name="name">Body and Soul</str>
 <int name="projectid">1</int>
 <int name="skillnumber">281</int>
 <str name="uniquekey">Skill510</str>
 </doc>
 <doc>
 <str name="abbreviation">PSR3</str>
 <int name="clientid">1</int>
 <str name="entitytype">Skill</str>
 <int name="id">510</int>
 <str name="name">Body and Soul</str>
 <int name="projectid">1</int>
 <int name="skillnumber">281</int>
 <str name="uniquekey">Skill510</str>
 </doc>
 </result>
 



RE: SolrCloud deduplication

2012-05-21 Thread Markus Jelsma
Hi again,

It seemed to work fine but in the end duplicates are not overwritten. We first 
run the SignatureProcessor and then the DistributedProcessor. If we do it the 
other way around the digest field receives multiple values and throws errors. 
Is there anything else we can do or another patch to try?

Thanks
Markus
 
 
-Original message-
 From:Markus Jelsma markus.jel...@openindex.io
 Sent: Mon 21-May-2012 15:58
 To: solr-user@lucene.apache.org; Mark Miller markrmil...@gmail.com
 Subject: RE: SolrCloud deduplication
 
 Hi,
 
 SOLR-2822 seems to work just fine as long as the SignatureProcessor precedes 
 the DistributedProcessor in the update chain. 
 
 Thanks,
 Markus
 
  
  
 -Original message-
  From:Mark Miller markrmil...@gmail.com
  Sent: Fri 18-May-2012 16:05
  To: solr-user@lucene.apache.org; Markus Jelsma markus.jel...@openindex.io
  Subject: Re: SolrCloud deduplication
  
  Hey Markus -
  
  When I ran into a similar issue with another update proc, I created 
  https://issues.apache.org/jira/browse/SOLR-3215 so that I could order 
  things to avoid this. I have not committed this yet though, in favor of 
  waiting for https://issues.apache.org/jira/browse/SOLR-2822
  
  Go vote? :)
  
  On May 18, 2012, at 7:49 AM, Markus Jelsma wrote:
  
   Hi,
   
   Deduplication on SolrCloud through the SignatureUpdateRequestProcessor is 
   not 
   functional anymore. The problem is that documents are passed multiple 
   times 
   through the URP and the digest field is added as if it is an multi valued 
   field. 
   If the field is not multi valued you'll get this typical error. Changing 
   the 
   order or URP's in the chain does not solve the problem.
   
   Any hints on how to resolve the issue? Is this a problem in the 
   SignatureUpdateRequestProcessor and does it need to be updated to work 
   with 
   SolrCloud? 
   
   Thanks,
   Markus
  
  - Mark Miller
  lucidimagination.com
  
  
  
  
  
  
  
  
  
  
  
  
 


Re: Question about wildcards

2012-05-21 Thread Anderson vasconcelos
Hi.

In debug mode, the generated query was:

<str name="rawquerystring">field:*2231-7</str>
<str name="querystring">field:*2231-7</str>
<str name="parsedquery">field:*2231-7</str>
<str name="parsedquery_toString">field:*2231-7</str>

The analysis of indexing the text .2231-7 produces this result:
Index Analyzer  .22317  .22317  .22317  .22317  #1;1322.
#1;7 .22317
And for search for *2231-7 , produces this result:
Query Analyzer  22317  22317  22317  22317 22317

I don't understand why it doesn't find results when I use field:*2231-7.
When I use field:*2231 without -7, the document is found.

As Ahmet said, I think -7 is being treated as a negative clause and used to
exclude the document, but the debug query doesn't show this.

Any ideas on how to solve this?

Thanks


2012/5/18 Ahmet Arslan iori...@yahoo.com



  I have a field that was indexed with the string
  .2231-7. When i
  search using '*' or '?' like this *2231-7 the query
  don't returns
  results. When i remove -7 substring and search agin using
  *2231 the
  query returns. Finally when i search using
  .2231-7 the query returns
  too.

 May be standard tokenizer is splitting .2231-7 into multiple tokens?
 You can check that admin/analysis page.

 May be -7 is treated as negative clause? You can check that with
 debugQuery=on




Re: Question about wildcards

2012-05-21 Thread Jack Krupansky
Before Solr 3.6, which added MultiTermAwareComponent for analyzers, the 
presence of a wildcard completely short-circuited (prevented) the query-time 
analysis, so you have to manually emulate all steps of the query analyzer 
yourself if you want to do a wildcard. Even with 3.6, not all filters are 
multi-term aware.


See:
http://wiki.apache.org/solr/MultitermQueryAnalysis

Do a query for .2231-7 and that will tell you which analyzer steps you 
will have to do manually.


-- Jack Krupansky

-Original Message- 
From: Anderson vasconcelos

Sent: Monday, May 21, 2012 11:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about wildcards

Hi.

In debug mode, the generated query was:

str name=rawquerystringfield:*2231-7/str
str name=querystringfield:*2231-7/str
str name=parsedqueryfield:*2231-7/str
str name=parsedquery_toStringfield:*2231-7/str

The analisys of indexing the  text  .2231-7 produces this result:
Index Analyzer  .22317  .22317  .22317  .22317  #1;1322.
#1;7 .22317
And for search for *2231-7 , produces this result:
Query Analyzer  22317  22317  22317  22317 22317

I don't understand why he don't find results when i use field:*2231-7.
When i use field:*2231 without -7 the document was found.

How Ahmet said, i think they using -7 to ignore the document. But in
debug query, they don't show this.

Any idea to solve this?

Thanks


2012/5/18 Ahmet Arslan iori...@yahoo.com




 I have a field that was indexed with the string
 .2231-7. When i
 search using '*' or '?' like this *2231-7 the query
 don't returns
 results. When i remove -7 substring and search agin using
 *2231 the
 query returns. Finally when i search using
 .2231-7 the query returns
 too.

May be standard tokenizer is splitting .2231-7 into multiple tokens?
You can check that admin/analysis page.

May be -7 is treated as negative clause? You can check that with
debugQuery=on






Re: Question about wildcards

2012-05-21 Thread Anderson vasconcelos
I changed the field type of the field to the following:

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer><tokenizer class="solr.WhitespaceTokenizerFactory"/></analyzer>
</fieldType>

As you see, I just keep the WhitespaceTokenizerFactory. That works. Now I
can find the document using *2231?7, *2231*7, *2231-7, *2231*, and .2231-7.

As I can see, with this tokenizer the text is not split. Is that the
best way to solve this?

Thanks



2012/5/21 Anderson vasconcelos anderson.v...@gmail.com

 Hi.

 In debug mode, the generated query was:

 str name=rawquerystringfield:*2231-7/str
 str name=querystringfield:*2231-7/str
 str name=parsedqueryfield:*2231-7/str
 str name=parsedquery_toStringfield:*2231-7/str

 The analisys of indexing the  text  .2231-7 produces this result:
 Index Analyzer  .22317  .22317  .22317  .22317
 #1;1322.#1;7 .22317
 And for search for *2231-7 , produces this result:
 Query Analyzer  22317  22317  22317  22317 22317

 I don't understand why he don't find results when i use field:*2231-7.
 When i use field:*2231 without -7 the document was found.

 How Ahmet said, i think they using -7 to ignore the document. But in
 debug query, they don't show this.

 Any idea to solve this?

 Thanks


 2012/5/18 Ahmet Arslan iori...@yahoo.com



  I have a field that was indexed with the string
  .2231-7. When i
  search using '*' or '?' like this *2231-7 the query
  don't returns
  results. When i remove -7 substring and search agin using
  *2231 the
  query returns. Finally when i search using
  .2231-7 the query returns
  too.

 May be standard tokenizer is splitting .2231-7 into multiple tokens?
 You can check that admin/analysis page.

 May be -7 is treated as negative clause? You can check that with
 debugQuery=on





Re: Question about wildcards

2012-05-21 Thread Jack Krupansky
And, generally when I see a field that has values like .2231-7, it 
should be a string field rather than tokenized text. As a string, you can 
then do straight wildcards without surprises.
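
A hedged sketch of that schema.xml approach (the field name is illustrative):

```xml
<!-- "string" stores the whole value as one untokenized term,
     so wildcard queries see exactly what was indexed -->
<field name="code" type="string" indexed="true" stored="true"/>
```

A query such as code:*2231-7 then matches an indexed value of .2231-7 directly, because wildcard terms bypass analysis and the index holds the unmodified string.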



-- Jack Krupansky
-Original Message- 
From: Jack Krupansky

Sent: Monday, May 21, 2012 11:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about wildcards

Before Solr 3.6, which added MultiTermAwareComponent for analyzers, the
presence of a wildcard completely short-circuited (prevented) the query-time
analysis, so you have to manually emulate all steps of the query analyzer
yourself if you want to do a wildcard. Even with 3.6, not all filters are
multi-term aware.

See:
http://wiki.apache.org/solr/MultitermQueryAnalysis

Do a query for .2231-7 and that will tell you which analyzer steps you
will have to do manually.

-- Jack Krupansky

-Original Message- 
From: Anderson vasconcelos

Sent: Monday, May 21, 2012 11:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about wildcards

Hi.

In debug mode, the generated query was:

str name=rawquerystringfield:*2231-7/str
str name=querystringfield:*2231-7/str
str name=parsedqueryfield:*2231-7/str
str name=parsedquery_toStringfield:*2231-7/str

The analisys of indexing the  text  .2231-7 produces this result:
Index Analyzer  .22317  .22317  .22317  .22317  #1;1322.
#1;7 .22317
And for search for *2231-7 , produces this result:
Query Analyzer  22317  22317  22317  22317 22317

I don't understand why he don't find results when i use field:*2231-7.
When i use field:*2231 without -7 the document was found.

How Ahmet said, i think they using -7 to ignore the document. But in
debug query, they don't show this.

Any idea to solve this?

Thanks


2012/5/18 Ahmet Arslan iori...@yahoo.com




 I have a field that was indexed with the string
 .2231-7. When i
 search using '*' or '?' like this *2231-7 the query
 don't returns
 results. When i remove -7 substring and search agin using
 *2231 the
 query returns. Finally when i search using
 .2231-7 the query returns
 too.

May be standard tokenizer is splitting .2231-7 into multiple tokens?
You can check that admin/analysis page.

May be -7 is treated as negative clause? You can check that with
debugQuery=on




Re: Question about wildcards

2012-05-21 Thread Anderson vasconcelos
Thanks all for the explanations.

Anderson

2012/5/21 Jack Krupansky j...@basetechnology.com

 And, generally when I see a field that has values like .2231-7, it
 should be a string field rather than tokenized text. As a string, you can
 then do straight wildcards without surprises.


 -- Jack Krupansky
 -Original Message- From: Jack Krupansky
 Sent: Monday, May 21, 2012 11:23 AM

 To: solr-user@lucene.apache.org
 Subject: Re: Question about wildcards

 Before Solr 3.6, which added MultiTermAwareComponent for analyzers, the
 presence of a wildcard completely short-circuited (prevented) the
 query-time
 analysis, so you have to manually emulate all steps of the query analyzer
 yourself if you want to do a wildcard. Even with 3.6, not all filters are
 multi-term aware.

 See:
 http://wiki.apache.org/solr/MultitermQueryAnalysis

 Do a query for .2231-7 and that will tell you which analyzer steps
 you
 will have to do manually.

 -- Jack Krupansky

 -Original Message- From: Anderson vasconcelos
 Sent: Monday, May 21, 2012 11:03 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Question about wildcards

 Hi.

 In debug mode, the generated query was:

 str name=rawquerystringfield:***2231-7/str
 str name=querystringfield:***2231-7/str
 str name=parsedqueryfield:***2231-7/str
 str name=parsedquery_toString**field:*2231-7/str

 The analisys of indexing the  text  .2231-7 produces this result:
 Index Analyzer  .22317  .22317  .22317  .22317
  #1;1322.
 #1;7 .22317
 And for search for *2231-7 , produces this result:
 Query Analyzer  22317  22317  22317  22317 22317

 I don't understand why he don't find results when i use field:*2231-7.
 When i use field:*2231 without -7 the document was found.

 How Ahmet said, i think they using -7 to ignore the document. But in
 debug query, they don't show this.

 Any idea to solve this?

 Thanks


 2012/5/18 Ahmet Arslan iori...@yahoo.com



  I have a field that was indexed with the string
  .2231-7. When i
  search using '*' or '?' like this *2231-7 the query
  don't returns
  results. When i remove -7 substring and search agin using
  *2231 the
  query returns. Finally when i search using
  .2231-7 the query returns
  too.

 May be standard tokenizer is splitting .2231-7 into multiple tokens?
 You can check that admin/analysis page.

 May be -7 is treated as negative clause? You can check that with
 debugQuery=on





RE: SolrCloud deduplication

2012-05-21 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-3473

-Original message-
 From:Mark Miller markrmil...@gmail.com
 Sent: Mon 21-May-2012 18:11
 To: solr-user@lucene.apache.org
 Subject: Re: SolrCloud deduplication
 
 Looking again at the SignatureUpdateProcessor code, I think that indeed this 
 won't currently work with distrib updates. Could you file a JIRA issue for 
 that? The problem is that we convert update commands into solr documents - 
 and that can cause a loss of info if an update proc modifies the update 
 command.
 
 I think the reason that you see a multiple values error when you try the 
 other order is because of the lack of a document clone (the other issue I 
 mentioned a few emails back). Addressing that won't solve your issue though - 
 we have to come up with a way to propagate the currently lost info on the 
 update command.
 
 - Mark
 
 On May 21, 2012, at 10:39 AM, Markus Jelsma wrote:
 
  Hi again,
  
  It seemed to work fine but in the end duplicates are not overwritten. We 
  first run the SignatureProcessor and then the DistributedProcessor. If we 
  do it the other way around the digest field receives multiple values and 
  throws errors. Is there anything else we can do or another patch to try?
  
  Thanks
  Markus
  
  
  -Original message-
  From:Markus Jelsma markus.jel...@openindex.io
  Sent: Mon 21-May-2012 15:58
  To: solr-user@lucene.apache.org; Mark Miller markrmil...@gmail.com
  Subject: RE: SolrCloud deduplication
  
  Hi,
  
  SOLR-2822 seems to work just fine as long as the SignatureProcessor 
  precedes the DistributedProcessor in the update chain. 
  
  Thanks,
  Markus
  
  
  
  -Original message-
  From:Mark Miller markrmil...@gmail.com
  Sent: Fri 18-May-2012 16:05
  To: solr-user@lucene.apache.org; Markus Jelsma 
  markus.jel...@openindex.io
  Subject: Re: SolrCloud deduplication
  
  Hey Markus -
  
  When I ran into a similar issue with another update proc, I created 
  https://issues.apache.org/jira/browse/SOLR-3215 so that I could order 
  things to avoid this. I have not committed this yet though, in favor of 
  waiting for https://issues.apache.org/jira/browse/SOLR-2822
  
  Go vote? :)
  
  On May 18, 2012, at 7:49 AM, Markus Jelsma wrote:
  
  Hi,
  
 Deduplication on SolrCloud through the SignatureUpdateRequestProcessor 
 is no longer functional. The problem is that documents are passed 
 multiple times through the URP and the digest field is added as if it 
 were a multi-valued field. If the field is not multi-valued you'll get 
 the typical error. Changing the order of URPs in the chain does not 
 solve the problem.
  
  Any hints on how to resolve the issue? Is this a problem in the 
  SignatureUpdateRequestProcessor and does it need to be updated to work 
  with 
  SolrCloud? 
  
  Thanks,
  Markus
  
  - Mark Miller
  lucidimagination.com
  
  
  
  
  
  
  
  
  
  
  
  
  
 
 - Mark Miller
 lucidimagination.com
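For readers following the ordering discussion above: the intended setup corresponds to an update chain roughly like the following. This is a hedged sketch only -- the signature field name is taken from the thread, the source fields and signature class are assumptions, and whether the distributed processor can be placed explicitly in the chain depends on the SOLR-2822 work:

```xml
<updateRequestProcessorChain name="dedupe">
  <!-- compute the signature before the update is distributed, so the
       digest field is added exactly once per document -->
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">digest</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">title,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```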
 
 
 
 
 
 
 
 
 
 
 
 


Re: boost function parameter (bf) ignores character escaping

2012-05-21 Thread Jack Krupansky
I think there is a way in sunspot to give an explicit name to a field so 
that sunspot doesn't generate class-name:field-name for field 
names. I think it is the :as function, such as:


string :name, :as => :name_s

So, you can then refer to name in your ruby code and name_s will be the 
field name in Solr.


-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Monday, May 21, 2012 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: boost function parameter (bf) ignores character escaping

Quoting from the new trunk example schema:

field names should consist of alphanumeric or underscore characters only
and
 not start with a digit.  This is not currently strictly enforced,
 but other field names will not have first class support from all
components
 and back compatibility is not guaranteed.

In other words, don't do it. Replace the colon with an underscore in your
field names.

-- Jack Krupansky

-Original Message- 
From: m...@nils-kaiser.de

Sent: Monday, May 21, 2012 8:01 AM
To: solr-user@lucene.apache.org
Subject: boost function parameter (bf) ignores character escaping

Hey,

I'm running solr (3.5.0.2011.11.30.16.37.06) and have encountered what
I think is a bug with the boost function (bf) parameter.

I've used sunspot (for use of solr with rails) which allows managing
dynamic fields, which by default creates fields like
dynamicfield:value1, dynamicfield:value2, thus using the ":"
character in the field name, which needs to be escaped.

If I use a query which includes q=dynamicfield\:value1:6, everything
works fines and matches are found.

However, if I use the bf field with bf=dynamicfield\:value1, I get an
error message "undefined field dynamicfield", the same without
escaping the ":".

Should I file a bug report?

Best,

Nils 



RE: trunk cloud ui not working

2012-05-21 Thread Phil Hoy
After further investigation I have found that it is not a problem on firefox, 
only chrome and IE. 

Phil

-Original Message-
Sent: 21 May 2012 18:05
To: solr-user@lucene.apache.org
Subject: trunk cloud ui not working

Hi,

I am running from the trunk and the localhost:8983/solr/#/~cloud page shows 
nothing but Fetch Zookeeper Data.

If I run fiddler I see that:
http://localhost:8983/solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json
and
http://localhost:8983/solr/zookeeper?wt=json&path=%2Flive_nodes
are called and return data but no update to the ui.

Cheers,
Phil


__
This email has been scanned by the brightsolid Email Security System. Powered 
by MessageLabs 
__


Re: trunk cloud ui not working

2012-05-21 Thread Mark Miller
What OS? I was just trying trunk and looking at that view on Chrome on OSX and 
Linux and did not see an issue.

On May 21, 2012, at 1:15 PM, Phil Hoy wrote:

 After further investigation I have found that it is not a problem on firefox, 
 only chrome and IE. 
 
 Phil
 
 -Original Message-
 Sent: 21 May 2012 18:05
 To: solr-user@lucene.apache.org
 Subject: trunk cloud ui not working
 
 Hi,
 
 I am running from the trunk and the localhost:8983/solr/#/~cloud page shows 
 nothing but Fetch Zookeeper Data.
 
 If I run fiddler I see that:
 http://localhost:8983/solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json
 and
 http://localhost:8983/solr/zookeeper?wt=json&path=%2Flive_nodes
 are called and return data but no update to the ui.
 
 Cheers,
 Phil
 
 
 __
 This email has been scanned by the brightsolid Email Security System. Powered 
 by MessageLabs 
 __

- Mark Miller
lucidimagination.com













Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-21 Thread 12rad
The field I am trying to highlight is stored. 


<field name="text" type="text_en" required="false" compressed="false"
       omitNorms="false" indexed="true" stored="true" multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true"/>


In the searchHandler i've set the parameters as follows: 

   <str name="hl">on</str>
   <str name="hl.fl">text</str>
   <str name="hl.snippets">5</str>
   <str name="hl.fragsize">1000</str>
   <str name="hl.maxAnalyzedChars">51</str>
   <str name="hl.requireFieldMatch">true</str>
   <str name="hl.fragmenter">regex</str>
   <str name="hl.fragListBuilder">simple</str>
   <str name="hl.fragmentsBuilder">colored</str>
   <str name="hl.phraseLimit">1000</str>
   <str name="hl.usePhraseHighlighter">true</str>
   <str name="hl.highlightMultiTerm">true</str>
   <str name="hl.useFastVectorHighlighter">true</str>


I still don't see any highlighting. I've managed to get snippets of text but
the actual word is not highlighted. I don't know where I am going wrong?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985174.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fault tolerant Solr replication architecture

2012-05-21 Thread Jan Høydahl
Parvin,

What you are looking for is already available in the bleeding edge, 
unreleased version of Solr, which will become version 4.0 sometime later this 
year. You can download it at [1] and test it out. The feature is called 
SolrCloud [2] and it replaces the old replication mechanism in 1.x and 3.x 
versions. Instead of slaves pulling the whole index from masters, the masters 
will forward individual updates to the slaves. Note that this feature is still 
under development and certain things will change before 4.0 release, but it is 
pretty stable and even in use in production some places.

[1] 
https://builds.apache.org/job/Solr-trunk/lastSuccessfulBuild/artifact/artifacts/
[2] http://wiki.apache.org/solr/SolrCloud

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 21. mai 2012, at 09:25, Parvin Gasimzade wrote:

 Hi,
 
 I am using solr with replication. I have one master that indexes data and
 two slaves which pulls index from master and responds to the queries.
 
 My question is, how can i create fault tolerant architecture? I mean what
 should i do when master server crashes? I heard that repeater is used for
 this type of architecture. Then, do I have to create one master, one slave
 with repeater and one slave?
 
 Another question is, if master crashes then does slave with repeater start
 indexing automatically or should I configure it manually?
 
 I asked similar question on the stackoverflow :
 http://stackoverflow.com/questions/10597053/fault-tolerant-solr-replication-architecture
 
 Any help will be appreciated.
 
 Regards,
 Parvin
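For concreteness on the repeater question: in the 1.x/3.x replication model a repeater is simply a core whose ReplicationHandler is configured as both slave (pulling from the real master) and master (serving the remaining slaves). A hedged sketch; the host name, conf file list and poll interval here are assumptions:

```xml
<!-- repeater: a slave of the real master that also serves as a master
     for the downstream slaves -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```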



Re: CloudSolrServer not working with standalone Zookeeper

2012-05-21 Thread Daniel Brügge
Ok, it seems that a maven dependency to zookeeper version 3.3 broke this.
Now it connects to the zk instance.

Thanks.

On Mon, May 21, 2012 at 5:31 PM, Daniel Brügge 
daniel.brue...@googlemail.com wrote:

 Thanks for your feedback. I don't know.

 I've tried just now with the newest trunk version and the embedded ZK on
 port 9983.

 In the logs of the zk-solr it shows:

 *INFO: Accepted socket connection from /XXX.XXX.XXX.XXX:1055*
 *May 21, 2012 3:27:34 PM org.apache.zookeeper.server.NIOServerCnxn doIO*
 *WARNING: EndOfStreamException: Unable to read additional data from
 client sessionid 0x0, likely client has closed socket*
 *May 21, 2012 3:27:34 PM org.apache.zookeeper.server.NIOServerCnxn
 closeSock*
 *INFO: Closed socket connection for client /XXX.XXX.XXX.XXX:1055 (no
 session established for client)*


 So it can definitely connects to the port in my opinion, but it closes the
 connection after the defined timeout (here 1ms)

 *Caused by: java.util.concurrent.TimeoutException: Could not connect to
 ZooKeeper MYZKHOST.:9983 within 1 m*

 Hmm. I also thought that this trivial setup should work. Will check again.

 Daniel

 On Fri, May 18, 2012 at 4:23 PM, Mark Miller markrmil...@gmail.comwrote:

 Seems something is stopping the connection from occurring? Tests are
 constantly running and doing this using an embedded zk server - and I know
 more than a few people using an external zk setup. I'd have to guess
 something in your env or URL is causing this?


 On May 16, 2012, at 3:11 PM, Daniel Brügge wrote:

  OK, it's also not working with an internal started Zookeeper.
 
  On Wed, May 16, 2012 at 8:29 PM, Daniel Brügge 
  daniel.brue...@googlemail.com wrote:
 
  Hi,
 
  I am just playing around with SolrCloud and have read in articles like
 
 
  http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/
  that it is sufficient to create the connection to the Zookeeper instance and
 not
  to the Solr instance.
  When I try to connect to my standalone  Zookeeper instance (not started
  with a Solr instance and -DzkRun) I am getting this error:
 
  Caused by: java.util.concurrent.TimeoutException: Could not connect to
  ZooKeeper
 
 
  I am also getting this error when I try to connect directly to one of
 the
  Solr instances.
 
  My code looks like this:
 
 solr = new CloudSolrServer("myzkhost:2181");
 ((CloudSolrServer)
 solr).setDefaultCollection("collection1");
 
  I am working with the latest Solr trunk version (
  https://builds.apache.org/view/S-Z/view/Solr/job/Solr-trunk/1855/)
 
  Do I need to start the zookeeper in Solr to keep this working?
 
  Thanks  regards
 
  Daniel
 

 - Mark Miller
 lucidimagination.com















Re: Lucene FieldCache - Out of memory exception

2012-05-21 Thread Chris Hostetter

: I am using solr 1.3 with jdk 1.5.0_14 and weblogic 10MP1 application server
: on Solaris. I use embedded solr server. More details :

FWIW: Solr 1.3 is *REALLY* old ... do not be surprised if much of the info 
you are given (or read) doesn't apply.

: - some mail threads on this forum seem to indicate that there could be some
: connection between having dynamic fields and usage of FieldCache. Is this
: true ? Most of the fields in my index are dynamic fields.

there is no specific correlation between dynamic fields and the field 
cache -- what you may be seeing is people commenting about the dangers of 
*using* field caches with dynamic fields, because typically when people 
use dynamic fields there is no fixed number of pre-defined fields in use 
(that's the whole perk of dynamic fields), so if you are using hundreds or 
thousands of dynamic fields in a way that involves the field cache, you 
might have problems (because field cache objects tend to be large)

: - as mentioned above, most of my faceted queries could have around 50-70
: facet fields (I would do SolrQuery.addFacetField() for around 50-70 fields
: per query). Could this be the source of the problem ? Is this too high for
: solr to support ?

In Solr 1.3, faceting does not use the field cache AT ALL!

starting with Solr 1.4, faceting can use the field cache (or a similar 
concept called UnInvertedField when multivalued).  You can force 
Solr 1.4+ not to use the field cache for this by specifying 
facet.method=enum

https://wiki.apache.org/solr/SimpleFacetParameters#facet.method

: - Initially, I had a facet.sort defined in solrconfig.xml. Since FieldCache
: builds up on sorting, I even removed the facet.sort and tried, but no
: respite. The behavior is same as before.

Facet sorting is not the same as result sorting. Facet sorting does not 
use the field cache at all.

nothing you've mentioned in your initial email, or the example query you 
posted, should involve the field cache in any way (in Solr 1.3!), so if you 
are seeing your heap eaten up by field cache objects there is more going 
on in your system than you know about (or than you've told us) ... you 
need to look at the fields associated with those field caches, and then 
see how you are using those fields in requests, to make sense of why they 
exist in your heap.



-Hoss
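For example, forcing the enum method for a single facet request looks like this (the field name is an assumption):

```
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.method=enum
```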


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-21 Thread Rahul Warawdekar
Hi,

Can you please provide the definitions of the following 3 objects from your
solrconfig.xml ?

<str name="hl.fragListBuilder">simple</str>
<str name="hl.fragmentsBuilder">colored</str>
<str name="hl.fragmenter">regex</str>


For eg,
the simple hl.fragListBuilder should be defined as mentioned below in
your solrconfig.xml
   <fragListBuilder name="simple"
       class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>


On Mon, May 21, 2012 at 2:06 PM, 12rad prama.an...@gmail.com wrote:

 The field I am trying to highlight is stored.


 <field name="text" type="text_en" required="false" compressed="false"
        omitNorms="false" indexed="true" stored="true" multiValued="true"
        termVectors="true" termPositions="true" termOffsets="true"/>


 In the searchHandler i've set the parameters as follows:

   <str name="hl">on</str>
   <str name="hl.fl">text</str>
   <str name="hl.snippets">5</str>
   <str name="hl.fragsize">1000</str>
   <str name="hl.maxAnalyzedChars">51</str>
   <str name="hl.requireFieldMatch">true</str>
   <str name="hl.fragmenter">regex</str>
   <str name="hl.fragListBuilder">simple</str>
   <str name="hl.fragmentsBuilder">colored</str>
   <str name="hl.phraseLimit">1000</str>
   <str name="hl.usePhraseHighlighter">true</str>
   <str name="hl.highlightMultiTerm">true</str>
   <str name="hl.useFastVectorHighlighter">true</str>


 I still don't see any highlighting. I've managed to get snippets of text
 but
 the actual word is not highlighted. I don't know where I am going wrong?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985174.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


SolrJ: clusters, labels, docs - search results

2012-05-21 Thread okayndc
Hello,

Was wondering how to access the cluster labels, and docs(ids) via SolrJ?

I have added the following:
   query.setParam("q", userQuery);
   query.setParam("clustering", true);
   query.setParam("qt", "/core2/clustering");
   query.setParam("carrot.title", "title");

But how to access the labels, docs in the clusters and display in a search
result?

Also, I've seen others specify clustering in this manner...

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/core2/clustering");
params.set("q", userQuery);
params.set("carrot.title", "title");
params.set("clustering", true);


Is this preferred over the other?

Thanks
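A sketch of reading the clustering output with SolrJ. This is untested and assumes a running Solr with the clustering handler at /core2/clustering and a SolrServer instance named server; the Carrot2 clustering component returns its results under a top-level "clusters" entry, which SolrJ exposes as an untyped structure:

```java
// Hedged sketch -- requires SolrJ on the classpath and a running Solr.
SolrQuery query = new SolrQuery(userQuery);
query.setParam("qt", "/core2/clustering");
query.setParam("clustering", true);
query.setParam("carrot.title", "title");

QueryResponse response = server.query(query);

// The clustering output is not typed in SolrJ; walk the raw response.
@SuppressWarnings("unchecked")
List<NamedList<Object>> clusters =
    (List<NamedList<Object>>) response.getResponse().get("clusters");
for (NamedList<Object> cluster : clusters) {
  Object labels = cluster.get("labels"); // cluster label phrases
  Object docs = cluster.get("docs");     // ids of documents in the cluster
  System.out.println(labels + " -> " + docs);
}
```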


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-21 Thread 12rad
For the fragListBuilder
 it's 
<fragListBuilder name="simple"
    default="true"
    class="solr.highlight.SimpleFragListBuilder"/>

fragment builder is 
<fragmentsBuilder name="colored"
    class="solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"></str>
    <str name="hl.tag.post"></str>
  </lst>
</fragmentsBuilder>


 <fragmenter name="regex"
     class="solr.highlight.RegexFragmenter">
   <lst name="defaults">
     <int name="hl.fragsize">70</int>
     <float name="hl.regex.slop">0.5</float>
     <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
   </lst>
 </fragmenter>


Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985212.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Facets and doc count for a term

2012-05-21 Thread Chris Hostetter

: Is there a way to not only get the number of times a term appears for
: a particular field (faceting) as well as the number of documents that
: were associated with a particular term?  So for instance if I had the
: following docs

Nope... faceting is associated with _sets_ of documents, so there is no 
scoring info associated with each constraint, just the number of documents 
in the set (that's what allows it to be very efficient)



-Hoss


Re: Not able to use the highlighting feature! Want to return snippets of text

2012-05-21 Thread Rahul Warawdekar
Hi,

I believe, in your colored fragmentsBuilder definition, you have not
mentioned anything in your pre and post tags and that may be the reason
that you are getting snippets of text, without highlighting.
Please refer http://wiki.apache.org/solr/HighlightingParameters and check
the hl.fragmentsBuilder section.
Try specifying the pre and post tags with information as mentioned below.
(same as wiki link above)

<!-- multi-colored tag FragmentsBuilder -->
<fragmentsBuilder name="colored"
    class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"><![CDATA[
      <b style="background:yellow">,<b style="background:lawgreen">,
      <b style="background:aquamarine">,<b style="background:magenta">,
      <b style="background:palegreen">,<b style="background:coral">,
      <b style="background:wheat">,<b style="background:khaki">,
      <b style="background:lime">,<b style="background:deepskyblue">]]></str>
    <str name="hl.tag.post"><![CDATA[</b>]]></str>
  </lst>
</fragmentsBuilder>


On Mon, May 21, 2012 at 3:52 PM, 12rad prama.an...@gmail.com wrote:

 For the fragListBuilder
  it's
 <fragListBuilder name="simple"
     default="true"
     class="solr.highlight.SimpleFragListBuilder"/>

 fragment builder is
 <fragmentsBuilder name="colored"
     class="solr.highlight.ScoreOrderFragmentsBuilder">
   <lst name="defaults">
     <str name="hl.tag.pre"></str>
     <str name="hl.tag.post"></str>
   </lst>
 </fragmentsBuilder>


  <fragmenter name="regex"
      class="solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">70</int>
      <float name="hl.regex.slop">0.5</float>
      <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
    </lst>
  </fragmenter>


 Thanks!

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Not-able-to-use-the-highlighting-feature-Want-to-return-snippets-of-text-Urgent-tp3985012p3985212.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks and Regards
Rahul A. Warawdekar


how to join 3 tables to pull required data

2012-05-21 Thread srini
I am having a situation where I need to join 3 tables to pull the required
information. Can anyone throw me some ideas!!!

select A.sid, B.cid, C.NAME
from table1 A, table2 B, table3 C
where A.sid= C.sid
and 
A.oid = B.oid
and C.typeid = 5
and C.flag = 0
and B.cid= 1000;

Can you please provide a schema / data-config file for the above requirement?








--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-join-3-tables-to-pull-required-data-tp3985218.html
Sent from the Solr - User mailing list archive at Nabble.com.
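With the DataImportHandler the whole join would normally go into a single entity query rather than three nested entities. A hedged data-config.xml sketch; the Solr field names are assumed to mirror the select list, and the dataSource attributes are placeholders:

```xml
<dataConfig>
  <dataSource driver="..." url="..." user="..." password="..."/>
  <document>
    <!-- one Solr document per row of the joined result set -->
    <entity name="item" query="select A.sid, B.cid, C.NAME as name
        from table1 A, table2 B, table3 C
        where A.sid = C.sid and A.oid = B.oid
          and C.typeid = 5 and C.flag = 0 and B.cid = 1000">
      <field column="sid" name="sid"/>
      <field column="cid" name="cid"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```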


Remote streaming - posting a URL which is password protected

2012-05-21 Thread 12rad
I want to index an HTTP document that is password protected. 
It has a username/password login. 
I tried doing this 

curl -u username:password \
  "http://localhost:8983/solr/update/extract?literal.id=doc900&commit=true" \
  -F stream.url=http://somewebsite.com/docs/DOC2609

but it just indexes the login page only.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Remote-streaming-posting-a-URL-which-is-password-protected-tp3985221.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr mail dataimporter cannot be found

2012-05-21 Thread Emma Bo Liu
Hi,

I want to index emails using solr. I put the user name, password, hostname
in data-config.xml under mail folder. This is a valid email but when I run
in the URL http://localhost:8983/solr/mail/dataimport?command=full-import it
says it cannot access mail/dataimport, reason: not found. But when I run
http://localhost:8983/solr/rss/dataimport?command=full-import
or
http://localhost:8983/solr/db/dataimport?command=full-import
they can be found.

In addition, when I run the command java
-Dsolr.solr.home=./example-DIH/solr/ -jar start.jar, the left side of the
Solr UI shows db, rss, tika and solr but no mail. Is this a bug in mail
indexing? Thank you so much!

Best,

Emma


Re: Remote streaming - posting a URL which is password protected

2012-05-21 Thread Jan Høydahl
Hi,

Using curl -u will only attempt to log in to Jetty/Solr, which is not password 
protected I assume. What you really would like is for the HTTP call which Solr 
does based on stream.url to attempt a login. Such functionality is not 
implemented as far as I know. You may try the syntax 
stream.url=http://username:passw...@somewebsite.com/docs/DOC2609 but I have not 
tested it. Why can't you download the file locally first? If you're looking for 
a production grade HTTP crawler you could look at ManifoldCF.

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 21. mai 2012, at 22:44, 12rad wrote:

 I want to post index a http document that is password protected. 
 It has a username name login. 
 I tried doing this 
 
 curl -u username:password \
   "http://localhost:8983/solr/update/extract?literal.id=doc900&commit=true" \
   -F stream.url=http://somewebsite.com/docs/DOC2609
 
 but it just indexes the login page only.
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Remote-streaming-posting-a-URL-which-is-password-protected-tp3985221.html
 Sent from the Solr - User mailing list archive at Nabble.com.
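Jan's download-it-locally suggestion would look roughly like this (an untested sketch; the local file name and literal.id are assumptions):

```
curl -u username:password -o DOC2609.bin "http://somewebsite.com/docs/DOC2609"
curl "http://localhost:8983/solr/update/extract?literal.id=doc900&commit=true" \
     -F "myfile=@DOC2609.bin"
```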



Re: UI

2012-05-21 Thread Johannes Goll
yes, I am using this library and it works perfectly so far. If
something does not work you can just modify it
http://code.google.com/p/solr-php-client/

Johannes
2012/5/21 Tolga to...@ozses.net:
 Hi,

 Can you recommend a good PHP UI to search? Is SolrPHPClient good?


Re: Newbie with Carrot2?

2012-05-21 Thread Chris Hostetter

: Subject: Newbie with Carrot2?
: References: 35E48F3294A0416A8F476E9C173321F3@msrvcn04
: In-Reply-To: 35E48F3294A0416A8F476E9C173321F3@msrvcn04

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: Date format in the schema.xml

2012-05-21 Thread Chris Hostetter

: Subject: Date format in the schema.xml
: References: 1336981696.60953.yahoomailclas...@web121705.mail.ne1.yahoo.com
: In-Reply-To: 1336981696.60953.yahoomailclas...@web121705.mail.ne1.yahoo.com

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: UI

2012-05-21 Thread Damien Camilleri
My favourite php library is solarium. Everything OOP. I've tried a few.

http://www.solarium-project.org/


Sent from my iPhone

On 21/05/2012, at 6:44 PM, Johannes Goll johannes.g...@gmail.com wrote:

 yes, I am using this library and it works perfectly so far. If
 something does not work you can just modify it
 http://code.google.com/p/solr-php-client/
 
 Johannes
 2012/5/21 Tolga to...@ozses.net:
 Hi,
 
 Can you recommend a good PHP UI to search? Is SolrPHPClient good?


Re: Solr 3.6.0 problem with multi-core and json

2012-05-21 Thread Chris Hostetter

: I should clarify the error a bit. When I make a select request on my first
: core (called core0) using the wt=json parameter I get a 400 response with
: the explanation undefined field: gid. The field gid is not defined in the
: schema.xml file of my first core. But, it is defined in the schema.xml file
: of my third core (core2). Hopefully, this is a slightly better explanation
: of the problem.

What is the full stack trace of the error? (even if your client doens't 
get it, it should be in the log)

Are you sure there is no refrence to gid in your core0 solrconfig.xml?


-Hoss


SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-21 Thread Yandong Yao
Hi Guys,

I use following command to start solr cloud according to solr cloud wiki.

yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf
-Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar
start.jar

Then I have created several cores using CoreAdmin API (
http://localhost:8983/solr/admin/cores?action=CREATE&name=<coreName>&collection=collection1),
and clusterstate.json shows the following
topology:


collection1:
-- shard1:
  -- collection1
  -- CoreForCustomer1
  -- CoreForCustomer3
  -- CoreForCustomer5
-- shard2:
  -- collection1
  -- CoreForCustomer2
  -- CoreForCustomer4


1) Index:

Using following command to index mem.xml file in exampledocs directory.

yydzero:exampledocs bjcoe$ java -Durl=
http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml
SimplePostTool: version 1.4
SimplePostTool: POSTing files to
http://localhost:8983/solr/coreForCustomer3/update..
SimplePostTool: POSTing file mem.xml
SimplePostTool: COMMITting Solr index changes.

And now SolrAdmin UI shows that 'coreForCustomer1', 'coreForCustomer3',
'coreForCustomer5' has 3 documents (mem.xml has 3 documents) and other 2
core has 0 documents.

*Question 1:*  Is this expected behavior? How do I index documents into
a specific core?

*Question 2*:  If SolrCloud don't support this yet, how could I extend it
to support this feature (indexing documents to a particular core)? Where
should I start, with the hashing algorithm?

*Question 3*:  Why the documents are also indexed into 'coreForCustomer1'
and 'coreForCustomer5'?  The default replica for documents are 1, right?

Then I try to index some document to 'coreForCustomer2':

$ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar
post.jar ipod_video.xml

While 'coreForCustomer2' still have 0 documents and documents in ipod_video
are indexed to core for customer 1/3/5.

*Question 4*:  Why this happens?

2) Search: I use 
http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml to
search against 'CoreForCustomer2', while it will return all documents in
the whole collection even though this core has no documents at all.

Then I use 
http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2,
and it will return 0 documents.

*Question 5*: So If want to search against a particular core, we need to
use 'shards' parameter and use solrCore name as parameter value, right?


Thanks very much in advance!

Regards,
Yandong


Date boosting mlt results - possible?

2012-05-21 Thread John Pettitt

Specifically if I'm doing a query using the solr mlt handler 
(http://wiki.apache.org/solr/MoreLikeThisHandler) and stream.body to supply the 
source doc is there any way to boost result documents based on document age?

I already know how to do that for a regular query using dismax 
(http://wiki.apache.org/solr/FunctionQuery#Date_Boosting) but I can't quite 
figure out the magic incantation to do it for the mlt handler.

John Pettitt   
Email: j...@p.tt







Re: And results before Or results

2012-05-21 Thread Chris Hostetter

: I want to have a strick enforcement that In case of a 3 word search, those
: results that match all 3 term should be presented ahead of those that match
: 2 terms when I set mm=2.
: 
: I have seen quite some cases where, those results that match 2 out of 3
: words appear ahead of those matching all 3 words.

which can happen because of tf/idf and length normalization.

if you disable all of those things for the fields you 
search on (omitNorms=true omitTf=true) you should see a strict ordering 
based on the number of matching clauses.


-Hoss
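In schema.xml terms this is set per field. A hedged sketch; the field and type names are assumptions, and omitTf corresponds to the omitTermFreqAndPositions attribute in later schema versions:

```xml
<field name="title" type="text_general" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>
```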


Re: UI

2012-05-21 Thread Bill Bell
The php.net plugin is the best. SolrPHPClient is missing several features.

Sent from my Mobile device
720-256-8076

On May 21, 2012, at 6:35 AM, Tolga to...@ozses.net wrote:

 Hi,
 
 Can you recommend a good PHP UI to search? Is SolrPHPClient good?


Re: SolrCloud: how to index documents into a specific core and how to search against that core?

2012-05-21 Thread Darren Govoni
Why do you want to control what gets indexed into a core, and to have to
know which core to search? That is exactly the kind of detail SolrCloud
takes care of. In SolrCloud, it handles the distribution of documents across
shards and retrieves them regardless of which node is searched from.
That is the point of cloud, you don't know the details of where
exactly documents are being managed (i.e. they are cloudy). It can
change and re-balance from time to time. SolrCloud performs the
distributed search for you, therefore when you try to search a node/core
with no documents, all the results from the cloud are retrieved
regardless. This is considered A Good Thing.

It requires a change in thinking about indexing and searching

On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote:
 Hi Guys,
 
 I use following command to start solr cloud according to solr cloud wiki.
 
 yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf
 -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
 yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar
 start.jar
 
 Then I have created several cores using CoreAdmin API (
 http://localhost:8983/solr/admin/cores?action=CREATE&name=<coreName>&collection=collection1),
 and clusterstate.json shows the following
 topology:
 
 
 collection1:
 -- shard1:
   -- collection1
   -- CoreForCustomer1
   -- CoreForCustomer3
   -- CoreForCustomer5
 -- shard2:
   -- collection1
   -- CoreForCustomer2
   -- CoreForCustomer4
 
 
 1) Index:
 
 Using following command to index mem.xml file in exampledocs directory.
 
 yydzero:exampledocs bjcoe$ java -Durl=
 http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml
 SimplePostTool: version 1.4
 SimplePostTool: POSTing files to
 http://localhost:8983/solr/coreForCustomer3/update..
 SimplePostTool: POSTing file mem.xml
 SimplePostTool: COMMITting Solr index changes.
 
 And now SolrAdmin UI shows that 'coreForCustomer1', 'coreForCustomer3',
 'coreForCustomer5' has 3 documents (mem.xml has 3 documents) and other 2
 core has 0 documents.
 
 *Question 1:*  Is this expected behavior? How do I to index documents into
 a specific core?
 
 *Question 2*:  If SolrCloud don't support this yet, how could I extend it
 to support this feature (index document to particular core), where should i
 start, the hashing algorithm?
 
 *Question 3*:  Why the documents are also indexed into 'coreForCustomer1'
 and 'coreForCustomer5'?  The default replica for documents are 1, right?
 
 Then I try to index some document to 'coreForCustomer2':
 
 $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar
 post.jar ipod_video.xml
 
 While 'coreForCustomer2' still have 0 documents and documents in ipod_video
 are indexed to core for customer 1/3/5.
 
 *Question 4*:  Why this happens?
 
 2) Search: I use 
 http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml to
 search against 'CoreForCustomer2', while it will return all documents in
 the whole collection even though this core has no documents at all.
 
 Then I use 
 http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2,
 and it will return 0 documents.
 
 *Question 5*: So if I want to search against a particular core, I need to
 use the 'shards' parameter with the SolrCore name as its value, right?
 
 
 Thanks very much in advance!
 
 Regards,
 Yandong




Re: adding an OR to a fq makes some doc that matched not match anymore

2012-05-21 Thread Chris Hostetter

: - /suggest?q=suggest_terms:lap*&fq=type:P&fq=(-type:B)
: numFound=1

: doc, so adding a doc will also fulfill right?
: /suggest?q=suggest_terms:lap*&fq=type:P&fq=(-type:B OR name:aa)
: numFound=0
: 
: is there a logical explanation??

http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/


-Hoss


Re: And results before Or results

2012-05-21 Thread Karthick Duraisamy Soundararaj
Interesting. Even though omitTf=true would give strict enforcement,
wouldn't it affect the relevancy? Like, I am wondering if the ordering
amongst the three-word matches would be not as good as it would be when we
have omitNorms=true and omitTf=true. Do you have an idea?

On Mon, May 21, 2012 at 8:51 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : I want to have a strict enforcement that in case of a 3-word search,
 those
 : results that match all 3 terms should be presented ahead of those that
 match
 : 2 terms when I set mm=2.
 :
 : I have seen quite some cases where, those results that match 2 out of 3
 : words appear ahead of those matching all 3 words.

 which can happen because of tf/idf and length normalization.

 if you disable all of those things for the fields you
 search on (omitNorms=true omitTf=true) you should see a strict ordering
 based on the number of matching clauses.
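[A toy illustration of both halves of this answer, with invented documents and weights; Lucene's real formula also includes idf, norms, and coord. With tf in play, a document matching 2 of 3 clauses can outscore one matching all 3; flattening tf to a constant makes the score equal the clause count, giving the strict ordering.]

```python
import math

def score(doc_tfs, query_terms, use_tf=True):
    """Toy additive score over matched query terms.

    With use_tf, a match contributes sqrt(term frequency) (the classic
    Lucene tf weight); without it, every match contributes a flat 1.0,
    so the score is simply the number of matching clauses.
    """
    total = 0.0
    for t in query_terms:
        tf = doc_tfs.get(t, 0)
        if tf:
            total += math.sqrt(tf) if use_tf else 1.0
    return total

query = ["solr", "cloud", "shard"]
docA = {"solr": 9, "cloud": 9}               # matches 2 of 3 terms, many times
docB = {"solr": 1, "cloud": 1, "shard": 1}   # matches all 3 terms, once each

with_tf = (score(docA, query), score(docB, query))        # (6.0, 3.0): A outranks B
without_tf = (score(docA, query, use_tf=False),
              score(docB, query, use_tf=False))           # (2.0, 3.0): B outranks A
```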


 -Hoss




-- 
--
Karthick D S
Master's in Computer Engineering ( Software Track )
Syracuse University
Syracuse - 13210
New York
United States of America


RE: Advanced search with results matrix

2012-05-21 Thread Chris Hostetter

: No, it's not just one single query; rather, as I've mentioned before, it's
: a combination of searches with a result count for each combination.  Explained
: in detail below:
: 1) (SQL Server OR SQL)
: 2) (Visual Basic OR VB.NET)
: 3) (Java AND JavaScript)
: 4) (SQL Server OR SQL) AND (Visual Basic OR VB.NET)
: 5) (Visual Basic OR VB.NET) AND (Java AND JavaScript)
: 6) (SQL Server OR SQL) AND (Java AND JavaScript)
: 7) (SQL Server OR SQL) AND (Visual Basic OR VB.NET) AND (Java AND
: JavaScript)

As an added bonus, you can use nested parsers to simplify how you express 
your query...

q1=...input from textbox #1...
q2=...input from textbox #2...
q3=...input from textbox #3...
q=*:*
facet=true
facet.query={!v=$q1}
facet.query={!v=$q2}
facet.query={!v=$q3}
facet.query=+_query_:"{!v=$q1}" +_query_:"{!v=$q2}"
facet.query=+_query_:"{!v=$q1}" +_query_:"{!v=$q3}"
facet.query=+_query_:"{!v=$q2}" +_query_:"{!v=$q3}"
facet.query=+_query_:"{!v=$q1}" +_query_:"{!v=$q2}" +_query_:"{!v=$q3}"

...which doesn't look simpler, until you realize that you can hardcode
everything except q1, q2, and q3 in default params for a special
request handler in your solrconfig.xml
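[For illustration, the full parameter set can be assembled client-side like this. A sketch only: the q1/q2/q3 values and the use of Python's urlencode are just for demonstration, and as noted above the facet.query params would normally live as defaults in solrconfig.xml.]

```python
from urllib.parse import urlencode

q1 = '(SQL Server OR SQL)'       # input from textbox #1
q2 = '(Visual Basic OR VB.NET)'  # input from textbox #2
q3 = '(Java AND JavaScript)'     # input from textbox #3

# Repeated keys are legal: Solr accepts any number of facet.query params.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("q1", q1), ("q2", q2), ("q3", q3),
    # the three single-box counts, dereferencing the q1/q2/q3 params
    ("facet.query", "{!v=$q1}"),
    ("facet.query", "{!v=$q2}"),
    ("facet.query", "{!v=$q3}"),
    # the three pairs and the triple, via nested _query_ clauses
    ("facet.query", '+_query_:"{!v=$q1}" +_query_:"{!v=$q2}"'),
    ("facet.query", '+_query_:"{!v=$q1}" +_query_:"{!v=$q3}"'),
    ("facet.query", '+_query_:"{!v=$q2}" +_query_:"{!v=$q3}"'),
    ("facet.query", '+_query_:"{!v=$q1}" +_query_:"{!v=$q2}" +_query_:"{!v=$q3}"'),
]
query_string = urlencode(params)
```

One request therefore yields all seven counts in the facet_queries section of the response.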


-Hoss


Re: And results before Or results

2012-05-21 Thread Chris Hostetter

: Interesting. Even though omitTf=true would give strict enforcement,
: wouldn't it affect the relevancy? Like, I am wondering if the ordering
: amongst the three-word matches would be not as good as it would be when we
: have omitNorms=true and omitTf=true. Do you have an idea?

It will *absolutely* affect the ranking ... that's the entire point.

if the complaint is "docA containing only two of the clauses
scores higher than docB matching all 3 clauses" the reason for that is
(usually) because tf/idf scoring for docA is a *REALLY* good match for
those two clauses (ie: they occur many, many times) whereas docB might
match all three but it may only match each of them once.  you can't
guarantee a strict ordering based on the number of clauses that match unless
you eliminate term freq and norms from the equation.

That said, I realize now that I forgot to finish my previous message with
the "However..." comment...

However... if you still want the tf/idf and length norm to be a factor,
but you just want to change the penalty of not matching all terms to be
much higher (which doesn't guarantee a strict ordering, but biases things
so much it's unlikely to ever be a factor) you could also play around
with a custom implementation of the coord factor in the similarity...

http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html#coord%28int,%20int%29
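[A toy simulation of the coord idea, with invented scores: the default Similarity multiplies the raw score by the plain overlap/maxOverlap ratio, and raising that ratio to a steeper power leaves tf/idf intact while making a missing clause overwhelmingly costly.]

```python
def coord(overlap, max_overlap, steepness=1):
    """Lucene-style coord factor: the fraction of query clauses matched,
    optionally raised to a power to punish partial matches harder.
    (DefaultSimilarity effectively uses steepness == 1.)"""
    return (overlap / max_overlap) ** steepness

def final_score(raw_score, overlap, max_overlap, steepness=1):
    return raw_score * coord(overlap, max_overlap, steepness)

# docA: strong tf/idf match on 2 of 3 clauses; docB: weak match on all 3.
rawA, overlapA = 6.0, 2
rawB, overlapB = 3.0, 3

default = (final_score(rawA, overlapA, 3, 1),   # 6.0 * 2/3  -> A still wins
           final_score(rawB, overlapB, 3, 1))   # 3.0 * 3/3
steep   = (final_score(rawA, overlapA, 3, 4),   # 6.0 * (2/3)^4 -> heavily penalized
           final_score(rawB, overlapB, 3, 4))   # 3.0 * 1.0     -> B now wins
```

With the default linear coord, docA's tf advantage survives; with the steeper exponent, the full-match docB comes out on top even though tf/idf is still in the equation.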

:  : I want to have a strict enforcement that in case of a 3-word search,
:  those
:  : results that match all 3 terms should be presented ahead of those that
:  match
:  : 2 terms when I set mm=2.
:  :
:  : I have seen quite some cases where, those results that match 2 out of 3
:  : words appear ahead of those matching all 3 words.
: 
:  which can happen because of tf/idf and length normalization.
: 
:  if you disable all of those things for the fields you
:  search on (omitNorms=true omitTf=true) you should see a strict ordering
:  based on the number of matching clauses.


-Hoss


Re: Indexing files using multi-cores - could not fix after many retries

2012-05-21 Thread Gora Mohanty
On 22 May 2012 05:12, sudarshan chakravarthy.sudars...@gmail.com wrote:
[...]
 <requestHandler name="/update/csv"
                  class="solr.CSVRequestHandler"
                  startup="lazy" />
[...]


 Response:
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
 <title>Error 400 Unexpected character 'b' (code 98) in prolog; expected
 '&lt;'
  at [row,col {unknown-source}]: [1,1]</title>
 </head>
 <body>
 HTTP ERROR 400

 <p>Problem accessing /solr/core0/update/.
[...]

Looks like your CSV handler is set up at /update/csv
whereas you are posting to /update. By default, the
handler there expects XML, which is the source of
the error.

Try posting to  /solr/core0/update/csv/
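[The 400 above can be reproduced conceptually: feeding a CSV body to an XML parser fails on the very first byte, just as Solr's default XML update handler does when CSV is posted to /update. The CSV payload below is hypothetical, chosen only so that its first character is 'b' like the one the error message reports.]

```python
import xml.etree.ElementTree as ET

csv_body = "body,title\nhello,world\n"  # hypothetical CSV payload

# An XML parser chokes on the first character of the CSV header --
# the same failure the 400 response above reports ("Unexpected
# character 'b' (code 98) in prolog").
try:
    ET.fromstring(csv_body)
    parsed = True
except ET.ParseError as e:
    parsed = False
    error = str(e)
```

Posting the same body to the /update/csv handler avoids the XML parse entirely, which is why pointing at the right endpoint fixes the error.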

Regards,
Gora