Multiple schemas in the same SolrCloud ?

2013-10-09 Thread xinwu
Hi all,

I want to use multiple schemas in the same SolrCloud. Is that allowed?

If it is allowed, how?


These schemas may have no relation.

Thank You.

Dai.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279.html
Sent from the Solr - User mailing list archive at Nabble.com.


synonyms and term position

2013-10-09 Thread Alvaro Cabrerizo
Hi:

I'm involved in a process of upgrading Solr from 1.4 to 4.4, and I'm having a
problem using SynonymFilterFactory in the analysis chain
SynonymFilterFactory, StopFilterFactory.

I have configured synonyms.txt to expand the word "AIO" as "all-in-one". Well,
when using Solr 1.4 I get the following result (term positions) when
analysing the string "one aio two".

Solr 1.4 after synonym:

term position | 1   | 2   | 3  | 4   | 5
term text     | one | all | in | one | two

Solr 1.4 after stopfilter (the "in" term is deleted and the terms "all" and
"one" are consecutive):

term position | 1   | 2   | 4   | 5
term text     | one | all | one | two
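The 1.4 positions can be reproduced with a small position-increment simulation (a Python sketch, not Solr code; `enablePositionIncrements="true"` is what preserves the gap left by the removed stopword):

```python
def positions(tokens):
    """Turn (text, position_increment) pairs into absolute positions."""
    pos, out = 0, []
    for text, incr in tokens:
        pos += incr
        out.append((text, pos))
    return out

def stop_filter(tokens, stopwords):
    """Drop stopwords but fold each removed token's increment into the
    next surviving token, leaving a positional gap (the behaviour of
    enablePositionIncrements="true")."""
    out, carry = [], 0
    for text, incr in tokens:
        if text in stopwords:
            carry += incr
        else:
            out.append((text, incr + carry))
            carry = 0
    return out

# "one aio two" after the synonym expansion aio -> all in one (Solr 1.4):
after_synonym = [("one", 1), ("all", 1), ("in", 1), ("one", 1), ("two", 1)]
print(positions(stop_filter(after_synonym, {"in"})))
# [('one', 1), ('all', 2), ('one', 4), ('two', 5)] -- matches the table above
```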



But when using Solr 4.4 I get:

Solr 4.4 after synonym:

term position | 1   | 2   | 3  | 4   | 3
term text     | one | all | in | one | two

Solr 4.4 after stopfilter ("in" is deleted and the term "two" is now close to
"all"):

term position | 1   | 2   | 4   | 3
term text     | one | all | one | two



The problem is that the second word "two" is at position 3 in Solr 4.4, so
when I search for "aio" I get results in Solr 1.4 but find nothing in
Solr 4.4. Is there any option to configure Solr 4 to imitate the Solr 1.4
behavior?


Regards.




Please find attached the fieldType configuration.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
    autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
        protected="protwords.txt"/>
  </analyzer>
</fieldType>
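For reference, an expansion like the one described would come from a synonyms.txt entry along these lines (an assumption -- the actual file was not attached; given that the output above shows three separate tokens all/in/one, the right-hand side was presumably written with spaces rather than hyphens):

```text
aio, all in one
```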


Re: Multiple schemas in the same SolrCloud ?

2013-10-09 Thread Anshum Gupta
You can simply have multiple collections, each independent of the others on
the schema; they can run on the same instance/JVM if you want.
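That setup can be sketched against the 4.x Collections API. The host, collection names, and configset names below are illustrative assumptions, and each configset (schema.xml plus solrconfig.xml) is assumed to have been uploaded to ZooKeeper under its own name beforehand:

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # illustrative host

def create_collection_url(name, config_name, num_shards=1):
    """Build a Collections API CREATE call; each collection points at its
    own configset, so each gets its own independent schema."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "collection.configName": config_name,
    }
    return SOLR + "/admin/collections?" + urlencode(params)

# Two unrelated schemas -> two configsets -> two collections:
print(create_collection_url("products", "productsConf"))
print(create_collection_url("users", "usersConf"))
```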


On Wed, Oct 9, 2013 at 12:36 PM, xinwu xinwu0...@gmail.com wrote:

 Hi all,

 I want to use the multiple schemas in the same solrCloud, is it allowed?

 If it is allowed,how?


 These schemas may have no relation.

 Thank You.

 Dai.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

Anshum Gupta
http://www.anshumgupta.net


Re: dynamic field question

2013-10-09 Thread Aloke Ghoshal
Hi David,

A separate Solr document for each section is a good option if you also need
to handle phrases, case, special characters, etc. within the title field.
How do you map them to dynamic fields?

E.g.: "Appendix for cities", "APPENDIX 1: Cities"

Regards,
Aloke


On Wed, Oct 9, 2013 at 9:45 AM, Jack Krupansky j...@basetechnology.com wrote:

 I'd suggest that each of your source document sections would be a distinct
 solr document. All of the sections could have a source document ID field
 to tie them together.

 Dynamic fields work best when used in moderation. Your use case seems like
 an excessive use of dynamic fields.

 -- Jack Krupansky

 -Original Message- From: Twomey, David
 Sent: Tuesday, October 08, 2013 6:59 PM
 To: solr-user@lucene.apache.org
 Subject: dynamic field question



 I am having trouble trying to return a particular dynamic field only
 instead of all dynamic fields.

 Imagine I have a document with an unknown number of sections.  Each
 section can have a 'title' and a 'body'

 I have each section title and body as dynamic fields such as
 section_title_* and section_body_*

 Imagine that some documents contain a section that has a title=Appendix

 I want a query that will find all docs with that section and return just
 the Appendix section.

 I don't know how to return just that one section though

 I can copyField my dynamic field section_title_* into a static field
 called section_titles and query that for docs that contain the Appendix

 But I don't know how to only return that one dynamic field

 ?q=section_titles:Appendix&fl=section_body_*

 Any ideas?   I can't seem to put a conditional in the fl parameter
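Jack's one-document-per-section suggestion can be sketched as a simple denormalization step before indexing (all field names such as `source_doc_id` are illustrative, not from the thread):

```python
def split_into_section_docs(doc):
    """One source document with N sections -> N Solr documents, tied
    together by a source_doc_id field (field names are illustrative)."""
    out = []
    for i, section in enumerate(doc["sections"]):
        out.append({
            "id": "%s_s%d" % (doc["id"], i),
            "source_doc_id": doc["id"],
            "section_title": section["title"],
            "section_body": section["body"],
        })
    return out

docs = split_into_section_docs({
    "id": "doc42",
    "sections": [
        {"title": "Introduction", "body": "..."},
        {"title": "APPENDIX 1: Cities", "body": "city list"},
    ],
})
# Query side: q=section_title:Appendix now matches only the appendix doc,
# and fl=section_body returns just that section's body.
appendix = [d for d in docs if "appendix" in d["section_title"].lower()]
print(appendix[0]["id"])  # doc42_s1
```

With this layout the original question disappears: selecting the right "section" is just selecting the right document, so no conditional `fl` is needed.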






Re: Multiple schemas in the same SolrCloud ?

2013-10-09 Thread xinwu
I remember I had to put
-Dbootstrap_confdir=/opt/Solr_home/collection1/conf
-Dcollection.configName=solrConfig in catalina.sh.

Does that mean SolrCloud must have one, and only one, schema?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094281.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ best practices

2013-10-09 Thread Furkan KAMACI
I suggest you look here:
http://wiki.apache.org/solr/Solrj?action=fullsearch&context=180&value=cloudsolrserver&titlesearch=Titles#Using_with_SolrCloud


2013/10/9 Shawn Heisey s...@elyograg.org

 On 10/7/2013 3:08 PM, Mark wrote:

 Some specific questions:
 - When working with HttpSolrServer should we keep around instances for
 ever or should we create a singleton that can/should be used over and over?
 - Is there a way to change the collection after creating the server or do
 we need to create a new server for each collection?


 If at all possible, you should create your server object and use it for
 the life of your application.  SolrJ is threadsafe.  If there is any part
 of it that's not, the javadocs should say so - the SolrServer
 implementations definitely are.

 By using the word collection you are implying that you are using
 SolrCloud ... but earlier you said HttpSolrServer, which implies that you
 are NOT using SolrCloud.

 With HttpSolrServer, your base URL includes the core or collection name -
 http://server:port/solr/corename for example.  Generally you will
 need one object for each core/collection, and another object for
 server-level things like CoreAdmin.

 With SolrCloud, you should be using CloudSolrServer instead, another
 implementation of SolrServer that is constantly aware of the SolrCloud
 clusterstate.  With that object, you can use setDefaultCollection, and you
 can also add a collection parameter to each SolrQuery or other request
 object.

 Thanks,
 Shawn




Re: SolrCloud High Availability during indexing operation

2013-10-09 Thread Furkan KAMACI
Hi Saurabh,
Your link does not work (it is broken).


2013/10/9 Saurabh Saxena ssax...@gopivotal.com

 Pastebin link http://pastebin.com/cnkXhz7A

 I am doing a bulk request. I am uploading 100 files, each file having 100
 docs.

 -Saurabh


 On Tue, Oct 8, 2013 at 7:39 PM, Mark Miller markrmil...@gmail.com wrote:

  The attachment did not go through - try using pastebin.com or something.
 
  Are you adding docs with curl one at a time or in bulk per request.
 
  - Mark
 
  On Oct 8, 2013, at 9:58 PM, Saurabh Saxena ssax...@gopivotal.com
 wrote:
 
   Repeated the experiment on my local system: a single-shard SolrCloud with a
  replica. I tried to index 10K docs. All the indexing operations were
  redirected to the replica Solr node. While the documents were getting indexed
  on the replica, I shut down the leader Solr node. Out of 10K docs, only 9900
  docs got indexed. If I repeat the experiment without shutting down the
  leader instance, all 10K docs get indexed. I am using curl to upload the
  docs, and there were no curl errors while uploading the documents.
  
   Following error was there in replica log file.
  
   ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException;
  org.apache.solr.common.SolrException: No registered leader was found,
  collection:test_collection slice:shard1
  
   Attached replica log file.
  
  
   On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena ssax...@gopivotal.com
 
  wrote:
   Sorry for the late reply.
  
   All the documents have unique id. If I repeat the experiment, the num
 of
  docs indexed changes (I guess it depends when I shutdown a particular
  shard). When I do the experiment without shutting down leader Shards, all
  80k docs get indexed (which I think proves that all documents are valid).
  
   I need to dig the logs to find error message. Also, I am not tracking
 of
  curl return code, will run again and reply.
  
   Regards,
   Saurabh
  
  
   On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
   And do any of the documents have the same uniqueKey, which
   is usually called id? Subsequent adds of docs with the same
   uniqueKey replace the earlier one.
  
   It's not definitive because it changes as merges happen, old copies
   of docs that have been deleted or updated will be purged, but what
   does your admin page show for maxDoc? If it's more than numDocs
   then you have duplicate uniqueKeys. NOTE: if you optimize
   (which you usually shouldn't) then maxDoc and numDocs will be
   the same so if you test this don't optimize.
  
   Best,
   Erick
  
  
   On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood
   wun...@wunderwood.org wrote:
 Did all of the curl update commands return success? Any errors in the
   logs?
   
wunder
   
On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote:
   
Is it possible that some of those 80K docs were simply not valid?
 e.g.
had a wrong field, had a missing required field, anything like that?
What happens if you clear this collection and just re-run the same
indexing process and do everything else the same?  Still some docs
missing?  Same number?
   
And what if you take 1 document that you know is valid and index it
80K times, with a different ID, of course?  Do you see 80K docs in
 the
end?
   
Otis
--
Solr  ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
   
   
   
On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena 
  ssax...@gopivotal.com wrote:
Doc count did not change after I restarted the nodes. I am doing a
  single
commit after all 80k docs. Using Solr 4.4.
   
Regards,
Saurabh
   
   
On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:
   
Interesting. Did the doc count change after you started the nodes
  again?
Can you tell us about commits?
Which version? 4.5 will be out soon.
   
Otis
Solr  ElasticSearch Support
http://sematext.com/
On Sep 23, 2013 8:37 PM, Saurabh Saxena ssax...@gopivotal.com
  wrote:
   
Hello,
   
I am testing High Availability feature of SolrCloud. I am using
 the
following setup
   
- 8 linux hosts
- 8 Shards
- 1 leader, 1 replica / host
- Using Curl for update operation
   
I tried to index 80K documents on replicas (10K/replica in
  parallel).
During indexing process, I stopped 4 Leader nodes. Once indexing
  is done,
out of 80K docs only 79808 docs are indexed.
   
Is this an expected behaviour ? In my opinion replica should take
  care of
indexing if leader is down.
   
If this is an expected behaviour, any steps that can be taken
 from
  the
client side to avoid such a situation.
   
Regards,
Saurabh Saxena
   
   
   
--
Walter Underwood
wun...@wunderwood.org
   
   
   
  
  
 
 


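On the client-side question in the thread above, one mitigation is to treat indexing as at-least-once: check every batch's response and re-send failures (safe when documents have unique keys, since re-adds just overwrite). A minimal sketch, where `post_batch` is a deterministic stand-in for the real curl/HTTP upload:

```python
attempts = {}

def post_batch(batch):
    """Deterministic stand-in for the real curl/HTTP upload: every third
    batch fails on its first attempt (modeling a leader going down),
    then succeeds on retry."""
    attempts[batch] = attempts.get(batch, 0) + 1
    return not (batch % 3 == 0 and attempts[batch] == 1)

def index_with_retries(batches, max_attempts=3):
    """Keep re-sending batches whose response was not a success until
    none are left (or we give up)."""
    failed = list(batches)
    for _ in range(max_attempts):
        failed = [b for b in failed if not post_batch(b)]
        if not failed:
            return True
    return False

print(index_with_retries(range(100)))  # True -- failed batches were re-sent
```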

Re: synonyms and term position

2013-10-09 Thread Furkan KAMACI
Could you send a screenshot of the admin Analysis page when analyzing those
words?


2013/10/9 Alvaro Cabrerizo topor...@gmail.com

 Hi:

 I'm involved in a process o upgrade solr from 1.4 to 4.4 and I'm having a
 problem using SynonymFilterFactory within the process chain
 SynonymFilterFactory, StopFilterFactory .




Re: Multiple schemas in the same SolrCloud ?

2013-10-09 Thread Furkan KAMACI
You can find more information here:
https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files


2013/10/9 xinwu xinwu0...@gmail.com

 I remember I must put the
 -Dbootstrap_confdir=/opt/Solr_home/collection1/conf
 -Dcollection.configName=solrConfig  in the catalina.sh .

 Is it means that solrCloud must have one ,and only one, schema?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094281.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: synonyms and term position

2013-10-09 Thread Alvaro Cabrerizo
Sure,

Please find attached screenshots of almost all of the analysis (don't worry
about the lowercase and Porter-stemmer steps).

Regards.




On Wed, Oct 9, 2013 at 10:17 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 Could you send screenshot of  admin Analysis page when trying to analyze
 that words?





Re: no such field error:smaller big block size details while indexing doc files

2013-10-09 Thread sweety
I will try using SolrJ. Thanks.

But when I tried to index a .docx file, I got a different error:
SEVERE: null:java.lang.RuntimeException: java.lang.VerifyError: (class: 
org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: 
(Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;)
 Wrong return type in function
at 
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.VerifyError: (class: 
org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: 
(Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;)
 Wrong return type in function
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:59)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
... 16 more
I read this solution
(http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika),
which says that removing certain jars fixes the error, but no such jars are in
my classpath. Could the jars still be causing the issue?

Thank You.
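A VerifyError on an org.apache.poi class usually points at two different POI versions being visible on the classpath (Tika expects specific POI versions). A quick audit sketch; the directories are illustrative assumptions -- point it at wherever your webapp and Solr home keep their jars:

```python
from pathlib import Path

def find_poi_jars(roots):
    """Collect every POI jar under the given directories, so version
    conflicts (e.g. an old poi-*.jar next to the one Tika expects)
    show up at a glance."""
    hits = []
    for root in roots:
        root = Path(root)
        if root.is_dir():
            hits += sorted(str(p) for p in root.rglob("poi*.jar"))
    return hits

# Typical places to check for a Tomcat-hosted Solr 4.x (adjust to your layout):
for jar in find_poi_jars(["/opt/tomcat/webapps/solr/WEB-INF/lib",
                          "/opt/solr_home/lib"]):
    print(jar)
```

If the same POI artifact appears twice with different versions, removing the older copy is the usual fix.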




Re: no such field error:smaller big block size details while indexing doc files

2013-10-09 Thread sweety
I will try using SolrJ.

Now I tried indexing .docx files and I get a different error; the logs are:
SEVERE: null:java.lang.RuntimeException: java.lang.VerifyError: (class: 
org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: 
(Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;)
 Wrong return type in function
at 
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.VerifyError: (class: 
org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: 
(Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;)
 Wrong return type in function
at 
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:59)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
... 16 more

But do the jars cause these errors? I read one solution which said that
removing a few jars from the classpath may solve the errors, but those jars
are not present in my classpath (the link to the solution:
http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika).

Thank You.



On Wednesday, October 9, 2013 6:05 AM, Erick Erickson [via Lucene] 
ml-node+s472066n4094231...@n3.nabble.com wrote:
 
Hmmm, that is odd, the glob dynamicField should 
pick this up. 

Not quite sure what's going on. You can parse the file 
via Tika yourself and look at what's in there, it's a relatively 
simple SolrJ program, here's a sample: 
http://searchhub.org/2012/02/14/indexing-with-solrj/

Best, 
Erick 

On Tue, Oct 8, 2013 at 4:15 PM, sweety [hidden email] wrote: 

 This is my new schema.xml: 
 <schema name="documents">
   <fields>
     <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
     <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
     <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
     <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
     <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
     <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
     <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
     <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
     <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
     <dynamicField name="*" type="ignored" multiValued="true"/>
     <copyField source="id" dest="text"/>
     <copyField source="author" dest="text"/>
   </fields>
   <types>
     <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField"/>
     <fieldType name="integer" class="solr.IntField"/>
     <fieldType name="long" class="solr.LongField"/>
     <fieldType name="string" class="solr.StrField"/>
     <fieldType name="text"

Re: synonyms and term position

2013-10-09 Thread Furkan KAMACI
Does "two" have "in" and "one" as synonyms?







Re: synonyms and term position

2013-10-09 Thread Alvaro Cabrerizo
No, it has no synonyms.


On Wed, Oct 9, 2013 at 10:48 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 Does two has a synonym of in and one?


 2013/10/9 Furkan KAMACI furkankam...@gmail.com

 Does two has a synonym of in and one?


 2013/10/9 Alvaro Cabrerizo topor...@gmail.com

 Sure,

 Find attached the screenshots with almost all of the analysis (don't worry
 about the lowercase and the Porter stemmer steps).

 Regards.




 On Wed, Oct 9, 2013 at 10:17 AM, Furkan KAMACI 
 furkankam...@gmail.comwrote:

 Could you send a screenshot of the admin Analysis page when trying to analyze
 those words?









Re: synonyms and term position

2013-10-09 Thread Alvaro Cabrerizo
The synonyms.txt file has the following associations defined:

AIO => All in one
aio => all-in-one

Regards.


On Wed, Oct 9, 2013 at 11:05 AM, Alvaro Cabrerizo topor...@gmail.comwrote:

 No, it has no synonyms.


 On Wed, Oct 9, 2013 at 10:48 AM, Furkan KAMACI furkankam...@gmail.comwrote:

 Does two has a synonym of in and one?


 2013/10/9 Furkan KAMACI furkankam...@gmail.com

 Does two has a synonym of in and one?


 2013/10/9 Alvaro Cabrerizo topor...@gmail.com

 Sure,

 Find attached the screenshots with almost all the analysis, (dont worry
 about the lowercase and the porter stemmer)

 Regards.




 On Wed, Oct 9, 2013 at 10:17 AM, Furkan KAMACI 
 furkankam...@gmail.comwrote:

 Could you send screenshot of  admin Analysis page when trying to
 analyze
 that words?










Collection API wrong configuration

2013-10-09 Thread maephisto
I'm experimenting with SolrCloud using Solr 4.5.0 and the Collection API.

What I did was:
1. upload configuration to ZK:
zkcli.sh -cmd upconfig -zkhost 127.0.0.1:8993 -d
solr/my_custom_collection/conf/ -n my_custom_collection
2. create a collection using the API:
/admin/collections?action=CREATE&name=my_custom_collection&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=my_custom_config

The outcome of these actions seems to be that the collection's cores don't use
the my_custom_collection config but the example configuration.
Any idea why this is happening?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-API-wrong-configuration-tp4094319.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dynamic field question

2013-10-09 Thread Twomey, David
OK.  Then the JSON returned would contain a lot of documents that are
really sections. This would work fine for the use-case I mentioned but I
also use the index for full-text search of the whole document.  Therefore,
I would need to parse the result JSON in a way that combines the Solr docs
returned into one virtual doc based on source document ID.

Is that correct?

On 10/9/13 6:15 AM, Jack Krupansky j...@basetechnology.com wrote:

I'd suggest that each of your source document sections would be a
distinct 
solr document. All of the sections could have a source document ID
field 
to tie them together.

Dynamic fields work best when used in moderation. Your use case seems
like 
an excessive use of dynamic fields.

-- Jack Krupansky

-Original Message-
From: Twomey, David
Sent: Tuesday, October 08, 2013 6:59 PM
To: solr-user@lucene.apache.org
Subject: dynamic field question


I am having trouble trying to return a particular dynamic field only
instead 
of all dynamic fields.

Imagine I have a document with an unknown number of sections.  Each
section 
can have a 'title' and a 'body'

I have each section title and body as dynamic fields such as
section_title_* 
and section_body_*

Imagine that some documents contain a section that has a title=Appendix

I want a query that will find all docs with that section and return just
the 
Appendix section.

I don't know how to return just that one section though

I can copyField my dynamic field section_title_* into a static field
called 
section_titles and query that for docs that contain the Appendix

But I don't know how to only return that one dynamic field

?q=section_titles:Appendix&fl=section_body_*

Any ideas?   I can't seem to put a conditional in the fl parameter






Re: Collection API wrong configuration

2013-10-09 Thread maephisto
Using Solr 4.4.0, the same scenario behaves as expected.

Can anyone else try this, to check whether this only happens with 4.5.0 and,
if so, whether this is the desired behaviour or a bug?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-API-wrong-configuration-tp4094319p4094323.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collection API wrong configuration

2013-10-09 Thread Shalin Shekhar Mangar
This may be a bug in 4.5

Another user has also reported this bug:
https://issues.apache.org/jira/browse/SOLR-5307


On Wed, Oct 9, 2013 at 3:51 PM, maephisto my_sky...@yahoo.com wrote:

 Using Solr 4.4.0 the same scenarion behaves as expected.

 Can anyone else try this, to check if it this only happens with 4.5.0 and
 if
 so, is this a desired behaviour or a bug?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Collection-API-wrong-configuration-tp4094319p4094323.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: Collection API wrong configuration

2013-10-09 Thread primoz . skale
Works fine at my end. I use Solr 4.5.0 on Windows 7. 

I tried:

zkcli.bat -cmd upconfig -zkhost localhost:9000 -d 
..\solr\collection2\conf -n my_custom_collection

java -Djetty.port=8001 -DzkHost=localhost:9000 -jar start.jar

and finally

http://localhost:8001/solr/admin/collections?action=CREATE&name=my_custom_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_custom_collection

If I open the newly created core/shard, I can see the modified schema file
under Schema.

Best regards,

Primož







Find documents that are composed of % words

2013-10-09 Thread shahzad73
Is there a way in a Solr query to find documents that are composed of words
from a given list? For example, here is the list of words:
- Love
- Ice
- Cream
- Sunny
- I
- To
- A
- On
- Elephant
- Balloon

And a percentage such as: 80%

Let’s assume you’re analyzing the text of the following sentence.

“I love to eat ice cream on a sunny day”

This sentence contains 10 words, and only 2 (Day and Eat) of them do not
appear on the list. So the score for this sentence would be 80%. So this
would be a valid search result.


If the user had entered 90%, then this sentence would not be a valid result,
since more than 10% aren’t on the list
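The scoring rule described above can be stated precisely outside Solr; here is a small illustrative sketch in plain Python (not a Solr query, just the arithmetic):

```python
# Score a sentence by the percentage of its words that appear on the list.
WORDLIST = {"love", "ice", "cream", "sunny", "i", "to", "a", "on",
            "elephant", "balloon"}

def on_list_percentage(sentence, wordlist):
    words = sentence.lower().split()
    hits = sum(1 for w in words if w in wordlist)
    return 100.0 * hits / len(words)

# "eat" and "day" are not on the list, so 8 of 10 words match.
score = on_list_percentage("I love to eat ice cream on a sunny day", WORDLIST)
```

With a threshold of 80% this sentence qualifies (score >= 80); with a threshold of 90% it does not.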


Shahzad



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collection API wrong configuration

2013-10-09 Thread maephisto
Yes, the problem described in the ticket is the one I'm also confronted with.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-API-wrong-configuration-in-4-5-0-tp4094319p4094335.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dynamic field question

2013-10-09 Thread Otis Gospodnetic
David,

Yes. Document grouping (aka field collapsing) will help you here. It
should also allow you to create a better search experience on the front end
- it's often better to narrow down where in a large document a match is
than give users a large doc and say: we know the match is in here
somewhere, you just have to locate it now.

Otis
Solr & ElasticSearch Support
http://sematext.com/
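A grouped request along these lines might look as follows; the field names (source_doc_id and the fl list) are assumptions to be replaced with the actual schema's fields:

```python
from urllib.parse import urlencode

# Group section-level Solr documents back into their source documents,
# returning only the Appendix sections.
params = urlencode({
    "q": "section_titles:Appendix",
    "fl": "source_doc_id,section_body_*",
    "group": "true",
    "group.field": "source_doc_id",  # assumed field tying sections together
    "group.limit": "10",             # sections returned per source document
})
print("/solr/collection1/select?" + params)
```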



Re: Find documents that are composed of % words

2013-10-09 Thread Otis Gospodnetic
Hi,

You can take your words and combine some % of them with AND; then take another
set of them, OR it with the previous set, and so on.

Otis
Solr & ElasticSearch Support
http://sematext.com/


Re: dynamically adding core with auto-discovery in Solr 4.5

2013-10-09 Thread Erick Erickson
Jan:

This worked for me if I do NOT have a core.properties at all in my new
core. Personally I think the behavior in 4.4 was dangerous, what
happens if you mis-type the command for instance? You could do  Bad
Things to the old core you were inadvertently re-creating.

The core.properties file gets created in the new core as a results of
the CREATE command. So go ahead and try it again without a
core.properties file perhaps?
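So the flow is: create an empty instance directory with no core.properties, then issue the CREATE call and let Solr write core.properties itself. A sketch of building that call (host, port, and paths are placeholders):

```python
from urllib.parse import urlencode

# CoreAdmin CREATE call against a brand-new instance directory that does
# NOT yet contain a core.properties file; Solr writes that file itself.
params = urlencode({
    "action": "CREATE",
    "name": "core1",
    "instanceDir": "/somewhere/core1",
})
url = "http://localhost:8983/solr/admin/cores?" + params
print(url)
```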

The RELOAD command is intended to be used when you change the schema
or solrconfig files and want the core indicated to start working with
the new definitions. It relies on having a record of that core to
work.

When Solr starts up in discovery mode, it explores the directory tree
and keeps an internal map of all the cores, transient, loaded etc. The
reload then looks at that map and barfs if the core isn't there.

Changing this seems like more work than reward, how would the code
know where to look for the core to load? It would have to do a re-walk
of the tree, or rely on instanceDir being an absolute path etc.
Do-able but not worth it IMO.

Best,
Erick

On Tue, Oct 8, 2013 at 4:38 PM, Jan Van Besien ja...@ngdata.com wrote:
 Hi,

 We are using auto discovery and have a use case where we want to be
 able to add cores dynamically, without restarting solr.

 In 4.4 we were able to
 - add a directory (e.g. core1) with an empty core.properties
 - call 
  http://localhost:8983/solr/admin/cores?action=CREATE&core=core1&name=core1&instanceDir=%2Fsomewhere%2Fcore1

 In 4.5 however this (the second step) fails, saying it cannot create a
 new core in that directory because another core is already defined
 there.

 From the documentation (http://wiki.apache.org/solr/CoreAdmin), I
 understand that since 4.3 we should actually do RELOAD. However,
 RELOAD results in this stacktrace:

 org.apache.solr.common.SolrException: Error handling 'reload' action
 at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:673)
 at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:172)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:655)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:322) at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at
 org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at
 org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
 at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: org.apache.solr.common.SolrException: Unable to reload
 core: core1 at 
 org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:936)
 at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:691)
 at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:671)
 ... 20 more Caused by: org.apache.solr.common.SolrException: No such
 core: core1 at 
 org.apache.solr.core.CoreContainer.reload(CoreContainer.java:642)
 ... 21 more

 Note that before I RELOAD, the core1 directory was created.

 Also note that next to the core1 directory, there is a core0 directory
 which has exactly the same content and is auto-discovered perfectly
 fine at startup.

 So... what should it be? Or am I missing something here?

 thanks in advance,
 Jan


Re: {soft}Commit and cache flusing

2013-10-09 Thread Erick Erickson
Tim:

I think you're mis-interpreting. By replying to a post with the subject:

{soft}Commit and cache flushing

but going in a different direction, it's easy for people to think "I'm not
interested in that thread, I'll ignore it", thereby missing the fact that
you're asking a somewhat different question that they might have information
about. It's not about whether you're doing anything particularly wrong with
the question; it's about making it easy for people to help.

See http://people.apache.org/~hossman/#threadhijack

Best,
Erick

On Tue, Oct 8, 2013 at 6:23 PM, Tim Vaillancourt t...@elementspace.com wrote:
 I have a genuine question with substance here. If anything this
 nonconstructive, rude response was to get noticed. Thanks for
 contributing to the discussion.

 Tim


 On 8 October 2013 05:31, Dmitry Kan solrexp...@gmail.com wrote:

 Tim,
 I suggest you open a new thread and not reply to this one to get noticed.
 Dmitry


 On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.com
 wrote:

  Is there a way to make autoCommit only commit if there are pending
 changes,
  ie: if there are 0 adds pending commit, don't autoCommit (open-a-searcher
  and wipe the caches)?
 
  Cheers,
 
  Tim
 
 
  On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote:
 
   right. We've got the autoHard commit configured only atm. The
  soft-commits
   are controlled on the client. It was just easier to implement the first
   version of our internal commit policy that will commit to all solr
   instances at once. This is where we have noticed the reported behavior.
  
  
   On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu
  wrote:
  
if there are no modifications to an index and a softCommit or
  hardCommit
issued, then solr flushes the cache.
   
   
Indeed. The easiest way to work around this is by disabling auto
  commits
and only commit when you have to.
   
  
 



Re: How to share Schema between multicore on Solr 4.4

2013-10-09 Thread Erick Erickson
Shawn:

Hmmm, I hadn't thought about that before. The shareSchema
stuff is keyed off the absolute directory (and timestamp) of
the schema.xml file associated with a core and is about
sharing the internal object that holds the parsed schema.

Do you know for sure if the fact that this is coming from ZK
actually shares the schema object? 'Cause I've never
looked to see and it would be a good thing to have in my
head...


Thanks!
Erick

On Tue, Oct 8, 2013 at 8:33 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/7/2013 6:02 AM, Dharmendra Jaiswal wrote:

 I am using Solr 4.4 version with SolrCloud on Windows machine.
 Somehow i am not able to share schema between multiple core.


 If you're in SolrCloud mode, then you already *are* sharing your schema.
 You are also sharing your configuration.  Both of them are in zookeeper.
 All collections (and all shards within a collection) which use a given
 config name are using the same copy.

 Any copies of your config/schema that might be on your disk are *NOT* being
 used.  If you are starting Solr with any bootstrap options, then the config
 set that is in zookeeper might be getting overwritten by what's on your disk
 when Solr restarts, but otherwise SolrCloud *only* uses zookeeper for
 config/schema. The bootstrap options are meant to be used once, and I
 actually prefer to get SolrCloud operational without using bootstrap options
 at all.

 Thanks,
 Shawn



Re: dynamically adding core with auto-discovery in Solr 4.5

2013-10-09 Thread Jan Van Besien
On Wed, Oct 9, 2013 at 2:15 PM, Erick Erickson erickerick...@gmail.com wrote:
 This worked for me if I do NOT have a core.properties at all in my new
 core. Personally I think the behavior in 4.4 was dangerous, what
 happens if you mis-type the command for instance? You could do  Bad
 Things to the old core you were inadvertently re-creating.

Thanks, this works.

I think the documentation could be updated to indicate that
- auto-discovery is only at startup (that wasn't obvious to me)
- creating cores after startup should be done on a config directory
without a core.properties file

Jan


Re: Find documents that are composed of % words

2013-10-09 Thread Aloke Ghoshal
Hi Shahzad,

Have you tried with the Minimum Should Match feature:
http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29

Regards,
Aloke
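A request using mm could be sketched like this; note that mm counts matched query clauses, which only approximates the "percentage of the document's words on the list" scoring the original question asks for (the qf field name is an assumption):

```python
from urllib.parse import urlencode

words = ["love", "ice", "cream", "sunny", "i", "to", "a", "on",
         "elephant", "balloon"]

# Each word becomes an optional (SHOULD) clause; mm then requires that
# at least 80% of those clauses match in a document.
params = urlencode({
    "q": " ".join(words),
    "defType": "edismax",
    "qf": "text",        # assumed default search field
    "mm": "80%",
})
print("/solr/collection1/select?" + params)
```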





Permisions didn't check when call discoverUnder

2013-10-09 Thread Said Chavkin
Hello.

When the solr/home directory contains a directory that Solr does not have
rights to read, Solr fails to start with this exception:

2108 [main] INFO org.apache.solr.core.CoresLocator - Looking for core
definitions underneath /var/lib/solr
2109 [main] ERROR org.apache.solr.servlet.SolrDispatchFilter - Could
not start Solr. Check solr/home property and the logs
2138 [main] ERROR org.apache.solr.core.SolrCore -
null:java.lang.NullPointerException
at 
org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:121)
at 
org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:130)
at 
org.apache.solr.core.CorePropertiesLocator.discover(CorePropertiesLocator.java:113)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:226)
at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at 
org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4488)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
at 
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at org.apache.catalina.core.StandardService.start(StandardService.java:516)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
at org.apache.catalina.startup.Catalina.start(Catalina.java:593)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)

2138 [main] INFO org.apache.solr.servlet.SolrDispatchFilter -
SolrDispatchFilter.init() done

For example:

solr/home is located at /var/lib/solr.

/var/lib/solr is a separate file system, so it has a lost+found directory,
and only root can read from that directory.
As a result, Solr can't start.
This is because core discovery walks into a directory that Tomcat does not
have rights to read.
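The kind of tolerant discovery the report implies can be sketched in plain Python; this is purely illustrative and not Solr's actual CorePropertiesLocator code:

```python
import os

def readable_subdirs(solr_home):
    """Return the subdirectories of solr_home the process can traverse.

    A root-owned directory such as lost+found would simply be skipped
    instead of aborting core discovery.
    """
    found = []
    for name in sorted(os.listdir(solr_home)):
        path = os.path.join(solr_home, name)
        # Skip entries the process cannot read and descend into.
        if os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK):
            found.append(path)
    return found
```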

Yours faithfully.


Re: dynamically adding core with auto-discovery in Solr 4.5

2013-10-09 Thread Erick Erickson
If you create a Wiki login, I'll be happy to add you to the
contributors list. It's always valuable to have fresh eyes
update docs while the ambiguities are still fresh!

Erick

On Wed, Oct 9, 2013 at 8:37 AM, Jan Van Besien ja...@ngdata.com wrote:
 On Wed, Oct 9, 2013 at 2:15 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 This worked for me if I do NOT have a core.properties at all in my new
 core. Personally I think the behavior in 4.4 was dangerous, what
 happens if you mis-type the command for instance? You could do  Bad
 Things to the old core you were inadvertently re-creating.

 Thanks, this works.

 I think the documentation could be updated to indicate that
 - auto-discovery is only at startup (that wasn't obvious to me)
 - creating cores after startup should be done on a config directory
 without a core.properties file

 Jan


Re: Permisions didn't check when call discoverUnder

2013-10-09 Thread Erick Erickson
What do you think Solr should do in this case? If the
process doesn't have permission to the dir, it can't
write to it. You need to set the permissions, or the
authority of the process that Solr is running as
appropriately.

Best,
Erick

On Wed, Oct 9, 2013 at 8:54 AM, Said Chavkin schav...@gmail.com wrote:
 Hello.

 When in solr/home directory exists directory to which solr do not have
 rights, then solr failed to start with exception:

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
 at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)

 2138 [main] INFO org.apache.solr.servlet.SolrDispatchFilter -
 SolrDispatchFilter.init() done

 For example:

 solr/home located on /var/lib/solr.

 /var/lib/solr is a separate file system, so it contains a lost+found
 directory that only root can read.
 As a result Solr can't start, because core discovery walks into a directory
 that the Tomcat process has no rights to.

 Yours faithfully.


RE: Searching on (hyphenated/capitalized) word issue

2013-10-09 Thread Van Tassell, Kristian
Thank you Upayavira.

I'm trying to figure out what will make Solr stem on multi in the word 
multicad so that any attempt to search on multicad, Multi-CAD or 
multiCAD will return results. The WordDelimiterFilterFactory helps with the 
case of multi followed by a dash or a capital letter, but I'm not sure how to 
get Solr to tokenize the word multi. Should I look at ngram configurations? 
Or is there a filter which promotes (rather than protects) words from being 
stemmed? (In other words, I could configure in a txt file that multi should 
be stemmed.)

Just to reiterate, I am not getting any results when I search for the word 
multicad, even though it appears many times in the text as multiCAD and 
Multi-CAD.

Here is my configuration:

analyzer
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StopFilterFactory ignoreCase=true 
words=stopwords_en.txt enablePositionIncrements=true/
filter class=solr.SynonymFilterFactory synonyms=synonyms.txt 
ignoreCase=true expand=true/ 
filter class=solr.WordDelimiterFilterFactory 
generateWordParts=1 generateNumberParts=1 catenateWords=1 
catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.SnowballPorterFilterFactory language=English 
protected=protwords.txt/
  /analyzer
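
One thing worth checking (a sketch, not a verified fix): StandardTokenizerFactory already splits Multi-CAD on the hyphen before WordDelimiterFilterFactory ever sees it, so the filter can never catenate those parts back into multicad. With a whitespace tokenizer the filter receives the whole token, and catenateWords/catenateAll can emit the joined form at index time. The attribute values below are illustrative — verify the resulting token stream on the admin analysis page:

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- sees "Multi-CAD" intact; emits the parts plus the joined token -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" catenateWords="1" catenateAll="1"
          preserveOriginal="1" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English"
          protected="protwords.txt"/>
</analyzer>
```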

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Monday, September 30, 2013 1:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Searching on (hyphenated/capitalized) word issue

You need to look at your analysis chain. The stuff you're talking about there 
is all configurable.

There's different tokenisers available to split your fields differently, then 
you might use the WordDelimiterFilterFactory to split existing tokens further 
(e.g. WiFi might become wi, fi and WiFi). So really, you need to craft 
your own analysis chain to fit the kind of data you are working with.

Upayavira

On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote:
  I have a search term multi-CAD being issued against tokenized text.  The 
 problem is that you cannot get any search results when you type 
 multicad unless you add a hyphen (multi-cad) or type multiCAD
 (omitting the hyphen, but correctly adding the CAPS into the spelling).
 
 
 
 However, for the similar but unhyphenated word AutoCAD, you can type 
 autocad and get hits for AutoCAD, as you would expect. You can type 
 auto-cad and get the same results.
 
 The query seems to get parsed as separate words (resulting in hits) 
 for multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for multicad.
 In other words, the search terms  become multi cad and auto cad 
 for all cases except for when the term is multicad.
 
 I'm guessing this may be due in part to auto being a more common word 
 prefix, but I may be wrong. Can anyone provide some clarity (and maybe 
 point me towards a potential solution)?
 
 Thanks in advance!
 
 
 Kristian Van Tassell
 Siemens Industry Sector
 Siemens Product Lifecycle Management Software Inc.
 5939 Rice Creek Parkway
 Shoreview, MN  55126 United States
 Tel.  :+1 (651) 855-6194
 Fax  :+1 (651) 855-6280
 kristian.vantass...@siemens.com kristian.vantass...@siemens.com%20
 www.siemens.com/plm
 


Re: Permissions not checked when calling discoverUnder

2013-10-09 Thread Said Chavkin
I'm not sure; maybe Solr should skip inaccessible directories,
since it is standard practice to place a service on a separate filesystem.
On the other hand, it is possible to place solr/home somewhere other than
the top of the mounted fs.

Anyway, it would be better if the error message were clearer.

2013/10/9 Erick Erickson erickerick...@gmail.com:
 What do you think Solr should do in this case? If the
 process doesn't have permission to the dir, it can't
 write to it. You need to set the permissions, or the
 authority of the process that Solr is running as
 appropriately.

 Best,
 Erick

 On Wed, Oct 9, 2013 at 8:54 AM, Said Chavkin schav...@gmail.com wrote:
 Hello.

  When the solr/home directory contains a directory that Solr does not have
  rights to, Solr fails to start with this exception:

 2108 [main] INFO org.apache.solr.core.CoresLocator - Looking for core
 definitions underneath /var/lib/solr
 2109 [main] ERROR org.apache.solr.servlet.SolrDispatchFilter - Could
 not start Solr. Check solr/home property and the logs
 2138 [main] ERROR org.apache.solr.core.SolrCore -
 null:java.lang.NullPointerException
 at 
 org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:121)
 at 
 org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:130)
 at 
 org.apache.solr.core.CorePropertiesLocator.discover(CorePropertiesLocator.java:113)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:226)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
 at 
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
 at 
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
 at 
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
 at 
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838)
 at org.apache.catalina.core.StandardContext.start(StandardContext.java:4488)
 at 
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
 at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
 at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
 at 
 org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
 at 
 org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
 at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498)
 at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
 at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
 at 
 org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
 at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
 at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
 at org.apache.catalina.core.StandardService.start(StandardService.java:516)
 at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
 at org.apache.catalina.startup.Catalina.start(Catalina.java:593)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
 at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)

 2138 [main] INFO org.apache.solr.servlet.SolrDispatchFilter -
 SolrDispatchFilter.init() done

 For example:

 solr/home located on /var/lib/solr.

  /var/lib/solr is a separate file system, so it contains a lost+found
  directory that only root can read.
  As a result Solr can't start, because core discovery walks into a directory
  that the Tomcat process has no rights to.

 Yours faithfully.


Re: Shard split issue

2013-10-09 Thread Shalin Shekhar Mangar
I opened https://issues.apache.org/jira/browse/SOLR-5324


On Mon, Oct 7, 2013 at 2:20 PM, Yago Riveiro yago.rive...@gmail.com wrote:

 If the replica has 20G, most probably the recovery will take more than 120
 seconds.

 In my case I have SSDs, and 120 seconds is not enough.

 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


 On Monday, October 7, 2013 at 9:19 AM, Shalin Shekhar Mangar wrote:

  I think what is happening here is that the sub shard replicas are taking
  time to recover. We use a core admin command to wait for the replicas to
  become active before the shard states are switched. The timeout value for
  that command is just 120 seconds. We should wait for more than that. I'll
  open an issue.
 
 
  On Mon, Oct 7, 2013 at 2:47 AM, Yago Riveiro yago.rive...@gmail.com(mailto:
 yago.rive...@gmail.com) wrote:
 
   Seems the issue occurs when the shard has more than one replica.
  
   I unload all replicas of the shard (less 1 to do the split) and the
   SPLITSHARD finished as expected, the parent went to inactive and the
   children active.
  
   If the parent has more than 1 replica, the process apparently finishes:
   the total number of documents in the children is the same as in the parent.
   The problem is that the parent never goes to the inactive state and the
   children are stuck in the construction state.
  
   --
   Yago Riveiro
   Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
  
  
   On Sunday, October 6, 2013 at 12:23 AM, Yago Riveiro wrote:
  
I can attach the full log of the process if you want.
   
--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
   
   
On Sunday, October 6, 2013 at 12:12 AM, Yago Riveiro wrote:
   
 The error in log are:

 ERROR - 2013-10-05 21:06:22.997;
 org.apache.solr.common.SolrException;
   org.apache.solr.common.SolrException: splitshard the collection time
   out:300s
 ERROR - 2013-10-05 21:06:22.997;
 org.apache.solr.common.SolrException;
   
  
   null:org.apache.solr.common.SolrException: splitshard the collection
 time
   out:300s


 INFO - 2013-10-05 22:48:54.083;
   org.apache.solr.cloud.OverseerCollectionProcessor; Overseer Collection
   Processor: Message id:/overseer/collection-queue-work/qn-000138
   complete,
  
 response:{success={null={responseHeader={status=0,QTime=1901},core=statistics-13_shard17_0_replica1},null={responseHeader={status=0,QTime=1903},core=statistics-13_shard17_1_replica1},null={responseHeader={status=0,QTime=2000}},null={responseHeader={status=0,QTime=2000}},null={responseHeader={status=0,QTime=6324147}},null={responseHeader={status=0,QTime=0},core=statistics-13_shard17_1_replica1,status=EMPTY_BUFFER},null={responseHeader={status=0,QTime=0},core=statistics-13_shard17_0_replica1,status=EMPTY_BUFFER},null={responseHeader={status=0,QTime=1127},core=statistics-13_shard17_0_replica2},null={responseHeader={status=0,QTime=2109},core=statistics-13_shard17_1_replica2}},failure={null=org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:I
   was asked to wait on state active for 192.168.
   20.105:8983_solr but I still do not see the requested state. I see
 state:
   recovering live:true},Operation splitshard caused
   exception:=org.apache.solr.common.SolrException: SPLTSHARD failed to
 create
   subshard replicas or timed out waiting for them to come
   up,exception={msg=SPLTSHARD failed to create subshard replicas or
 timed out
   waiting for them to come up,rspCode=500}}


 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


 On Saturday, October 5, 2013 at 5:03 PM, Yago Riveiro wrote:

  I don't have the log; log rotation is configured to keep only 5
   files with a small size. I will reconfigure it to a higher value and retry
   the split again.
 
 
  --
  Yago Riveiro
  Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
 
 
  On Saturday, October 5, 2013 at 4:54 PM, Shalin Shekhar Mangar
 wrote:
 
   On Sat, Oct 5, 2013 at 8:37 PM, Yago Riveiro 
   yago.rive...@gmail.com (mailto:yago.rive...@gmail.com) wrote:
  
How I can see the logs of the parent?
   
They are stored on solr.log?
  
   Yes.
  
   --
   Regards,
   Shalin Shekhar Mangar.
  
 

   
  
  
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 





-- 
Regards,
Shalin Shekhar Mangar.


Re: Searching on (hyphenated/capitalized) word issue

2013-10-09 Thread Furkan KAMACI
If you have indexed the word multicad and you want to get a result when you
search for multi, you can use an ngram filter. However, you should consider
the pros and cons of using the NGram filter: you may find multicad from
multi, but your index size will be much bigger.

I suggest you to look at here:
http://docs.lucidworks.com/display/solr/Tokenizers



2013/10/9 Van Tassell, Kristian kristian.vantass...@siemens.com

 Thank you Upayavira.

 I'm trying to figure out what will make Solr stem on multi in the word
 multicad so that any attempt to search on multicad, Multi-CAD or
 multiCAD will return results. The WordDelimiterFilterFactory helps with
 the case of multi followed by a dash or a capital letter, but I'm not sure
 how to get Solr to tokenize the word multi. Should I look at ngram
 configurations? Or is there a filter which promotes (rather than protects)
 words from being stemmed? (in other words, I could configure in a txt file
 that multi should be stemmed.)

 Just to reiterate, I am not getting any results when I search for the word
 multicad, even though it appears many times in the text as multiCAD and
 Multi-CAD.

 Here is my configuration:

 analyzer
 tokenizer class=solr.StandardTokenizerFactory/
 filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords_en.txt enablePositionIncrements=true/
 filter class=solr.SynonymFilterFactory
 synonyms=synonyms.txt ignoreCase=true expand=true/
 filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1 generateNumberParts=1 catenateWords=1
 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.SnowballPorterFilterFactory
 language=English protected=protwords.txt/
   /analyzer

 -Original Message-
 From: Upayavira [mailto:u...@odoko.co.uk]
 Sent: Monday, September 30, 2013 1:45 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Searching on (hyphenated/capitalized) word issue

 You need to look at your analysis chain. The stuff you're talking about
 there is all configurable.

 There's different tokenisers available to split your fields differently,
 then you might use the WordDelimiterFilterFactory to split existing tokens
 further (e.g. WiFi might become wi, fi and WiFi). So really, you need
 to craft your own analysis chain to fit the kind of data you are working
 with.

 Upayavira

 On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote:
  I have a search term multi-CAD being issued against tokenized text.  The
  problem is that you cannot get any search results when you type
  multicad unless you add a hyphen (multi-cad) or type multiCAD
  (omitting the hyphen, but correctly adding the CAPS into the spelling).
 
 
 
  However, for the similar but unhyphenated word AutoCAD, you can type
  autocad and get hits for AutoCAD, as you would expect. You can type
  auto-cad and get the same results.
 
  The query seems to get parsed as separate words (resulting in hits)
  for multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for
 multicad.
  In other words, the search terms  become multi cad and auto cad
  for all cases except for when the term is multicad.
 
  I'm guessing this may be due in part to auto being a more common word
  prefix, but I may be wrong. Can anyone provide some clarity (and maybe
  point me towards a potential solution)?
 
  Thanks in advance!
 
 
  Kristian Van Tassell
  Siemens Industry Sector
  Siemens Product Lifecycle Management Software Inc.
  5939 Rice Creek Parkway
  Shoreview, MN  55126 United States
  Tel.  :+1 (651) 855-6194
  Fax  :+1 (651) 855-6280
  kristian.vantass...@siemens.com kristian.vantass...@siemens.com%20
  www.siemens.com/plm
 



Update existing documents when using ExtractingRequestHandler?

2013-10-09 Thread Jeroen Steggink
Hi,

In a content management system I have a document and an attachment. The 
document contains the meta data and the attachment the actual data.
I would like to combine data of both in one Solr document.

I have thought of several options:

1. Using ExtractingRequestHandler I would extract the data (extractOnly) 
and combine it with the meta data and send it to Solr.
 But this might be inefficient and increase the network traffic.
2. A separate Tika installation, used to extract and send the data 
to Solr.
 This would stress an already busy web server.
3. First upload the file using ExtractingRequestHandler, then use atomic 
updates to add the other fields.

Or is there another way? First add the meta data and later use the 
ExtractingRequestHandler to add the file contents?

Cheers,
Jeroen

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
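
Regarding option 3 above, it may help that ExtractingRequestHandler accepts literal.* parameters, so the CMS metadata can be sent in the same request as the attachment rather than in a follow-up atomic update. A sketch (URL, field names, and file name are placeholders):

```shell
# One request: Tika extracts the attachment's text, and the literal.*
# parameters populate the metadata fields on the same Solr document.
curl "http://localhost:8983/solr/update/extract?literal.id=doc42&literal.title=My+Document&commit=true" \
     -F "file=@attachment.pdf"
```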

Re: run filter queries after post filter

2013-10-09 Thread Rohit Harchandani
Hey,
so the post filter logs the number of ids that it receives.
With the above filter having cost=200, the post filter should have received
the same number of ids as before (when the filter was not present).
But that does not seem to be the case... with the filter query on the index,
the number of ids that the post filter receives decreases.

Thanks,
Rohit


On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.comwrote:

 Hmmm, seems like it should. What's your evidence that it isn't working?

 Best,
 Erick

 On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com
 wrote:
  Hey,
  I am using solr 4.0 with my own PostFilter implementation which is
 executed
  after the normal solr query is done. This filter has a cost of 100. Is it
  possible to run filter queries on the index after the execution of the
 post
  filter?
  I tried adding the below line to the url but it did not seem to work:
  fq={!cache=false cost=200}field:value
  Thanks,
  Rohit



Re: Find documents that are composed of % words

2013-10-09 Thread shahzad73
Please help me formulate the query. Will it be easy, or do I have to build a
custom filter for this?

Shahzad 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094372.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Find documents that are composed of % words

2013-10-09 Thread shahzad73
my client has a strange requirement: he will give a list of 500 words and
then set a percentage like 80%. Now he wants to find those pages or
documents which consist of only those 500 words for at least 80% of their
content, with at most 20% unknown words. For example, we have this document:

 word1 word2 word3 word4

and he gives the list word1 word2 word3 and sets the accuracy to 75%.
The above doc meets the criteria because (1) it matches all the given words
and (2) only 25% of its words are unknown, i.e. outside the search list.

Here is another way to say this: if 500 words are provided in the search,
then all 500 words must exist in the document, and unknown words may make up
only 20% of it if the accuracy is 80%.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094369.html
Sent from the Solr - User mailing list archive at Nabble.com.
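
The acceptance rule being described can be stated precisely outside Solr: every query word must appear in the document, and the fraction of document tokens drawn from the query list must meet the threshold. A plain-Java sketch of that check (class and method names are mine, not a Solr API — a custom filter or post-processing step could apply the same logic):

```java
import java.util.*;

public class CoverageCheck {
    /**
     * Returns true if every query word occurs in the document AND the
     * fraction of document tokens drawn from the query list is >= threshold.
     */
    public static boolean matches(List<String> docTokens,
                                  Set<String> queryWords,
                                  double threshold) {
        Set<String> seen = new HashSet<>();
        int known = 0;
        for (String t : docTokens) {
            String w = t.toLowerCase(Locale.ROOT);
            if (queryWords.contains(w)) {
                known++;        // token comes from the supplied word list
                seen.add(w);    // track which list words were actually found
            }
        }
        boolean allPresent = seen.size() == queryWords.size();
        double coverage = docTokens.isEmpty()
                ? 0.0 : (double) known / docTokens.size();
        return allPresent && coverage >= threshold;
    }
}
```

With the example above — document "word1 word2 word3 word4", list {word1, word2, word3} — coverage is 3/4 = 75%, so the document passes at a 75% setting but fails at 80%.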


Re: How to share Schema between multicore on Solr 4.4

2013-10-09 Thread Shawn Heisey

On 10/9/2013 6:24 AM, Erick Erickson wrote:

Hmmm, I hadn't thought about that before. The shareSchema
stuff is keyed off the absolute directory (and timestamp) of
the schema.xml file associated with a core and is about
sharing the internal object that holds the parsed schema.

Do you know for sure if the fact that this is coming from ZK
actually shares the schema object? 'Cause I've never
looked to see and it would be a good thing to have in my
head...


With SolrCloud, I have no idea whether the actual internal objects are 
shared.  Just now I tried to figure that out from the code, but I don't 
already have an understanding of how that code works, and a quick glance 
isn't enough to gain that knowledge.I can guarantee that you have a much 
deeper understanding of those internals than I do!


My comments were to indicate that SolrCloud creates a situation where 
the config/schema are shared in the sense that there's only one 
canonical copy.


Thanks,
Shawn



Solr's Filtering approaches

2013-10-09 Thread David Philip
Hi All,

I have an issue handling filters for one of our requirements and would like
to get suggestions on the best approach.


*Use Case:*

1.  We have a list of groups, and the number of groups can increase up to 1
million. Currently we have almost 90 thousand groups in the Solr search
system.

2.  Just before the user hits search, he can select the groups he wants to
retrieve. [The distinct list of these group names for display is retrieved
from another Solr index that has more information about the groups.]

3.  User Operation:
Say the user selected group 1A - group 1A, and searches for key:cancer.


The current approach I was thinking of is: get the search results and filter
the query by the list of group ids selected by the user. But my concern is
that when this group list grows to 50k unique ids, it can cause a lot of
delay in getting search results. So I wanted to know whether there are
different filtering approaches that I can try.

I was thinking of one more approach, suggested by my colleague: do an
intersection.
Get the group ids selected by the user.
Get the list of group ids from the search results.
Perform the intersection of both, and then get the entire result set of only
those group ids that intersected. Is this a better way? Can I use any cache
technique in this case?


- David.
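
If the intersection route is explored, the set arithmetic itself is straightforward in plain Java before any filter query is built — intersect the user's selection with the group ids seen in the results, then filter on the (usually much smaller) common set. A minimal sketch; the class and method names are illustrative, not Solr APIs:

```java
import java.util.*;

public class GroupFilter {
    /** Keeps only the group ids present in both the user's selection and the results. */
    public static Set<String> intersect(Set<String> selected,
                                        Collection<String> fromResults) {
        Set<String> out = new HashSet<>(selected); // copy: don't mutate the caller's set
        out.retainAll(new HashSet<>(fromResults)); // set intersection
        return out;
    }
}
```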


Re: Multiple schemas in the same SolrCloud ?

2013-10-09 Thread Shawn Heisey

On 10/9/2013 1:17 AM, xinwu wrote:

I remember I had to put
-Dbootstrap_confdir=/opt/Solr_home/collection1/conf
-Dcollection.configName=solrConfig in catalina.sh.

Does that mean that SolrCloud must have one, and only one, schema?


Those bootstrap options are intended to be used ONCE, and on only one of 
your Solr instances, not all of them.  What that does is take the 
configuration in the confdir and upload it to zookeeper, giving it the 
name you chose.


You can have many configurations with different names in zookeeper. Each 
collection is associated with a config name.  A better way than the 
bootstrap options is the zkcli script in cloud-scripts in the example 
directory.  The upconfig command can be used to upload or change your 
configurations.


http://wiki.apache.org/solr/SolrCloud#Command_Line_Util

Thanks,
Shawn
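
For completeness, the zkcli route looks roughly like this in Solr 4.x (the ZooKeeper host, paths, and names below are placeholders). Each collection is then created against its own config name, which is what allows independent, unrelated schemas to coexist in one SolrCloud:

```shell
# Upload a second, independent configuration under its own name
# (zkcli.sh lives in example/cloud-scripts in the Solr 4.x distribution).
sh zkcli.sh -zkhost localhost:2181 -cmd upconfig \
   -confdir /opt/Solr_home/otherCollection/conf -confname otherConfig

# Create a collection bound to that configuration.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=otherCollection&numShards=1&collection.configName=otherConfig"
```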



Re: SolrJ best pratices

2013-10-09 Thread Mark
Thanks for the clarification.

In Solr Cloud just use 1 connection. In non-cloud environments you will need 
one per core.



On Oct 8, 2013, at 5:58 PM, Shawn Heisey s...@elyograg.org wrote:

 On 10/7/2013 3:08 PM, Mark wrote:
 Some specific questions:
  - When working with HttpSolrServer should we keep instances around forever 
 or should we create a singleton that can/should be used over and over?
 - Is there a way to change the collection after creating the server or do we 
 need to create a new server for each collection?
 
 If at all possible, you should create your server object and use it for the 
 life of your application.  SolrJ is threadsafe.  If there is any part of it 
 that's not, the javadocs should say so - the SolrServer implementations 
 definitely are.
 
 By using the word collection you are implying that you are using SolrCloud 
 ... but earlier you said HttpSolrServer, which implies that you are NOT using 
 SolrCloud.
 
 With HttpSolrServer, your base URL includes the core or collection name - 
 http://server:port/solr/corename; for example.  Generally you will need one 
 object for each core/collection, and another object for server-level things 
 like CoreAdmin.
 
 With SolrCloud, you should be using CloudSolrServer instead, another 
 implementation of SolrServer that is constantly aware of the SolrCloud 
 clusterstate.  With that object, you can use setDefaultCollection, and you 
 can also add a collection parameter to each SolrQuery or other request 
 object.
 
 Thanks,
 Shawn
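 
 Put together, the advice above looks roughly like this in SolrJ 4.x — a sketch only, assuming solr-solrj on the classpath; the host names, core name, and collection name are placeholders:
 
 ```java
 import java.net.MalformedURLException;
 
 import org.apache.solr.client.solrj.impl.CloudSolrServer;
 import org.apache.solr.client.solrj.impl.HttpSolrServer;
 
 public class SolrClients {
 
     // Non-cloud: one long-lived instance per core; the base URL includes
     // the core name. SolrServer implementations are thread-safe, so this
     // single object can be shared across the whole application.
     public static final HttpSolrServer CORE_A =
             new HttpSolrServer("http://server:8983/solr/coreA");
 
     // SolrCloud: one ZooKeeper-aware client for the whole cluster; switch
     // collections per request (or via setDefaultCollection) instead of
     // creating a new server object per collection.
     public static CloudSolrServer newCloudClient() throws MalformedURLException {
         CloudSolrServer cloud = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
         cloud.setDefaultCollection("collection1");
         return cloud;
     }
 }
 ```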
 



Re: Find documents that are composed of % words

2013-10-09 Thread Furkan KAMACI
Are you asking something like that:
http://wiki.apache.org/solr/TextProfileSignature


On Wednesday, October 9, 2013, shahzad73 shahzad...@yahoo.com
wrote:
 Please help me formulate the query that will be easy or do i have to
build a
 custom filter for this ?

 Shahzad



 --
 View this message in context:
http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094372.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: limiting deep pagination

2013-10-09 Thread Michael Sokolov

On 10/8/13 6:51 PM, Peter Keegan wrote:

Is there a way to configure Solr 'defaults/appends/invariants' such that
the product of the 'start' and 'rows' parameters doesn't exceed a given
value? This would be to prevent deep pagination.  Or would this require a
custom requestHandler?

Peter

Just wondering -- isn't it the sum that you should be concerned about 
rather than the product?  Actually I think what we usually do is limit 
both independently, with slightly different concerns, since e.g. start=1, 
rows=1000 causes memory problems if you have large fields in your 
results, whereas start=1000, rows=1 may not actually be a problem.


-Mike
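
Whichever limits are chosen, the guard itself (in a custom request handler, or a thin proxy in front of Solr) is only a few lines; the caps and names below are made up for illustration:

```java
public class PaginationGuard {
    /** Rejects requests whose paging window exceeds the configured caps. */
    public static void check(int start, int rows, int maxStart, int maxRows) {
        if (start < 0 || rows < 0) {
            throw new IllegalArgumentException("start and rows must be non-negative");
        }
        if (start > maxStart) {
            throw new IllegalArgumentException("start=" + start + " exceeds max " + maxStart);
        }
        if (rows > maxRows) {
            throw new IllegalArgumentException("rows=" + rows + " exceeds max " + maxRows);
        }
    }
}
```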


Re: SolrCloud High Availability during indexing operation

2013-10-09 Thread Saurabh Saxena
@Furkan Pastebin link is working for me. Can you try again ?


On Wed, Oct 9, 2013 at 1:15 AM, Furkan KAMACI furkankam...@gmail.comwrote:

 Hi Saurabh,
 Your link does not work (it is broken).


 2013/10/9 Saurabh Saxena ssax...@gopivotal.com

  Pastbin link http://pastebin.com/cnkXhz7A
 
  I am doing a bulk request. I am uploading 100 files, each file having 100
  docs.
 
  -Saurabh
 
 
  On Tue, Oct 8, 2013 at 7:39 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
   The attachment did not go through - try using pastebin.com or
 something.
  
   Are you adding docs with curl one at a time or in bulk per request.
  
   - Mark
  
   On Oct 8, 2013, at 9:58 PM, Saurabh Saxena ssax...@gopivotal.com
  wrote:
  
    Repeated the experiments on a local system: a single-shard SolrCloud with a
   replica. Tried to index 10K docs. All the indexing operations were
   directed to the replica Solr node. While the documents were getting indexed
   on the replica, I shut down the leader Solr node. Out of 10K docs, only 9900
   docs got indexed. If I repeat the experiment without shutting down the
   leader instance, all 10K docs get indexed. I am using curl to upload
 the
   docs, there was no curl error while uploading documents.
   
Following error was there in replica log file.
   
ERROR - 2013-10-08 16:10:32.662;
 org.apache.solr.common.SolrException;
   org.apache.solr.common.SolrException: No registered leader was found,
   collection:test_collection slice:shard1
   
Attached replica log file.
   
   
On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena 
 ssax...@gopivotal.com
  
   wrote:
Sorry for the late reply.
   
 All the documents have a unique id. If I repeat the experiment, the num
  of
   docs indexed changes (I guess it depends when I shutdown a particular
   shard). When I do the experiment without shutting down leader Shards,
 all
   80k docs get indexed (which I think proves that all documents are
 valid).
   
 I need to dig through the logs to find the error message. Also, I am not
   tracking the curl return codes; I will run it again and reply.
   
Regards,
Saurabh
   
   
On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson 
  erickerick...@gmail.com
   wrote:
And do any of the documents have the same uniqueKey, which
is usually called id? Subsequent adds of docs with the same
uniqueKey replace the earlier one.
   
It's not definitive because it changes as merges happen, old copies
of docs that have been deleted or updated will be purged, but what
does your admin page show for maxDoc? If it's more than numDocs
then you have duplicate uniqueKeys. NOTE: if you optimize
(which you usually shouldn't) then maxDoc and numDocs will be
the same so if you test this don't optimize.
   
Best,
Erick
   
   
On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood
wun...@wunderwood.org wrote:
 Did all of the curl update commands return success? Ane errors in
 the
   logs?

 wunder

 On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote:

 Is it possible that some of those 80K docs were simply not valid?
  e.g.
 had a wrong field, had a missing required field, anything like
 that?
 What happens if you clear this collection and just re-run the same
 indexing process and do everything else the same?  Still some docs
 missing?  Same number?

 And what if you take 1 document that you know is valid and index
 it
 80K times, with a different ID, of course?  Do you see 80K docs in
  the
 end?

 Otis
 --
 Solr  ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



 On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena 
   ssax...@gopivotal.com wrote:
 Doc count did not change after I restarted the nodes. I am doing
 a
   single
 commit after all 80k docs. Using Solr 4.4.

 Regards,
 Saurabh


 On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:

 Interesting. Did the doc count change after you started the
 nodes
   again?
 Can you tell us about commits?
 Which version? 4.5 will be out soon.

 Otis
 Solr & ElasticSearch Support
 http://sematext.com/
 On Sep 23, 2013 8:37 PM, Saurabh Saxena 
 ssax...@gopivotal.com
   wrote:

 Hello,

 I am testing High Availability feature of SolrCloud. I am using
  the
 following setup

 - 8 linux hosts
 - 8 Shards
 - 1 leader, 1 replica / host
 - Using Curl for update operation

 I tried to index 80K documents on replicas (10K/replica in
   parallel).
 During indexing process, I stopped 4 Leader nodes. Once
 indexing
   is done,
 out of 80K docs only 79808 docs are indexed.

 Is this an expected behaviour ? In my opinion replica should
 take
   care of
 indexing if leader is down.

 If this is an expected behaviour, any steps that 

Re: {soft}Commit and cache flushing

2013-10-09 Thread Tim Vaillancourt
Apologies all. I think the suggestion that I was replying just to get noticed
is what irked me; otherwise I would have moved on. I'll follow this advice.

Cheers,

Tim


On 9 October 2013 05:20, Erick Erickson erickerick...@gmail.com wrote:

 Tim:

 I think you're mis-interpreting. By replying to a post with the subject:

 {soft}Commit and cache flushing

  but going in a different direction, it's easy for people to think "I'm
  not interested in that thread, I'll ignore it", thereby missing the fact
  that you're asking a somewhat different
 question that they might have information about. It's not about whether
 you're
 doing anything particularly wrong with the question. It's about making
 it easy for
 people to help.

 See http://people.apache.org/~hossman/#threadhijack

 Best,
 Erick

 On Tue, Oct 8, 2013 at 6:23 PM, Tim Vaillancourt t...@elementspace.com
 wrote:
  I have a genuine question with substance here. If anything, this
  nonconstructive, rude response was to get noticed. Thanks for
  contributing to the discussion.
 
  Tim
 
 
  On 8 October 2013 05:31, Dmitry Kan solrexp...@gmail.com wrote:
 
  Tim,
  I suggest you open a new thread and not reply to this one to get
 noticed.
  Dmitry
 
 
  On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.com
  wrote:
 
   Is there a way to make autoCommit only commit if there are pending
  changes,
   ie: if there are 0 adds pending commit, don't autoCommit
 (open-a-searcher
   and wipe the caches)?
  
   Cheers,
  
   Tim
  
  
   On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote:
  
right. We've got the autoHard commit configured only atm. The
   soft-commits
are controlled on the client. It was just easier to implement the
 first
version of our internal commit policy that will commit to all solr
instances at once. This is where we have noticed the reported
 behavior.
   
   
On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu
   wrote:
   
 if there are no modifications to an index and a softCommit or
   hardCommit
  is issued, then solr flushes the cache.


 Indeed. The easiest way to work around this is by disabling auto
   commits
 and only commit when you have to.

   
  
 



matching starts with only

2013-10-09 Thread adm1n
My index contains documents which could be a single word or a short sentence
of up to 4-5 words. I need to return only the documents which start with
the searched pattern.
In regex it would be ^my_query.

for example, for these docs:

black
beautiful black cat
cat
cat is black
black cat

and for the query: "black"

only "black" and "black cat" should be returned.

The text field I'm using is as follows:
<fieldType name="text_general_aa" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4"
        maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4"
        maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Solr version is 4.2

thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: matching starts with only

2013-10-09 Thread Shawn Heisey

On 10/9/2013 12:57 PM, adm1n wrote:

My index contains documents which could be a single word or a short sentence
of up to 4-5 words. I need to return only the documents which start with
the searched pattern.
In regex it would be ^my_query.

for example, for these docs:

black
beautiful black cat
cat
cat is black
black cat

and for the query: "black"

only "black" and "black cat" should be returned.

The text field I'm using is as follows:
fieldType name=text_general_aa class=solr.TextField
positionIncrementGap=100
   analyzer type=index
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.NGramFilterFactory minGramSize=4
maxGramSize=15 side=front/
 filter class=solr.LowerCaseFilterFactory/
   /analyzer
   analyzer type=query
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=solr.NGramFilterFactory minGramSize=4
maxGramSize=15 side=front/
 filter class=solr.LowerCaseFilterFactory/
   /analyzer
 /fieldType
Solr version is 4.2

thanks!


The presence of either the whitespace tokenizer or the NGram filter makes 
this impossible, because they both break the indexed value into smaller 
pieces.  Together, they *really* break things up.  Matching is done on a 
per-term basis, and these two components in your analysis chain ensure 
that "black" will be a term for all of those input documents, whether it 
appears at the beginning, middle, or end.


If you set up a copyField to a new field whose fieldType uses the 
Keyword tokenizer (which treats the entire string as a single token) and 
the lowercase filter, you would be able use the regex support in Solr 
4.x and have this as your query string:


newfield:/^black/

Thanks,
Shawn
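Shawn's point can be illustrated with a toy model of the two chains. This is plain Python standing in for Solr's analyzers, not Solr code: the whitespace+ngram chain indexes a "blac"/"black" gram for every document containing the word, while a keyword-tokenized copy of the field supports a true prefix test.

```python
# Plain Python standing in for the two analysis chains -- not Solr code.

def ngram_front(term, lo=4, hi=15):
    # front ngrams of a single token, like NGramFilterFactory side="front"
    return [term[:n] for n in range(lo, min(hi, len(term)) + 1)]

def whitespace_ngram_terms(text):
    # whitespace tokenizer + lowercase + front ngrams
    terms = []
    for tok in text.lower().split():
        terms.extend(ngram_front(tok) or [tok])
    return terms

def keyword_term(text):
    # KeywordTokenizer + lowercase: the whole string is one term
    return text.lower()

docs = ["black", "beautiful black cat", "cat", "cat is black", "black cat"]

# The ngram chain produces a "blac" gram for every doc containing the word,
# wherever it appears -- so "starts with" cannot be expressed against it.
with_black = [d for d in docs if "blac" in whitespace_ngram_terms(d)]

# Against a keyword-tokenized copyField, a prefix test matches true prefixes.
starts_black = [d for d in docs if keyword_term(d).startswith("black")]
print(starts_black)  # ['black', 'black cat']
```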



Re: run filter queries after post filter

2013-10-09 Thread Erick Erickson
Ah, I think you're misunderstanding the nature of post-filters.
Or I'm confused, which happens a lot!

The whole point of post filters is that they're assumed to be
expensive (think ACL calculation). So you want them to run
on the fewest documents possible. So only docs that make it
through the primary query _and_ all lower-cost filters will get
to this post-filter. This means they can't be cached for
instance, because they don't see (hopefully) very many docs.

This is radically different than normal fq clauses, which are
calculated on the entire corpus and can thus be cached.

Best,
Erick
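A toy sketch of the ordering described above, as a model of the behaviour rather than Solr's actual implementation: normal filters and the main query narrow the document set first, and only the survivors reach filters that qualify as post-filters (cache=false, cost >= 100, and a query type that actually implements PostFilter; anything else falls back to a normal filter).

```python
# Toy model of Solr's filter ordering -- not Solr code. A filter runs as a
# post-filter only if it asks for it (cache=false, cost >= 100) AND its
# query type implements PostFilter; otherwise it runs as a normal filter.

def run_query(docs, main_query, filters):
    normal, post = [], []
    for f in filters:
        if f["cost"] >= 100 and not f["cache"] and f.get("post_capable"):
            post.append(f)
        else:
            normal.append(f)
    post.sort(key=lambda f: f["cost"])      # post-filters run in cost order
    hits = [d for d in docs if main_query(d)]
    for f in normal:                        # cheap, cacheable, corpus-wide
        hits = [d for d in hits if f["pred"](d)]
    for f in post:                          # expensive: sees only survivors
        hits = [d for d in hits if f["pred"](d)]
    return hits

docs = [{"id": i, "acl": i % 2 == 0} for i in range(10)]
filters = [
    {"cost": 0, "cache": True, "pred": lambda d: d["id"] < 8},
    {"cost": 200, "cache": False, "post_capable": True,
     "pred": lambda d: d["acl"]},           # "ACL"-style post-filter
]
result = run_query(docs, lambda d: True, filters)
print([d["id"] for d in result])  # [0, 2, 4, 6]
```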

On Wed, Oct 9, 2013 at 11:59 AM, Rohit Harchandani rhar...@gmail.com wrote:
 Hey,
 so the post filter logs the number of ids that it receives.
 With the above filter having cost=200, the post filter should have received
 the same number of ids as before ( when the filter was not present ).
 But that does not seem to be the case...with the filter query on the index,
 the number of ids that the post filter is receiving reduces.

 Thanks,
 Rohit


 On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.comwrote:

 Hmmm, seems like it should. What's our evidence that it isn't working?

 Best,
 Erick

 On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com
 wrote:
  Hey,
  I am using solr 4.0 with my own PostFilter implementation which is
 executed
  after the normal solr query is done. This filter has a cost of 100. Is it
  possible to run filter queries on the index after the execution of the
 post
  filter?
  I tried adding the below line to the url but it did not seem to work:
  fq={!cache=false cost=200}field:value
  Thanks,
  Rohit



Re: matching starts with only

2013-10-09 Thread adm1n
Shawn Heisey-4:

thanks for the quick response.

Why this field have to be copyField? Couldn't it be a single field, for
example:
<fieldType name="text_general_long" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="my_name" type="text_general_long" stored="true"
    multiValued="false" required="false"/>



thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094447.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to share Schema between multicore on Solr 4.4

2013-10-09 Thread Erick Erickson
bq: ...in the sense that there's only one canonical copy.

Agreed, and as you say that copy is kept in ZooKeeper.

And I pretty much guarantee that the internal solrconfig object
is NOT shared. I doubt the schema object is shared, but it seems
like it could be with some work.

But the savings potential here is rather small unless you have a
large number of cores. The LotsOfCores option is really, at this
point, orthogonal to SolrCloud; I don't think they play nicely
together (and we have some anecdotal evidence of that)...

Erick

On Wed, Oct 9, 2013 at 12:17 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/9/2013 6:24 AM, Erick Erickson wrote:

 Hmmm, I hadn't thought about that before. The shareSchema
 stuff is keyed off the absolute directory (and timestamp) of
 the schema.xml file associated with a core and is about
 sharing the internal object that holds the parsed schema.

 Do you know for sure if the fact that this is coming from ZK
 actually shares the schema object? 'Cause I've never
 looked to see and it would be a good thing to have in my
 head...


 With SolrCloud, I have no idea whether the actual internal objects are
 shared.  Just now I tried to figure that out from the code, but I don't
 already have an understanding of how that code works, and a quick glance
 isn't enough to gain that knowledge. I can guarantee that you have a much
 deeper understanding of those internals than I do!

 My comments were to indicate that SolrCloud creates a situation where the
 config/schema are shared in the sense that there's only one canonical copy.

 Thanks,
 Shawn



Re: matching starts with only

2013-10-09 Thread Shawn Heisey

On 10/9/2013 2:16 PM, adm1n wrote:

Why this field have to be copyField? Couldn't it be a single field, for


I always assume that people already are using the existing field and 
type for other purposes.  Offering advice without making that assumption 
will usually result in people making a change and then complaining that 
something else no longer works.


If you don't need what you already have for something else, then you 
could change the type on the existing field with no problem.


Thanks,
Shawn



Re: Searching on (hyphenated/capitalized) word issue

2013-10-09 Thread Erick Erickson
The admin/analysis page is definitely your friend. On the
surface, [catenateWords=1] in WDFF should mash the
 split-up bits of multiCAD into multicad and you should be all set.

I suspect that StandardTokenizerFactory is somehow getting
into the mix here. Under any circumstance, the admin/analysis
page should help.

StandardTokenizerFactory, on a quick test, does split up
multi-cad into separate tokens that then do NOT get
concatenated...

That doesn't explain not getting hits on multiCAD though when
you search for multicad.

Best,
Erick
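Erick's expectation about catenateWords can be checked with a toy model of WordDelimiterFilter. The real filter has many more rules; this plain-Python sketch only covers the split-on-case-change, split-on-hyphen, and catenate cases discussed in this thread.

```python
# Toy model of WordDelimiterFilterFactory with generateWordParts=1,
# catenateWords=1, splitOnCaseChange=1, followed by lowercasing.
# The real filter has many more rules; this covers only this thread's cases.
import re

def wdf_terms(token):
    # split on hyphen/underscore and on lower->UPPER case changes
    parts = [p for p in re.split(r"[-_]|(?<=[a-z])(?=[A-Z])", token) if p]
    terms = [p.lower() for p in parts]        # generateWordParts
    if len(parts) > 1:
        terms.append("".join(parts).lower())  # catenateWords
    return terms

print(wdf_terms("multiCAD"))   # ['multi', 'cad', 'multicad']
print(wdf_terms("Multi-CAD"))  # ['multi', 'cad', 'multicad']
print(wdf_terms("multicad"))   # ['multicad'] -- no split point, nothing to catenate
```

If the index really contained the catenated "multicad" term, the search should hit, which is why the analysis page is the right place to see what the chain actually emits.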


On Wed, Oct 9, 2013 at 10:45 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 If you have the word multicad in your index and you want to get a result
 when you search for multi, you can use an ngram filter. However, you should
 consider the pros and cons of using the ngram filter. If you use ngrams you
 may find multicad from multi, but your index size will be much bigger.

 I suggest you to look at here:
 http://docs.lucidworks.com/display/solr/Tokenizers



 2013/10/9 Van Tassell, Kristian kristian.vantass...@siemens.com

 Thank you Upayavira.

 I'm trying to figure out what will make Solr stem on multi in the word
 multicad so that any attempt to search on multicad, Multi-CAD or
 multiCAD will return results. The WordDelimiterFilterFactory helps with
 the case of multi followed by a dash or a capital letter, but I'm not sure
 how to get Solr to tokenize the word multi. Should I look at ngram
 configurations? Or is there a filter which promotes (rather than protects)
  words from being stemmed? (In other words, I could configure in a txt file
  that multi should be stemmed.)

 Just to reiterate, I am not getting any results when I search for the word
 multicad, even though it appears many times in the text as multiCAD and
 Multi-CAD.

 Here is my configuration:

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords_en.txt" enablePositionIncrements="true"/>
  <filter class="solr.SynonymFilterFactory"
      synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory"
      language="English" protected="protwords.txt"/>
</analyzer>

 -Original Message-
 From: Upayavira [mailto:u...@odoko.co.uk]
 Sent: Monday, September 30, 2013 1:45 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Searching on (hyphenated/capitalized) word issue

 You need to look at your analysis chain. The stuff you're talking about
 there is all configurable.

 There's different tokenisers available to split your fields differently,
 then you might use the WordDelimiterFilterFactory to split existing tokens
 further (e.g. WiFi might become wi, fi and WiFi). So really, you need
 to craft your own analysis chain to fit the kind of data you are working
 with.

 Upayavira

 On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote:
   I have a search term multi-CAD being issued on tokenized text.  The
  problem is that you cannot get any search results when you type
  multicad unless you add a hyphen (multi-cad) or type multiCAD
  (omitting the hyphen, but correctly adding the CAPS into the spelling).
 
 
 
  However, for the similar but unhyphenated word AutoCAD, you can type
  autocad and get hits for AutoCAD, as you would expect. You can type
  auto-cad and get the same results.
 
  The query seems to get parsed as separate words (resulting in hits)
  for multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for
 multicad.
  In other words, the search terms  become multi cad and auto cad
  for all cases except for when the term is multicad.
 
   I'm guessing this may be due in part to auto being a more common word
  prefix, but I may be wrong. Can anyone provide some clarity (and maybe
  point me towards a potential solution)?
 
  Thanks in advance!
 
 
  Kristian Van Tassell
  Siemens Industry Sector
  Siemens Product Lifecycle Management Software Inc.
  5939 Rice Creek Parkway
  Shoreview, MN  55126 United States
  Tel.  :+1 (651) 855-6194
  Fax  :+1 (651) 855-6280
  kristian.vantass...@siemens.com kristian.vantass...@siemens.com%20
  www.siemens.com/plm
 



Re: matching starts with only

2013-10-09 Thread adm1n
Search by "starts with" is something new I have to add, as is the data I
have to index for this purpose, so it's ok to create a new field.

But once I added the following field type:
<fieldType name="text_general_long" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And:
<field name="my_name" type="text_general_long" stored="true"
    multiValued="false" required="false"/>
indexing, and afterwards searching by my_name:/^black/ returns no results,
while searching by my_name:black returns only the black document.

What am I missing?

thanks. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094453.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: run filter queries after post filter

2013-10-09 Thread Rohit Harchandani
Yes, I get that. Actually, I should have explained in more detail.

- i have a query which gets certain documents.
- the post filter gets these matched documents and does some processing on
them and filters the results.
- but after this is done i need to apply another filter - which is why i
gave a higher cost to it.

the reason i need to do this is because the processing done by the post
filter depends on the documents matching the query till that point.
since the normal fq clause is also getting executed before the post filter
(despite the cost), the final results are not accurate

thanks
Rohit




On Wed, Oct 9, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.comwrote:

 Ah, I think you're misunderstanding the nature of post-filters.
 Or I'm confused, which happens a lot!

 The whole point of post filters is that they're assumed to be
 expensive (think ACL calculation). So you want them to run
 on the fewest documents possible. So only docs that make it
 through the primary query _and_ all lower-cost filters will get
 to this post-filter. This means they can't be cached for
 instance, because they don't see (hopefully) very many docs.

 This is radically different than normal fq clauses, which are
 calculated on the entire corpus and can thus be cached.

 Best,
 Erick

 On Wed, Oct 9, 2013 at 11:59 AM, Rohit Harchandani rhar...@gmail.com
 wrote:
  Hey,
  so the post filter logs the number of ids that it receives.
  With the above filter having cost=200, the post filter should have
 received
  the same number of ids as before ( when the filter was not present ).
  But that does not seem to be the case...with the filter query on the
 index,
  the number of ids that the post filter is receiving reduces.
 
  Thanks,
  Rohit
 
 
  On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Hmmm, seems like it should. What's our evidence that it isn't working?
 
  Best,
  Erick
 
  On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com
  wrote:
   Hey,
   I am using solr 4.0 with my own PostFilter implementation which is
  executed
   after the normal solr query is done. This filter has a cost of 100.
 Is it
   possible to run filter queries on the index after the execution of the
  post
   filter?
   I tried adding the below line to the url but it did not seem to work:
   fq={!cache=false cost=200}field:value
   Thanks,
   Rohit
 



Re: Searching on (hyphenated/capitalized) word issue

2013-10-09 Thread Upayavira
It depends whether "multicad" is a special case, or whether you want "micr"
to match the term "microsoft".

If it is a special case, you can use synonyms, so that "multi" and
"multicad" are considered the same term.

If it isn't a special case, then ngrams could work - your document would
be indexed with:

mul
mult
multi
multic
multica
multicad

all indexed at the same term position, allowing for any of those to
match. Of course, that will make your index much larger.

As Erick says, use the admin/analysis page to play with your analysis
chains and see what they do to different inputs.

Upayavira

On Wed, Oct 9, 2013, at 09:30 PM, Erick Erickson wrote:
 The admin/analysis page is definitely your friend. On the
 surface, [catenateWords=1] in WDFF should mash the
  split-up bits of multiCAD into multicad and you should be all set.
 
 I suspect that StandardTokenizerFactory is somehow getting
 into the mix here. Under any circumstance, the admin/analysis
 page should help.
 
 StandardTokenizerFactory, on a quick test, does split up
 multi-cad into separate tokens that then do NOT get
 concatenated...
 
 That doesn't explain not getting hits on multiCAD though when
 you search for multicad.
 
 Best,
 Erick
 
 
 On Wed, Oct 9, 2013 at 10:45 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
   If you have the word multicad in your index and you want to get a result
   when you search for multi, you can use an ngram filter. However, you should
   consider the pros and cons of using the ngram filter. If you use ngrams you
   may find multicad from multi, but your index size will be much bigger.
 
  I suggest you to look at here:
  http://docs.lucidworks.com/display/solr/Tokenizers
 
 
 
  2013/10/9 Van Tassell, Kristian kristian.vantass...@siemens.com
 
  Thank you Upayavira.
 
  I'm trying to figure out what will make Solr stem on multi in the word
  multicad so that any attempt to search on multicad, Multi-CAD or
  multiCAD will return results. The WordDelimiterFilterFactory helps with
  the case of multi followed by a dash or a capital letter, but I'm not sure
  how to get Solr to tokenize the word multi. Should I look at ngram
  configurations? Or is there a filter which promotes (rather than protects)
  words from being stemmed? (in other words, I could configure in a txt file
  that multi should be stemmed.
 
  Just to reiterate, I am not getting any results when I search for the word
  multicad, even though it appears many times in the text as multiCAD and
  Multi-CAD.
 
  Here is my configuration:
 
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords_en.txt" enablePositionIncrements="true"/>
  <filter class="solr.SynonymFilterFactory"
      synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory"
      language="English" protected="protwords.txt"/>
</analyzer>
 
  -Original Message-
  From: Upayavira [mailto:u...@odoko.co.uk]
  Sent: Monday, September 30, 2013 1:45 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Searching on (hyphenated/capitalized) word issue
 
  You need to look at your analysis chain. The stuff you're talking about
  there is all configurable.
 
  There's different tokenisers available to split your fields differently,
  then you might use the WordDelimiterFilterFactory to split existing tokens
  further (e.g. WiFi might become wi, fi and WiFi). So really, you need
  to craft your own analysis chain to fit the kind of data you are working
  with.
 
  Upayavira
 
  On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote:
    I have a search term multi-CAD being issued on tokenized text.  The
   problem is that you cannot get any search results when you type
   multicad unless you add a hyphen (multi-cad) or type multiCAD
   (omitting the hyphen, but correctly adding the CAPS into the spelling).
  
  
  
   However, for the similar but unhyphenated word AutoCAD, you can type
   autocad and get hits for AutoCAD, as you would expect. You can type
   auto-cad and get the same results.
  
   The query seems to get parsed as separate words (resulting in hits)
   for multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for
  multicad.
   In other words, the search terms  become multi cad and auto cad
   for all cases except for when the term is multicad.
  
    I'm guessing this may be due in part to auto being a more common word
   prefix, but I may be wrong. Can anyone provide some clarity (and maybe
   point me towards a potential solution)?
  
   Thanks in advance!
  
  
   Kristian Van Tassell
   Siemens Industry Sector
   Siemens Product Lifecycle Management Software Inc.
   5939 Rice Creek Parkway
   Shoreview, MN  55126 

Dynamically loading synonym dictionary for solr SynonymFilter

2013-10-09 Thread ALEX PKB
Hi,
All of our synonyms are maintained in a DB; we would like to fetch those
synonyms dynamically for query expansion (not at indexing time). Are there any
code contributions?
I saw some discussion years ago, but without a conclusion.
Thanks a lot!


Re: run filter queries after post filter

2013-10-09 Thread jim ferenczi
Hi Rohit,
The main problem is that if the query inside the filter does not have a
PostFilter implementation then your post filter is silently transformed
into a simple filter. The query field:value is based on the inverted
lists and does not have postfilter support.
If your field is a numeric field take a look at the frange query parser
which has post filter support:
To filter out document with a field value less than 5:
fq={!frange l=5 cache=false cost=200}field(myField)

Cheers,
Jim
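A toy version of what the frange filter above computes, in plain Python. frange also supports u, incl, and incu; only the lower and upper bounds are modeled here, and the documents are made up.

```python
# Toy version of {!frange l=5 ...}field(myField): keep documents whose
# function value falls within the bounds. The documents here are made up.

def frange(docs, value_fn, l=None, u=None):
    def in_range(v):
        return (l is None or v >= l) and (u is None or v <= u)
    return [d for d in docs if in_range(value_fn(d))]

docs = [{"id": 1, "myField": 3}, {"id": 2, "myField": 5}, {"id": 3, "myField": 9}]
print([d["id"] for d in frange(docs, lambda d: d["myField"], l=5)])  # [2, 3]
```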


2013/10/9 Rohit Harchandani rhar...@gmail.com

 yes i get that. actually i should have explained in more detail.

 - i have a query which gets certain documents.
 - the post filter gets these matched documents and does some processing on
 them and filters the results.
 - but after this is done i need to apply another filter - which is why i
 gave a higher cost to it.

 the reason i need to do this is because the processing done by the post
 filter depends on the documents matching the query till that point.
 since the normal fq clause is also getting executed before the post filter
 (despite the cost), the final results are not accurate

 thanks
 Rohit




 On Wed, Oct 9, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  Ah, I think you're misunderstanding the nature of post-filters.
  Or I'm confused, which happens a lot!
 
  The whole point of post filters is that they're assumed to be
  expensive (think ACL calculation). So you want them to run
  on the fewest documents possible. So only docs that make it
  through the primary query _and_ all lower-cost filters will get
  to this post-filter. This means they can't be cached for
  instance, because they don't see (hopefully) very many docs.
 
  This is radically different than normal fq clauses, which are
  calculated on the entire corpus and can thus be cached.
 
  Best,
  Erick
 
  On Wed, Oct 9, 2013 at 11:59 AM, Rohit Harchandani rhar...@gmail.com
  wrote:
   Hey,
   so the post filter logs the number of ids that it receives.
   With the above filter having cost=200, the post filter should have
  received
   the same number of ids as before ( when the filter was not present ).
   But that does not seem to be the case...with the filter query on the
  index,
   the number of ids that the post filter is receiving reduces.
  
   Thanks,
   Rohit
  
  
   On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
  
   Hmmm, seems like it should. What's our evidence that it isn't working?
  
   Best,
   Erick
  
   On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com
   wrote:
Hey,
I am using solr 4.0 with my own PostFilter implementation which is
   executed
after the normal solr query is done. This filter has a cost of 100.
  Is it
possible to run filter queries on the index after the execution of
 the
   post
filter?
I tried adding the below line to the url but it did not seem to
 work:
fq={!cache=false cost=200}field:value
Thanks,
Rohit
  
 



Re: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID

2013-10-09 Thread Otis Gospodnetic
Bharat,

Can you look at the logs on the Master when you issue the delete and
the subsequent commits and share that?

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Oct 8, 2013 at 3:57 PM, Akkinepalli, Bharat (ELS-CON)
b.akkinepa...@elsevier.com wrote:
 Hi,
 We have recently migrated from Solr 3.6 to Solr 4.4.  We are using the 
 Master/Slave configuration in Solr 4.4 (not Solr Cloud).  We have noticed the 
 following behavior/defect.

 Configuration:
 ===

 1.   The Hard Commit and Soft Commit are disabled in the configuration 
 (we control the commits from the application)

 2.   We have 1 Master and 2 Slaves configured and the pollInterval is 
 configured to 10 Minutes.

 3.   The Master is configured to have the replicateAfter as commit &
 startup

 Steps to reproduce the problem:
 ==

 1.   Delete a document in Solr  (using delete by id).  URL - 
 http://localhost:8983/solr/annotation/update with body as  
 <delete><id>change.me</id></delete>

 2.   Issue a commit in Master 
 (http://localhost:8983/solr/annotation/update?commit=true).

 3.   The replication of the DELETE WILL NOT happen.  The master and slave 
 have the same index version.

 4.   If we try to issue another commit in Master, we see that it 
 replicates fine.

 Request you to please confirm if this is a known issue.  Thank you.

 Regards,
 Bharat Akkinepalli



Re: Dynamically loading synonym dictionary for solr SynonymFilter

2013-10-09 Thread Jan Høydahl
Hi,

Not that I know of. You'd probably want to subclass SynonymFilter* with your own
DB-aware implementation, and of course contribute this back :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

9. okt. 2013 kl. 23:31 skrev ALEX PKB alex...@gmail.com:

 Hi,
 All of our synonyms are maintained in a DB; we would like to fetch those
 synonyms dynamically for query expansion (not at indexing time). Are there any
 code contributions?
 I saw some discussion years ago, but without a conclusion.
 Thanks a lot!
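A sketch of the query-time expansion being discussed, with a dict standing in for the DB lookup; a real solution would subclass SynonymFilter (or rebuild its SynonymMap) from the same source. The table/field names and sample synonyms here are hypothetical.

```python
# Dict standing in for the DB lookup; a real solution would subclass
# SynonymFilter (or rebuild its SynonymMap) from the same source.
# Table/field names and sample synonyms are hypothetical.

def load_synonyms_from_db():
    # stand-in for e.g. SELECT term, synonym FROM synonyms
    return {"aio": ["all-in-one"], "tv": ["television"]}

def expand_query_terms(terms, synonyms):
    expanded = []
    for t in terms:
        expanded.append(t)
        expanded.extend(synonyms.get(t.lower(), []))  # query-time expansion
    return expanded

syns = load_synonyms_from_db()
print(expand_query_terms(["AIO", "two"], syns))  # ['AIO', 'all-in-one', 'two']
```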



Field with default value and stored=false, will be reset back to the default value in case of updating other fields

2013-10-09 Thread deniz
hi all,

I have encountered some problems and post it on stackoverflow here:
http://stackoverflow.com/questions/19285251/solr-field-with-default-value-resets-itself-if-it-is-stored-false
 

As you can see from the response, does it make sense to open a bug ticket
for this? Although I can work around this by setting everything back to
stored=true, it does not make sense to keep every field stored when I
don't need to return them in the search results. Or can anyone give a more
detailed explanation of why this is expected and normal?



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-with-default-value-and-stored-false-will-be-reset-back-to-the-default-value-in-case-of-updatins-tp4094508.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields

2013-10-09 Thread Bill Bell
You have to update the whole record including all fields...

Bill Bell
Sent from mobile


 On Oct 9, 2013, at 7:50 PM, deniz denizdurmu...@gmail.com wrote:
 
 hi all,
 
 I have encountered some problems and post it on stackoverflow here:
 http://stackoverflow.com/questions/19285251/solr-field-with-default-value-resets-itself-if-it-is-stored-false
  
 
  As you can see from the response, does it make sense to open a bug ticket
  for this? Although I can work around this by setting everything back to
  stored=true, it does not make sense to keep every field stored when I
  don't need to return them in the search results. Or can anyone give a more
  detailed explanation of why this is expected and normal?
 
 
 
 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Field-with-default-value-and-stored-false-will-be-reset-back-to-the-default-value-in-case-of-updatins-tp4094508.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields

2013-10-09 Thread deniz
Billnbell wrote
 You have to update the whole record including all fields...

So what is the point of having atomic updates if I need to update
everything?



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-with-default-value-and-stored-false-will-be-reset-back-to-the-default-value-in-case-of-updatins-tp4094508p4094523.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields

2013-10-09 Thread Shawn Heisey
On 10/9/2013 8:39 PM, deniz wrote:
 Billnbell wrote
 You have to update the whole record including all fields...
 
 so what is the point of having atomic updates if i need to update
 everything? 

If you have any regular fields that are not stored, atomic updates will
not work -- unstored field data will be lost.  If you have copyField
destination fields that *are* stored, atomic updates will not work as
expected with those fields.  The wiki spells out the requirements:

http://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations

An atomic update is just a shortcut for "read all existing fields from
the original document, apply the atomic updates, and re-insert the
document, overwriting the original."

Thanks,
Shawn
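Shawn's read-modify-write description can be modeled in a few lines (plain Python, not Solr code; the field names and defaults are made up), showing why an unstored field with a default falls back to that default after an atomic update to a different field.

```python
# Toy model of Shawn's description: an atomic update reads the stored
# fields, applies the changes, and re-adds the document. A field that is
# not stored is absent from the read, so it falls back to its schema
# default on re-add -- the behaviour reported in this thread.

def atomic_update(index, schema, doc_id, changes):
    # read only the stored fields of the existing document
    stored = {f: v for f, v in index[doc_id].items() if schema[f]["stored"]}
    stored.update(changes)                          # apply the changes
    # re-add: unstored fields revert to their schema defaults
    rebuilt = {f: schema[f].get("default") for f in schema}
    rebuilt.update(stored)
    index[doc_id] = rebuilt
    return rebuilt

schema = {"id": {"stored": True}, "title": {"stored": True},
          "flag": {"stored": False, "default": "N"}}
index = {"1": {"id": "1", "title": "old", "flag": "Y"}}

doc = atomic_update(index, schema, "1", {"title": "new"})
print(doc)  # {'id': '1', 'title': 'new', 'flag': 'N'} -- 'flag' reset to default
```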