Problem with SynonymFilter and StopFilterFactory

2013-09-17 Thread david . davila
Hi, 

I have encountered a problem when applying StopFilterFactory and 
SynonymFilterFactory. The problem is that SynonymFilter removes the gaps 
that were previously put there by the StopFilterFactory. I'm applying the filters at 
query time, because users need to change the synonym lists frequently.

This is my schema, and an example of the issue:


String: documentacion para agentes

org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_35}
position      1              2     3
term text     documentación  para  agentes
startOffset   0              14    19
endOffset     13             18    26

org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_35}
position      1              2     3
term text     documentación  para  agentes
startOffset   0              14    19
endOffset     13             18    26

org.apache.solr.analysis.StopFilterFactory {words=stopwords_intranet.txt, ignoreCase=true, enablePositionIncrements=true, luceneMatchVersion=LUCENE_35}
position      1              3
term text     documentación  agentes
startOffset   0              19
endOffset     13             26

org.apache.solr.analysis.SynonymFilterFactory {synonyms=sinonimos_intranet.txt, expand=true, ignoreCase=true, luceneMatchVersion=LUCENE_35}
position      1              2
term text     documentación  agente
              archivo        agentes
type          SYNONYM        SYNONYM
              SYNONYM        SYNONYM
startOffset   0              19
              0              19
endOffset     13             26
              13             26


As you can see, the positions should be 1 and 3, but SynonymFilter removes 
the gap and moves the token from position 3 to 2.
I've got the same problem with Solr 3.5 and 4.0. 
I don't know if it's a bug or an error in my configuration. In other 
schemas that I have worked with, I had always put the SynonymFilter 
before the StopFilter, but in this one I preferred this order because of 
the large number of synonyms in the list (i.e. I don't want to generate 
a lot of synonyms for a word that I really want to remove).
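
For reference, the query-time analyzer chain I am describing looks roughly
like this (the fieldType name is just a placeholder; the factories and files
are the ones shown in the analysis output above):

<fieldType name="text_intranet" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_intranet.txt"
            ignoreCase="true" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="sinonimos_intranet.txt"
            expand="true" ignoreCase="true"/>
  </analyzer>
</fieldType>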

Thanks,

David Dávila Atienza
AEAT - Departamento de Informática Tributaria


Re: how to make sure all the index docs flushed to the index files

2013-09-17 Thread YouPeng Yang
Hi
   Another weird problem.
   When we set the autocommit properties, we assumed that a new index
file would be created on every commit, so that the index files would
be large enough. We do not want to keep too many small files, as in [1].

   How can we control the size of the index files?

[1]
...omitted
548KB  index/_28w_Lucene41_0.doc
289KB  index/_28w_Lucene41_0.pos
1.1M   index/_28w_Lucene41_0.tim
24K    index/_28w_Lucene41_0.tip
2.1M   index/_28w.fdt
766B   index/_28w.fdx
5KB    index/_28w.fnm
40K    index/_28w.nvd
79K    index/_28w.nvm
364B   index/_28w.si
518KB  index/_28x_Lucene41_0.doc
290KB  index/_28x_Lucene41_0.pos
1.2M   index/_28x_Lucene41_0.tim
28K    index/_28x_Lucene41_0.tip
2.1M   index/_28x.fdt
843B   index/_28x.fdx
5KB    index/_28x.fnm
40K    index/_28x.nvd
79K    index/_28x.nvm
386B   index/_28x.si
...omitted
-





2013/9/17 YouPeng Yang yypvsxf19870...@gmail.com

 Hi  Shawn

    Thank you very much for your response.

    I launched the full-import task from the solr/admin web page, and I did
 check the commit option.
 The new docs would be committed after the operation.
   The commit option is different from autocommit, right? If the import
 datasets are too large, that leads to poor performance or
 other problems, such as [1].

    The exception that indicates "Too many open files", we thought, is
 because of the ulimit.





 [1]
 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149d.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149e.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149f.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149g.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149h.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149i.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149j.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149k.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149l.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149m.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149n.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149o.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149p.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149q.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149r.fdx (Too many 
 open files)

 java.io.FileNotFoundException: 
 /data/apache-tomcat/webapps/solr/collection1/data/index/_149s.fdx (Too many 
 open files)



 2013/9/17 Shawn Heisey s...@elyograg.org

 On 9/16/2013 8:26 PM, YouPeng Yang wrote:
 I'm using the DIH to import data from an Oracle database with Solr 4.4.
 Finally I get 2.7GB of index data and 4.1GB of tlog data, and the number of
  docs was 1090.
 
   At first, I moved the 2.7GB of index data to another new Solr server in
  Tomcat 7. After I started Tomcat, I found the total number of docs was
  just half of the original number.
   So I thought that maybe the remaining docs were not committed to the index
  files, and the tlog needed to be replayed.

 You need to turn on autoCommit in your solrconfig.xml so that there are
 hard commits happening on a regular basis that flush all indexed data to
 disk and start new transaction log files.  I will give you a link with
 some information about that below.
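
As a rough illustration (the interval is only an example, tune it to your
needs), inside the updateHandler section of solrconfig.xml:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

With openSearcher set to false, the hard commit flushes segments and starts a
new transaction log without the cost of opening a new searcher.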

   Subsequently, I moved the 2.7GB of index data and 4.1GB of tlog data to the
  new Solr server in Tomcat 7.
  After I started Tomcat, an exception came up as in [1].
  Then it halted. I could not access the Tomcat server URL.
  I noticed that the CPU utilization was high by using the command:
  top -d 1 | grep tomcatPid.
  I thought Solr was replaying the update log. I waited a long time and it
  was still replaying. As a result, I gave up.

 I don't know what the exception was about, but it is likely that it WAS
 replaying the log.  With 4.1GB 

few and huge tlogs

2013-09-17 Thread YouPeng Yang
Hi
  According to
http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup,
  the tlog file will switch to a new one when a hard commit
happens.


  However, my tlogs show something different:
tlog.003   5.16GB
tlog.004   1.56GB
tlog.002   610.MB

  There are only a few tlogs, where I would have expected about ten files,
and each one is very huge, even though lots of hard commits have happened.

 So why does the number of tlog files not increase?


  here are settings of the DirectUpdateHandler2:
 <updateHandler class="solr.DirectUpdateHandler2">

  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <autoCommit>
    <maxTime>120</maxTime>
    <maxDocs>100</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>


  <autoSoftCommit>
    <maxTime>60</maxTime>
    <maxDocs>50</maxDocs>
  </autoSoftCommit>

</updateHandler>


Re: how to make sure all the index docs flushed to the index files

2013-09-17 Thread Shawn Heisey
On 9/17/2013 12:32 AM, YouPeng Yang wrote:
 Hi
Another werid problem.
When we setup the autocommit properties, we  suppose that the index
 fille will created every commited.So that the size of the index files will
 be large enough. We do not want to keep too many small files as [1].
 
How to control the size of the index files.

An index segment gets created after every hard commit.   In the listing
that you sent, all the files starting with _28w are a single segment.
All the files starting with _28x are another segment.

Solr should be merging the segments when you get enough of them, unless
you have incorrectly set up your merge policy.  The default number of
segments that get merged is ten.  When you get ten segments, they will
be merged down to one.  This repeats until you have ten merged segments.
 At that point, those ten merged segments will be merged to make an even
larger segment.

You can bump up the number of open files allowed by your operating
system.  On Linux, this is controlled by the /etc/security/limits.conf
file.  Here are some example config lines for that file:

elyograg  hard  nofile  6144
elyograg  soft  nofile  4096
root      hard  nofile  6144
root      soft  nofile  4096
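
As a side note, you can check the open-file limit currently in effect for
the user that runs Solr with:

  ulimit -n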

Alternatively, you can reduce the required number of files if you turn
on the UseCompoundFile setting, which is in the IndexConfig section.
This causes Solr to create a single file per index segment instead of
several files per segment.  The compound file may be slightly less
efficient, but the difference is likely to be very small.
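
For example (value shown only as an illustration), in the indexConfig
section of solrconfig.xml:

  <indexConfig>
    <useCompoundFile>true</useCompoundFile>
  </indexConfig>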

https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig




Problem indexing windows files

2013-09-17 Thread Yossi Nachum
Hi,

I am trying to index my windows pc files with manifoldcf version 1.3 and
solr version 4.4.

I created an output connection and a repository connection and started a new job
that scans my E drive.

Everything seems to work OK, but after a few minutes Solr stops getting
new files to index. I can see that through the Tomcat log file.

On the ManifoldCF crawler UI I see that the job is still running, but after a few
minutes I get the following error:
Error: Repeated service interruptions - failure processing document: Read
timed out

I see that the Tomcat process constantly consumes 100% of one CPU (I
have two CPUs), even after I get the error message from the ManifoldCF crawler
UI.

I checked the thread dump in the Solr admin UI and saw that the following threads
take the most CPU/user time:

http-8080-3 (32)

   - java.io.FileInputStream.readBytes(Native Method)
   - java.io.FileInputStream.read(FileInputStream.java:236)
   - java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
   - java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
   - java.io.BufferedInputStream.read(BufferedInputStream.java:334)
   - org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
   - java.io.FilterInputStream.read(FilterInputStream.java:133)
   - org.apache.tika.io.TailStream.read(TailStream.java:117)
   - org.apache.tika.io.TailStream.skip(TailStream.java:140)
   - org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
   - org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
   -
   org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
   - org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
   - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
   - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
   -
   org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
   -
   
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
   -
   
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
   -
   
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   -
   
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
   - org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
   -
   
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
   -
   
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
   -
   
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
   -
   
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   -
   
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   -
   
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   -
   
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   -
   org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   -
   org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   -
   
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   -
   org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
   -
   org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
   -
   
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
   - org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
   - java.lang.Thread.run(Thread.java:679)



Does anyone know what I can do? How can I debug this issue? How can I check
which file causes Tika to work so hard?
I don't see anything in the log files and I am stuck.
Thanks,
Yossi


Scoring by document size

2013-09-17 Thread blopez
Hi all,

I have some doubts about the Solr scoring function. I'm using all default
configuration, but I'm facing a weird issue with the retrieved scores.

In the schema, I'm going to focus on the only field I'm interested in. Its
definition is:

<fieldType name="text" class="solr.TextField" sortMissingLast="true"
omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<field name="myField" type="text" indexed="true" stored="true"
required="false" />

(omitNorms=false; otherwise, the document size is not taken into account in
the final score)

Then, I index some documents, with the following text in the 'myField'
field:

doc1 = A B C
doc2 = A B C D
doc3 = A B C D E
doc4 = A B C D E F
doc5 = A B C D E F G H
doc6 = A B C D E F G H I

Finally, I perform the query 'myField:(A B C)' in order to recover all
the documents, but with different scoring (doc1 is more similar to the query
than doc2, which is more similar than doc3, ...).

All the documents are retrieved (OK), but the scores are like this:

*doc1 = 2,590214
doc2 = 2,590214*
doc3 = 2,266437
*doc4 = 1,94266
doc5 = 1,94266*
doc6 = 1,618884

So in conclusion, as you can see the score goes down, but not the way I'd
like. Doc1 gets the same score as Doc2, even though Doc1 matches 3/3
tokens and Doc2 matches 3/4 tokens.

Is this the normal Solr behaviour? Is there any way to get my expected
behaviour?

Thanks a lot,
Borja.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dih delete doc per $deleteDocById

2013-09-17 Thread Shalin Shekhar Mangar
What is your question?

On Tue, Sep 17, 2013 at 12:17 AM, andreas owen a.o...@gmx.net wrote:
 I am using DIH and want to delete indexed documents by an XML file with IDs. I 
 have seen $deleteDocById used in <entity query="...">.

 data-config.xml:
 <entity name="rec" processor="XPathEntityProcessor" 
 url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportDelete.xml"
  forEach="/docs/doc" dataSource="main">
   <field column="$deleteDocById" xpath="//id" />
 </entity>

 xml-file:
 <docs>
   <doc>
     <id>2345</id>
   </doc>
 </docs>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Re-Ranking results based on DocValues with custom function.

2013-09-17 Thread Mathias Lux
Hi!

Thanks for the directions! I got it up and running with a custom
ValueSourceParser: http://pastebin.com/cz1rJn4A and a custom
ValueSource: http://pastebin.com/j8mhA8e0

It basically allows for searching for text (which is associated to an
image) in an index and then getting the distance to a sample image
(base64 encoded byte[] array) based on one of five different low level
content based features stored as DocValues.

A sample result is here: http://pastebin.com/V7kL3DJh

So there's one little tiny question I still have ;) When I'm trying to
do a sort I'm getting

msg: sort param could not be parsed as a query, and is not a field
that exists in the index:
lirefunc(cl_hi,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=),

for the call 
http://localhost:9000/solr/lire/select?q=*%3A*&sort=lirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)+asc&fl=id%2Ctitle%2Clirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)&wt=json&indent=true

cheers,
  Mathias

On Tue, Sep 17, 2013 at 1:01 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 : dissimilarity functions). What I want to do is to search using common
 : text search and then (optionally) re-rank using some custom function
 : like
 :
 : http://localhost:8983/solr/select?q=*:*sort=myCustomFunction(var1) asc

 can you describe what you want your custom function to look like? it may
 already be possible using the existing functions provided out of the box -
 just need to combine them to build up the math expression...

 https://wiki.apache.org/solr/FunctionQuery

 ...if you really want to write your own, just implement ValueSourceParser
 and register it in solrconfig.xml...

 https://wiki.apache.org/solr/SolrPlugins#ValueSourceParser

 : I've seen that there are hooks in solrconfig.xml, but I did not find
 : an example or some documentation. I'd be most grateful if anyone could
 : either point me to one or give me a hint for another way to go :)

 when writing a custom plugin like this, the best thing to do is look at
 the existing examples of that plugin.  almost all of the built in
 ValueSourceParsers are really trivial, and can be found in tiny anonymous
 classes right inside ValueSourceParser.java...

 For example, the function to divide the results of two other functions...

 addParser("div", new ValueSourceParser() {
   @Override
   public ValueSource parse(FunctionQParser fp) throws SyntaxError {
     ValueSource a = fp.parseValueSource();
     ValueSource b = fp.parseValueSource();
     return new DivFloatFunction(a, b);
   }
 });

 ..or, if you were trying to bundle that up in your own plugin jar and
 register it in solrconfig.xml, you might write it something like...

 public class DivideValueSourceParser extends ValueSourceParser {
   public DivideValueSourceParser() { }
   public ValueSource parse(FunctionQParser fp) throws SyntaxError {
     ValueSource a = fp.parseValueSource();
     ValueSource b = fp.parseValueSource();
     return new DivFloatFunction(a, b);
   }
 }

 and then register it as...

 <valueSourceParser name="div" class="com.you.DivideValueSourceParser" />


 depending on your needs, you may also want to write a custom ValueSource
 implementation (ie: instead of DivFloatFunction above) in which case,
 again, the best examples to look at are all of the existing ValueSource
 functions...

 https://lucene.apache.org/core/4_4_0/queries/org/apache/lucene/queries/function/ValueSource.html


 -Hoss



-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


Re: Scoring by document size

2013-09-17 Thread Upayavira
Have you used debugQuery=true, or fl=*,[explain], or those various
functions? It is possible to ask Solr to tell you how it calculated the
score, which will enable you to see what is going on in each case. You
can probably work it out for yourself then I suspect.
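
For example, something along these lines (host and core name are just
placeholders; myField is from the earlier mail):

http://localhost:8983/solr/collection1/select?q=myField:(A+B+C)&fl=*,score&debugQuery=true

The explain section of the debug output breaks the score down per document.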

Upayavira

On Tue, Sep 17, 2013, at 08:40 AM, blopez wrote:
 Hi all,
 
 I have some doubts about the Solr scoring function. I'm using all default
 configuration, but I'm facing a wired issue with the retrieved scores.
 
 In the schema, I'm going to focus in the only field I'm interested in.
 Its
 definition is:
 
 <fieldType name="text" class="solr.TextField" sortMissingLast="true"
 omitNorms="false">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
   </analyzer>
 </fieldType>
 
 <field name="myField" type="text" indexed="true" stored="true"
 required="false" />
 
 (omitNorms=false, if not, the document size is not taken into account to
 the final score)
 
 Then, I index some documents, with the following text in the 'myField'
 field:
 
 doc1 = A B C
 doc2 = A B C D
 doc3 = A B C D E
 doc4 = A B C D E F
 doc5 = A B C D E F G H
 doc6 = A B C D E F G H I
 
 Finally, I perform the query 'myField:(A B C)' in order to recover
 all
 the documents, but with different scoring (doc1 is more similar to the
 query
 than doc2, which is more similar than doc3, ...).
 
 All the documents are retrieved (OK), but the scores are like this:
 
 *doc1 = 2,590214
 doc2 = 2,590214*
 doc3 = 2,266437
 *doc4 = 1,94266
 doc5 = 1,94266*
 doc6 = 1,618884
 
 So in conclussion, as you can see the score goes down, but not the way
 I'd
 like. Doc1 is getting the same scoring than Doc2, even when Doc1 matches
 3/3
 tokens, and Doc2 matches 3/4 tokens.
 
 Is this the normal Solr behaviour? Is there any way to get my expected
 behaviour?
 
 Thanks a lot,
 Borja.
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Scoring by document size

2013-09-17 Thread Mathias Lux
As the IDF values for A, B and C are minimal (it can't get any lower
than appearing in every document), the major part of your score most
likely comes from the coord(..) part of scoring, which basically computes
the overlap of the query and the document. If you want overlap to have a
stronger influence you can extend and override the Similarity
implementation. You might take a look at
http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
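
A minimal sketch of that kind of override (class and package names made up,
untested):

package com.example;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public class OverlapBoostSimilarity extends DefaultSimilarity {
  // Give the query/document overlap a stronger (here quadratic) influence
  // than the default overlap/maxOverlap factor.
  @Override
  public float coord(int overlap, int maxOverlap) {
    return (float) Math.pow((double) overlap / maxOverlap, 2.0);
  }
}

which you would then reference from schema.xml with something like
<similarity class="com.example.OverlapBoostSimilarity"/>.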

cheers,
  Mathias

On Tue, Sep 17, 2013 at 1:59 PM, Upayavira u...@odoko.co.uk wrote:
 Have you used debugQuery=true, or fl=*,[explain], or those various
 functions? It is possible to ask Solr to tell you how it calculated the
 score, which will enable you to see what is going on in each case. You
 can probably work it out for yourself then I suspect.

 Upayavira

 On Tue, Sep 17, 2013, at 08:40 AM, blopez wrote:
 Hi all,

 I have some doubts about the Solr scoring function. I'm using all default
 configuration, but I'm facing a wired issue with the retrieved scores.

 In the schema, I'm going to focus in the only field I'm interested in.
 Its
 definition is:

 <fieldType name="text" class="solr.TextField" sortMissingLast="true"
 omitNorms="false">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
   </analyzer>
 </fieldType>

 <field name="myField" type="text" indexed="true" stored="true"
 required="false" />

 (omitNorms=false, if not, the document size is not taken into account to
 the final score)

 Then, I index some documents, with the following text in the 'myField'
 field:

 doc1 = A B C
 doc2 = A B C D
 doc3 = A B C D E
 doc4 = A B C D E F
 doc5 = A B C D E F G H
 doc6 = A B C D E F G H I

 Finally, I perform the query 'myField:(A B C)' in order to recover
 all
 the documents, but with different scoring (doc1 is more similar to the
 query
 than doc2, which is more similar than doc3, ...).

 All the documents are retrieved (OK), but the scores are like this:

 *doc1 = 2,590214
 doc2 = 2,590214*
 doc3 = 2,266437
 *doc4 = 1,94266
 doc5 = 1,94266*
 doc6 = 1,618884

 So in conclussion, as you can see the score goes down, but not the way
 I'd
 like. Doc1 is getting the same scoring than Doc2, even when Doc1 matches
 3/3
 tokens, and Doc2 matches 3/4 tokens.

 Is this the normal Solr behaviour? Is there any way to get my expected
 behaviour?

 Thanks a lot,
 Borja.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html
 Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


Re: how soft-commit works

2013-09-17 Thread Erick Erickson
Here's a rather long blog post I wrote up that might help:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


On Mon, Sep 16, 2013 at 1:43 PM, Shawn Heisey s...@elyograg.org wrote:

 On 9/16/2013 7:01 AM, Matteo Grolla wrote:
  Can anyone explain me the following things about soft-commit?
  -For searches to access new documents, I think a new searcher is opened
 after a soft commit.
How does the near realtime requirement for soft commit match with
 the potentially long time taken to warm up caches for the new searcher?
  -Is it a good idea to set
openSearcher=false in auto commit
and rely on soft auto commit to see new data in searches?

 That is a very common way for installs requiring NRT updates to get
 configured.

 NRTCachingDirectoryFactory, which is the directory class used in the
 example since 4.0, is a wrapper around MMapDirectoryFactory, which is
 the old default in 3.x.

 For soft commits, the NRT directory keeps small commits in RAM rather
 than writing it to the disk, which makes the process of opening a new
 searcher happen a lot faster.


 http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/store/NRTCachingDirectory.html

 If your index rate is very fast or you index large amounts of data, the
 NRT directory doesn't gain you much over MMap, but because we made it
 the default in the example, it probably doesn't have any performance
 detriment.

 Thanks,
 Shawn




Re: Dynamic row sizing for documents via UpdateCSV

2013-09-17 Thread Erick Erickson
Well, it's reasonably easy if you have empty columns, in the same
order, for _all_ of the possible dynamic fields, but I really doubt
you are that fortunate... It's especially ugly in that you have the
different dynamic fields scattered around.

How is the csv file generated? Could you force every row to have
_all_ the possible columns in the same order with spaces or something
in the columns that are empty?

Otherwise I'd think about parsing them externally and using, say, SolrJ
to transmit the individual records to Solr.
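
A rough sketch of that approach with SolrJ (URL and values are only taken
from your example rows):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CsvRowIndexer {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // One document per CSV row; add only the dynamic fields that row actually has.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("userid", "john8322");
    doc.addField("name", "John");
    doc.addField("age", 32);
    doc.addField("location", "CA");
    doc.addField("ca_count_i", 7);   // state-specific dynamic field for this row only

    server.add(doc);
    server.commit();
    server.shutdown();
  }
}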

Best,
Erick


On Mon, Sep 16, 2013 at 2:47 PM, Utkarsh Sengar utkarsh2...@gmail.comwrote:

 Hello,

 I am using UpdateCSV to load data in solr.

 Currently I load this schema with a static set of values:
 userid,name,age,location
 john8322,John,32,CA
 tom22,Tom,30,NY


 But now I have this usecase where john8322 might have a state specific
 dynamic field for example:
 userid,name,age,location, ca_count_i
 john8322,John,32,CA, 7

 And tom22 might have different dynamic fields:
 userid,name,age,location, ny_count_i,oh_count_i
 tom22,Tom,30,NY, 981,11

 So is it possible to pass a different set of columns for each row, something
 like this:
 john8322,John,32,CA,ca_count_i:7
 tom22,Tom,30,NY, ny_count_i:981,oh_count_i:11

 I understand that the above syntax is not possible, but is there any other
 way of solving this problem?

 --
 Thanks,
 -Utkarsh



Re: dih delete doc per $deleteDocById

2013-09-17 Thread Andreas Owen
I would like to know how to get it to work and delete documents via XML and DIH.

On 17. Sep 2013, at 1:47 PM, Shalin Shekhar Mangar wrote:

 What is your question?
 
 On Tue, Sep 17, 2013 at 12:17 AM, andreas owen a.o...@gmx.net wrote:
  i am using dih and want to delete indexed documents by xml-file with ids. i 
  have seen $deleteDocById used in <entity query="...">.
  
  data-config.xml:
  <entity name="rec" processor="XPathEntityProcessor" 
  url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportDelete.xml"
   forEach="/docs/doc" dataSource="main">
    <field column="$deleteDocById" xpath="//id" />
  </entity>
  
  xml-file:
  <docs>
    <doc>
      <id>2345</id>
    </doc>
  </docs>
 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.



Re: Atomic commit across shards?

2013-09-17 Thread Erick Erickson
There are two things to think about here.
1) If you're issuing the commit manually (i.e. not relying on the settings in
solrconfig.xml) then they are atomic. The call doesn't return until all the
active nodes have seen the commit.
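(For example, an explicit hard commit can be sent with something like
curl 'http://localhost:8983/solr/collection1/update?commit=true'
where host and collection name are just placeholders.)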

2) However, autocommits are usually time based. Since servers start
up at different times, if you're relying on the settings in solrconfig.xml
to do the commits then there will be slight offsets, since the timers will
expire at slightly different times.

Best,
Erick


On Mon, Sep 16, 2013 at 6:44 PM, Damien Dykman damien.dyk...@gmail.comwrote:

 Is a commit (hard or soft) atomic across shards?

 In other words, can I guaranty that any given search on a multi-shard
 collection will hit the same index generation of each shard?

 Thanks,
 Damien



Re: few and huge tlogs

2013-09-17 Thread Erick Erickson
Probably because you're indexing a lot of documents
very quickly. It's entirely reasonable to have
much shorter autoCommit times; all a hard commit does is
1) truncate the transaction log
2) close the current segment
3) start a new segment.

That should cut down your tlog files drastically. Try
setting your autoCommit time to, say, 15000 (15 seconds).

Long blog here:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


On Tue, Sep 17, 2013 at 5:16 AM, YouPeng Yang yypvsxf19870...@gmail.comwrote:

 Hi
   According to
 http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup。
   It explains that the tlog file will swith to a new when hard commit
 happened.


   However,my tlog shows different.
 tlog.003   5.16GB
 tlog.004   1.56GB
 tlog.002   610.MB

   there are only a fewer tlogs which suppose to be ten files, and each one
 is vary huge.Even there are lots of hard commit happened.

  So why the number of the tlog files does not increase ?


   here are settings of the  DirectUpdateHandler2:
  <updateHandler class="solr.DirectUpdateHandler2">

    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>

    <autoCommit>
      <maxTime>120</maxTime>
      <maxDocs>100</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>


    <autoSoftCommit>
      <maxTime>60</maxTime>
      <maxDocs>50</maxDocs>
    </autoSoftCommit>

  </updateHandler>



Re: how to make sure all the index docs flushed to the index files

2013-09-17 Thread Erick Erickson
Here's a blog about tlogs and commits:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

And here's Mike's excellent segment merging blog
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Best,
Erick


On Tue, Sep 17, 2013 at 6:36 AM, Shawn Heisey s...@elyograg.org wrote:

 On 9/17/2013 12:32 AM, YouPeng Yang wrote:
  Hi
 Another werid problem.
 When we setup the autocommit properties, we  suppose that the index
  fille will created every commited.So that the size of the index files
 will
  be large enough. We do not want to keep too many small files as [1].
 
 How to control the size of the index files.

 An index segment gets created after every hard commit.   In the listing
 that you sent, all the files starting with _28w are a single segment.
 All the files starting with _28x are another segment.

 Solr should be merging the segments when you get enough of them, unless
 you have incorrectly set up your merge policy.  The default number of
 segments that get merged is ten.  When you get ten segments, they will
 be merged down to one.  This repeats until you have ten merged segments.
  At that point, those ten merged segments will be merged to make an even
 larger segment.

 You can bump up the number of open files allowed by your operating
 system.  On Linux, this is controlled by the /etc/security/limits.conf
 file.  Here are some example config lines for that file:

  elyograg  hard  nofile  6144
  elyograg  soft  nofile  4096
  root      hard  nofile  6144
  root      soft  nofile  4096

 Alternatively, you can reduce the required number of files if you turn
 on the UseCompoundFile setting, which is in the IndexConfig section.
 This causes Solr to create a single file per index segment instead of
 several files per segment.  The compound file may be slightly less
 efficient, but the difference is likely to be very small.

 https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig





Re: Scoring by document size

2013-09-17 Thread Erick Erickson
This kind of artificial test is almost always misleading.
Some approximations are used, in particular the
length of the field is not stored as an exact number,
so at various points some fields with slightly different
lengths are rounded to the same number, thus the
identical scores you're seeing.

Unless you have a compelling reason, I wouldn't
spend too much time trying to adjust scores in this
kind of situation, if your real data exhibits behavior
you need to change it's a different story of course.

Best,
Erick


On Tue, Sep 17, 2013 at 3:40 AM, blopez balo...@hotmail.com wrote:

 Hi all,

 I have some doubts about the Solr scoring function. I'm using all default
 configuration, but I'm facing a wired issue with the retrieved scores.

 In the schema, I'm going to focus in the only field I'm interested in. Its
 definition is:

 <fieldType name="text" class="solr.TextField" sortMissingLast="true"
 omitNorms="false">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.ASCIIFoldingFilterFactory"/>
   </analyzer>
 </fieldType>

 <field name="myField" type="text" indexed="true" stored="true"
 required="false" />

 (omitNorms=false, if not, the document size is not taken into account to
 the final score)

 Then, I index some documents, with the following text in the 'myField'
 field:

 doc1 = A B C
 doc2 = A B C D
 doc3 = A B C D E
 doc4 = A B C D E F
 doc5 = A B C D E F G H
 doc6 = A B C D E F G H I

 Finally, I perform the query 'myField:(A B C)' in order to recover
 all
 the documents, but with different scoring (doc1 is more similar to the
 query
 than doc2, which is more similar than doc3, ...).

 All the documents are retrieved (OK), but the scores are like this:

 *doc1 = 2,590214
 doc2 = 2,590214*
 doc3 = 2,266437
 *doc4 = 1,94266
 doc5 = 1,94266*
 doc6 = 1,618884

 So in conclussion, as you can see the score goes down, but not the way I'd
 like. Doc1 is getting the same scoring than Doc2, even when Doc1 matches
 3/3
 tokens, and Doc2 matches 3/4 tokens.

 Is this the normal Solr behaviour? Is there any way to get my expected
 behaviour?

 Thanks a lot,
 Borja.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to round solr score ?

2013-09-17 Thread Mamta Thakur
Hi ,

As per this post here:
http://grokbase.com/t/lucene/solr-user/131jzcg3q2/how-to-round-solr-score
I was able to use my custom function in
sort (defType=func&q=socialDegree(id,1)&fl=score,*&sort=score%20asc) - works,
but can't facet on the
same (defType=func&q=socialDegree(id,1)&fl=score,*&facet=true&facet.field=score)
- doesn't work.

Exception:
org.apache.solr.common.SolrException: undefined field: score
at org.apache.solr.schema.IndexSchema.getField(IndexSchema.java:965)
at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:294)
at 
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:423)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205)
at 
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)

Is there any way by which we can achieve this?

Thanks,
Mamta.




This email is intended for the person(s) to whom it is addressed and may 
contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized use, 
distribution, copying, or disclosure by any person other than the addressee(s) 
is strictly prohibited. If you have received this email in error, please notify 
the sender immediately by return email and delete the message and any 
attachments from your system.

Re: spellcheck causing Core Reload to hang

2013-09-17 Thread Raheel Hasan
I think they should have it in RC0, because if you search in this forum at
lucene, this issue has been there since version 4.3!

Regards,
Raheel


On Tue, Sep 17, 2013 at 5:58 PM, Erick Erickson erickerick...@gmail.comwrote:

 H, do we have a JIRA tracking this and does it seem like any fix will
 get into 4.5?

 I think 4.5 RC0 will be cut tomorrow (Wednesday)

 Best,
 Erick


 On Tue, Sep 17, 2013 at 3:04 AM, Raheel Hasan raheelhasan@gmail.com
 wrote:

  I think there is another solution:
 
  Just hide this entry in solrconfig: <str
  name="spellcheck.maxCollationTries"></str>
 
  and instead, pass it in the actual query string that calls your
  requestHandler (like
  /select/?q=...&spellcheck.maxCollationTries=3...)
 
 
 
  On Mon, Sep 16, 2013 at 9:37 PM, Jeroen Steggink jer...@stegg-inc.com
  wrote:
 
   Hi James,
  
   I already had the
  
    spellcheck.collateExtendedResults=true
  
   Adding
  
    spellcheck.collateMaxCollectDocs=0
  
   did the trick.
  
   Thanks so much.
  
   Jeroen
  
   On 16-9-2013 18:16, Dyer, James wrote:
  
    If this started with Solr 4.4, I would suspect
    https://issues.apache.org/jira/browse/SOLR-3240 .
   
    Rather than removing spellcheck parameters, can you try adding/changing
    spellcheck.collateMaxCollectDocs=0 and
    spellcheck.collateExtendedResults=true ?  These two settings effectively
    disable the optimization made with SOLR-3240.
  
   James Dyer
   Ingram Content Group
   (615) 213-4311
  
  
 
  --
  Regards,
  Raheel Hasan
 




-- 
Regards,
Raheel Hasan


Re: spellcheck causing Core Reload to hang

2013-09-17 Thread Raheel Hasan
Check this thread:
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-td3192748i20.html#a4090320
This issue is there since 2011.



On Tue, Sep 17, 2013 at 6:35 PM, Raheel Hasan raheelhasan@gmail.comwrote:

 I think they should have it in RC0, because if you search in this forum at
 lucene, this issue is there since version 4.3 !

 Regards,
 Raheel


 On Tue, Sep 17, 2013 at 5:58 PM, Erick Erickson 
 erickerick...@gmail.comwrote:

 H, do we have a JIRA tracking this and does it seem like any fix will
 get into 4.5?

 I think 4.5 RC0 will be cut tomorrow (Wednesday)

 Best,
 Erick


 On Tue, Sep 17, 2013 at 3:04 AM, Raheel Hasan raheelhasan@gmail.com
 wrote:

  I think there is another solution:
 
  Just hide this entry in solrconfig str
  name=spellcheck.maxCollationTries/str
 
  and instead, pass it in the actual query string that calls your
  requestHandler (like
  /select/?q=spellcheck.maxCollationTries=3...)
 
 
 
  On Mon, Sep 16, 2013 at 9:37 PM, Jeroen Steggink jer...@stegg-inc.com
  wrote:
 
   Hi James,
  
   I already had the
  
   spellcheck.**collateExtendedResults=true
  
   Adding
  
   spellcheck.**collateMaxCollectDocs=0
  
   did the trick.
  
   Thanks so much.
  
   Jeroen
  
   On 16-9-2013 18:16, Dyer, James wrote:
  
   If this started with Solr4.4, I would suspect
  https://issues.apache.org/*
   *jira/browse/SOLR-3240 
 https://issues.apache.org/jira/browse/SOLR-3240
  .
  
   Rather than removing spellcheck parameters, can you try
 adding/changing
   spellcheck.**collateMaxCollectDocs=0 and
  spellcheck.**collateExtendedResults=true
   ?  These two settings effectively disable the optimization made with
   SOLR-3240.
  
   James Dyer
   Ingram Content Group
   (615) 213-4311
  
  
 
  --
  Regards,
  Raheel Hasan
 




 --
 Regards,
 Raheel Hasan




-- 
Regards,
Raheel Hasan


check which file/document cause solr to work hard

2013-09-17 Thread Yossi Nachum
Hi,

I am trying to index my windows pc files with manifoldcf version 1.3 and
solr version 4.4.

A few minutes after I start the crawler job, I see that the Tomcat process
constantly consumes 100% of one CPU (I have two CPUs).

I checked the thread dump in the Solr admin UI and saw that the following threads
take the most CPU/user time:

http-8080-3 (32)

   - java.io.FileInputStream.readBytes(Native Method)
   - java.io.FileInputStream.read(FileInputStream.java:236)
   - java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
   - java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
   - java.io.BufferedInputStream.read(BufferedInputStream.java:334)
   - org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
   - java.io.FilterInputStream.read(FilterInputStream.java:133)
   - org.apache.tika.io.TailStream.read(TailStream.java:117)
   - org.apache.tika.io.TailStream.skip(TailStream.java:140)
   - org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
   - org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
   -
   org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
   - org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
   - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
   - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
   -
   org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
   -
   
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
   -
   
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
   -
   
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   -
   
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
   - org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
   -
   
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
   -
   
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
   -
   
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
   -
   
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   -
   
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   -
   
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   -
   
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   -
   org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   -
   org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   -
   
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   -
   org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
   -
   org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
   -
   
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
   - org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
   - java.lang.Thread.run(Thread.java:679)



How can I check which file causes Tika to work so hard?
I don't see anything in the log files and I am stuck.
Thanks,
Yossi


tlog after commit

2013-09-17 Thread Alejandro Calbazana
Quick question...  Should I still see tlog files after a hard commit?

I'm trying to test soft commits and hard commits, and I was under the
impression that the tlog would be removed after a hard commit, whereas in the
case of soft commits I would still see them.

Thanks,

Al


Atomic updates with solr cloud in solr 4.4

2013-09-17 Thread Sesha Sendhil
Hi,

I am using Solr 4.4 in a SolrCloud configuration. When I try to 'set' a
field in a document using the update request handler, I get a 'missing
required field' error. However, when I send this query to the specific
shard containing the document, the update succeeds.

Is this a bug in Solr 4.4, or am I doing something wrong?

I started the shards specifying numShards and have checked that the router
used is the compositeId router.
Distributed indexing is done based on ids sharing the same domain/prefix,
i.e. 'customerB!' form and the documents are distributed in the shards
correctly.
Querying for documents works as expected and returns all matching documents
across shards.

Thanks
Sesha


Atomic updates with solr cloud in solr 4.4

2013-09-17 Thread Sesha Sendhil Subramanian
Hi,

I am using solr 4.4 in solr cloud configuration. When i try to 'set' a
field in a document using the update request handler, I get a 'missing
required field' error. However, when I send this query to the specific
shard containing the document, the update succeeds.

Is this a bug in solr 4.4 or am I doing something wrong

I started the shards specifying numShards and have checked that the router
used is the compositeId router.
Distributed indexing is done based on ids sharing the same domain/prefix,
i.e. 'customerB!' form and the documents are distributed in the shards
correctly.
Querying for documents works as expected and returns all matching documents
across shards.

Thanks

Sesha


Re: Atomic updates with solr cloud in solr 4.4

2013-09-17 Thread Yonik Seeley
On Tue, Sep 17, 2013 at 10:47 AM, Sesha Sendhil Subramanian
seshasend...@indix.com wrote:
 I am using solr 4.4 in solr cloud configuration. When i try to 'set' a
 field in a document using the update request handler, I get a 'missing
 required field' error.

Can you show the exact error message you get, and the update you are
trying to send?

-Yonik
http://lucidworks.com


Re: How to round solr score ?

2013-09-17 Thread Chris Hostetter

: 'score' is a pseudo-field, i.e., it does not actually exist in
: the index, which is probably why it cannot be faceted on.
: Faceting on a rounded score seems like an unusual use
: case. What requirement are you trying to address?

agreed, more details would be helpful.

FWIW: the only way available to facet on functions is to use facet.query 
along with the {!frange} parser to create facet constraints based on ranges 
of function values that you specify.
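
for example (completely untested, ranges are arbitrary), reusing the
socialDegree function from your query:

  facet=true
  &facet.query={!frange l=0 u=1}socialDegree(id,1)
  &facet.query={!frange l=1 u=2 incl=false}socialDegree(id,1)
  &facet.query={!frange l=2 incl=false}socialDegree(id,1)

each facet.query then comes back as its own constraint with a count.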

there is no other way i can think of to facet over function values -- there 
is an open issue where people were discussing it, but i don't think there 
was ever a functional patch...

https://issues.apache.org/jira/browse/SOLR-1581






-Hoss


Re: Re-Ranking results based on DocValues with custom function.

2013-09-17 Thread Chris Hostetter

: It basically allows for searching for text (which is associated to an
: image) in an index and then getting the distance to a sample image
: (base64 encoded byte[] array) based on one of five different low level
: content based features stored as DocValues.

very cool.

: So there one little tiny question I still have ;) When I'm trying to
: do a sort I'm getting
: 
: msg: sort param could not be parsed as a query, and is not a field
: that exists in the index:
: lirefunc(cl_hi,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=),
: 
: for the call 
http://localhost:9000/solr/lire/select?q=*%3A*&sort=lirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)+asc&fl=id%2Ctitle%2Clirefunc(cl_hi%2CFQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D)&wt=json&indent=true

Hmmm...

i think the crux of the issue is your string literal.  function parsing 
tries to make life easy for you by not requiring string literals to be 
quoted unless they conflict with other function names or field names 
etc...  on top of that the sort parsing code is kind of heuristic based 
(because it has to account for both functions or field names or wildcards, 
followed by other sort clauses, etc...) so in that context the special 
characters like '=' in your base64 string literal might be confusing the 
heuristics.

can you try to quote the string literal and see if that works?

For example, when i try using strdist with your base64 string in a sort 
param using the example configs i get the same error...

http://localhost:8983/solr/select?q=*:*&sort=strdist%28name,FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=,jw%29+asc

but if i quote the string literal it works fine...

http://localhost:8983/solr/select?q=*:*&sort=strdist%28name,%27FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA=%27,jw%29+asc
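
applied to your lirefunc call, that would be something like (untested)...

http://localhost:9000/solr/lire/select?q=*%3A*&sort=lirefunc(cl_hi,%27FQY5DhMYDg0ODg0PEBEPDg4ODg8QEgsgEBAQEBAgEBAQEBA%3D%27)+asc&fl=id,title&wt=json&indent=true

(the same quoting would apply if you also put lirefunc in the fl list)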



-Hoss


Re: Solr node goes down while trying to index records

2013-09-17 Thread Furkan KAMACI
Do you get that error only when indexing?


2013/9/17 neoman harira...@gmail.com

 Hello everyone,
 One or more of the nodes in the SolrCloud go down randomly when we try to
 index data using SolrJ APIs. The nodes do recover, but when we try to index
 again, they go down again.

 Our configuration:
 3 shards
 Solr 4.4.

 I see the following exceptions in the log file.
 09/17/13
 15:33:32:976|localhost-startStop-1-SendThread(10.68.129.119:9080
 )|INFO|org.apache.zookeeper.ClientCnxn|Socket
 connection established to 10.68.129.119/10.68.129.119:9080, initiating
 session|
 09/17/13
 15:33:32:978|localhost-startStop-1-SendThread(10.68.129.119:9080
 )|INFO|org.apache.zookeeper.ClientCnxn|Unable
 to reconnect to ZooKeeper service, session 0x34109f9474b0029 has expired,
 closing socket connection|
 09/17/13

 15:34:36:080|localhost-startStop-1-EventThread|ERROR|apache.solr.cloud.ZkController|There
 was a problem making a request to the
 leader:org.apache.solr.client.solrj.SolrServerException: Timeout occured
 while waiting response from server at: http://solr02-prod.phneaz:8080/solr
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:431)
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at

 org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1421)
 at

 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:306)
 at
 org.apache.solr.cloud.ZkController.access$100(ZkController.java:86)
 at
 org.apache.solr.cloud.ZkController$1.command(ZkController.java:196)
 at

 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:117)
 at

 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
 at

 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
 at

 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 Caused by: java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)

 We are also getting IOExcpetion in the client side.
 Adding chunk 122
 Total  Count 12422
 org.apache.solr.client.solrj.SolrServerException: Timeout occured while
 waiting response from server at:
 http://solr-prod.com:8443/solr/aq-collection
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:409)
 at

 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at

 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
 at

 com.billmelater.fraudworkstation.data.DataProvider.flushBatch(DataProvider.java:48)
 at

 com.billmelater.fraudworkstation.data.AQDBDataProvider.execute(AQDBDataProvider.java:114)
 at

 com.billmelater.fraudworkstation.data.AQDBDataProvider.main(AQDBDataProvider.java:244)
 Caused by: java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)

 Your help is appreciated.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610.html
 Sent from the Solr - User mailing list archive at Nabble.com.



SolrCloud liveness problems

2013-09-17 Thread Vladimir Veljkovic
Hello there,

we have following setup:

SolrCloud 4.4.0 (3 nodes, physical machines)
Zookeeper 3.4.5 (3 nodes, physical machines)

We have a number of rather small collections (~10K or ~100K of documents), that 
we would like to load to all Solr instances (numShards=1, 
replication_factor=3), and access them through local network interface, as the 
load balancing is done in layers above.

We can live (and we actually do it in the test phase) with updating the entire 
collections whenever we need it, switching collection aliases and removing the 
old collections.

We stumbled across the following problem: as soon as all three Solr nodes become a 
leader of at least one collection, restarting any node makes it completely 
unresponsive (timeout), both through the admin interface and for replication. If we 
restart all Solr nodes, the cluster ends up in some kind of deadlock, and the only 
remedy we found is a clean Solr installation, removing the ZooKeeper data and 
re-posting the collections.

Apparently, the leader is waiting for replicas to come up and they try to 
synchronize but time out on HTTP requests, so everything ends up in some kind of 
deadlock, maybe related to:

https://issues.apache.org/jira/browse/SOLR-5240

Eventually (after a few minutes), the leader takes over and marks the collections 
"active", but it remains blocked on the HTTP interface, so other nodes cannot synchronize.

In further tests, we loaded 4 collections with numShards=1 and 
replication_factor=2. By chance, one node became the leader for all 4 
collections. Restarting the node which was not the leader went without 
problems, but when we restarted the leader it happened that:
- the leader shut down, and the other nodes became leaders of 2 collections each
- the leader started up, 3 collections on it became active, one collection remained 
"down", and the node became unresponsive and timed out on HTTP requests.

As this behavior is completely unexpected for a cluster solution, I wonder if 
somebody else has experienced the same problems or we are doing something entirely 
wrong.

Best regards

-- 
 
Vladimir Veljkovic
Senior Java Entwickler

Boxalino AG

vladimir.veljko...@boxalino.com 
www.boxalino.com 


Tuning Kit for your Online Shop

Product Search - Recommendations - Landing Pages - Data intelligence - Mobile 
Commerce 
 



Re: Atomic updates with solr cloud in solr 4.4

2013-09-17 Thread Sesha Sendhil Subramanian
curl http://localhost:8983/solr/search/update -H
'Content-type:application/json' -d '
[
 {
  "id":
"c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675",
  "link_id_45454" : {"set":"abcdegff"}
 }
]'

I have two collections, search and meta. I want to do an update in the
search collection.
If I pick a document in the same shard (localhost:8983), the update succeeds:
15350327 [qtp386373885-19] INFO
 org.apache.solr.update.processor.LogUpdateProcessor  ? [search]
webapp=/solr path=/update params={}
{add=[6cfcb56ca52b56ccb1377a7f0842e74d!6cfcb56ca52b56ccb1377a7f0842e74d
(1446444025873694720)]} 0 5

If I pick a document on a different shard (localhost:7574), the update fails:

15438547 [qtp386373885-75] INFO
 org.apache.solr.update.processor.LogUpdateProcessor  ? [search]
webapp=/solr path=/update params={} {} 0 1
15438548 [qtp386373885-75] ERROR org.apache.solr.core.SolrCore  ?
org.apache.solr.common.SolrException:
[doc=c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675]
missing required field: variant_count
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:189)
at
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:556)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:692)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:392)
at
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:117)
at
org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101)
at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)

Sesha




Solr node goes down while trying to index records

2013-09-17 Thread neoman
Hello everyone,
one or more of the nodes in the SolrCloud go down randomly when we try to
index data using the SolrJ APIs. The nodes do recover, but when we try to index
again, they go down again.

Our configuration:
3 shards 
Solr 4.4.

I see the following exceptions in the log file.
09/17/13
15:33:32:976|localhost-startStop-1-SendThread(10.68.129.119:9080)|INFO|org.apache.zookeeper.ClientCnxn|Socket
connection established to 10.68.129.119/10.68.129.119:9080, initiating
session|
09/17/13
15:33:32:978|localhost-startStop-1-SendThread(10.68.129.119:9080)|INFO|org.apache.zookeeper.ClientCnxn|Unable
to reconnect to ZooKeeper service, session 0x34109f9474b0029 has expired,
closing socket connection|
09/17/13
15:34:36:080|localhost-startStop-1-EventThread|ERROR|apache.solr.cloud.ZkController|There
was a problem making a request to the
leader:org.apache.solr.client.solrj.SolrServerException: Timeout occured
while waiting response from server at: http://solr02-prod.phneaz:8080/solr
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:431)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1421)
at
org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:306)
at
org.apache.solr.cloud.ZkController.access$100(ZkController.java:86)
at
org.apache.solr.cloud.ZkController$1.command(ZkController.java:196)
at
org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:117)
at
org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
at
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)

We are also getting an IOException on the client side.
Adding chunk 122
Total  Count 12422
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at:
http://solr-prod.com:8443/solr/aq-collection
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:409)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
at
com.billmelater.fraudworkstation.data.DataProvider.flushBatch(DataProvider.java:48)
at
com.billmelater.fraudworkstation.data.AQDBDataProvider.execute(AQDBDataProvider.java:114)
at
com.billmelater.fraudworkstation.data.AQDBDataProvider.main(AQDBDataProvider.java:244)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)

Your help is appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr node goes down while trying to index records

2013-09-17 Thread neoman
Yes, the nodes go down while indexing. If we stop indexing, they do not go
down.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610p4090644.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dynamic row sizing for documents via UpdateCSV

2013-09-17 Thread Utkarsh Sengar
Yeah, I think the only way to go about it is via SolrJ. The CSV file is
generated by a Pig job which computes the data to be loaded into Solr.
I think this is what I will end up doing: load all the possible columns in
the CSV with a value of 0 if the value doesn't exist for a specific record.

I was just trying to avoid that and find an optimal solution with UpdateCSV.
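
If I do go the SolrJ route, a minimal sketch might look like this (the Solr URL
is an assumption; the field names are taken from the examples in this thread, and
per-record dynamic *_i fields are simply added when they exist):

import java.util.Arrays;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DynamicFieldIndexer {
    public static void main(String[] args) throws Exception {
        // Solr endpoint is an assumption for this sketch
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // tom22 has two state-specific dynamic fields, john8322 has one
        SolrInputDocument tom = new SolrInputDocument();
        tom.addField("userid", "tom22");
        tom.addField("name", "Tom");
        tom.addField("age", 30);
        tom.addField("location", "NY");
        tom.addField("ny_count_i", 981);   // dynamic field, only added when present
        tom.addField("oh_count_i", 11);

        SolrInputDocument john = new SolrInputDocument();
        john.addField("userid", "john8322");
        john.addField("name", "John");
        john.addField("age", 32);
        john.addField("location", "CA");
        john.addField("ca_count_i", 7);

        server.add(Arrays.asList(tom, john));
        server.commit();
        server.shutdown();
    }
}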

Thanks,
-Utkarsh


On Tue, Sep 17, 2013 at 5:43 AM, Erick Erickson erickerick...@gmail.comwrote:

 Well, it's reasonably easy if you have empty columns, in the same
 order, for _all_ of the possible dynamic fields, but I really doubt
 you are that fortunate... It's especially ugly in that you have the
 different dynamic fields scattered around.

 How is the csv file generated? Could you force every row to have
 _all_ the possible columns in the same order with spaces or something
 in the columns that are empty?

 Otherwise I'd think about parsing them externally and using, say, SolrJ
 to transmit the individual records to Solr.

 Best,
 Erick


 On Mon, Sep 16, 2013 at 2:47 PM, Utkarsh Sengar utkarsh2...@gmail.com
 wrote:

  Hello,
 
  I am using UpdateCSV to load data in solr.
 
  Currently I load this schema with a static set of values:
  userid,name,age,location
  john8322,John,32,CA
  tom22,Tom,30,NY
 
 
  But now I have this usecase where john8322 might have a state specific
  dynamic field for example:
  userid,name,age,location, ca_count_i
  john8322,John,32,CA, 7
 
  And tom22 might have different dynamic fields:
  userid,name,age,location, ny_count_i,oh_count_i
  tom22,Tom,30,NY, 981,11
 
  So is it possible to pass different columns sizes for each row, something
  like this:
  john8322,John,32,CA,ca_count_i:7
  tom22,Tom,30,NY, ny_count_i:981,oh_count_i:11
 
  I understand that the above syntax is not possible, but is there any
 other
  way of solving this problem?
 
  --
  Thanks,
  -Utkarsh
 




-- 
Thanks,
-Utkarsh


Re: SolrCloud liveness problems

2013-09-17 Thread Mark Miller

On Sep 17, 2013, at 12:00 PM, Vladimir Veljkovic 
vladimir.veljko...@boxalino.com wrote:

 Hello there,
 
 we have following setup:
 
 SolrCloud 4.4.0 (3 nodes, physical machines)
 Zookeeper 3.4.5 (3 nodes, physical machines)
 
 We have a number of rather small collections (~10K or ~100K of documents), 
 that we would like to load to all Solr instances (numShards=1, 
 replication_factor=3), and access them through local network interface, as 
 the load balancing is done in layers above.
 
 We can live (and we actually do it in the test phase) with updating the 
 entire collections whenever we need it, switching collection aliases and 
 removing the old collections.
 
 We stumbled across following problem: as soon as all three Solr nodes become 
 a leader to at least one collection, restarting any node makes it completely 
 unresponsive (timeout), both though admin interface and for replication. If 
 we restart all solr nodes the cluster end up in some kind of deadlock and 
 only remedy we found is Solr clean installation, removing ZooKeeper data and 
 re-posting collections.
 
 Apparently, leader is waiting for replicas to come up and they try to 
 synchronize but timeout on http requests, so everything ends up in some kind 
 of dead lock, maybe related to:
 
 https://issues.apache.org/jira/browse/SOLR-5240

Yup, that sounds exactly like what you would expect with SOLR-5240. A fix for that 
is coming in 4.5, which is probably a week or so away.

 
 Eventually (after few minutes), leader takes over, mark collections active 
 but remains blocked on http interface, so other nodes can not synchronize.
 
 In further tests, we loaded 4 collections with numShards=1 and 
 replication_factor=2. By chance, one node become the leader for all 4 
 collections. Restarting the node which was not the leader is done without the 
 problem, but when we restarted the leader it happened that:
 - leader shut down, other nodes became leaders of 2 collections each
 - leader starts up, 3 collections on it become active, one collection 
 remains ”down” and node becomes unresponsive and timeouts on http requests.

Hard to say - I'll experiment with 4.5 and see if I can duplicate this.

- Mark

 
 As this behavior is completely unexpected for one cluster solution, I wonder 
 if somebody else experienced same problems or we are doing something entirely 
 wrong.
 
 Best regards
 
 -- 
 
 Vladimir Veljkovic
 Senior Java Entwickler
 
 Boxalino AG
 
 vladimir.veljko...@boxalino.com 
 www.boxalino.com 
 
 
 Tuning Kit for your Online Shop
 
 Product Search - Recommendations - Landing Pages - Data intelligence - Mobile 
 Commerce 
 
 



Getting a query parameter in a TokenFilter

2013-09-17 Thread Isaac Hebsh
Hi everyone,

We developed a TokenFilter.
It should act differently depending on a parameter supplied in the
query (for the query chain only, not the index one, of course).
We found no way to pass that parameter into the TokenFilter flow. I guess
the root cause is that TokenFilter is a pure Lucene object.

As a last resort, we tried to pass the parameter as the first term in the
query text (q=...), and save it as a member of the TokenFilter instance.

Although it is ugly, it might work fine.
But the problem is that it is not guaranteed that all the terms of a
particular query will be analyzed by the same TokenFilter instance. In
that case, some terms will be analyzed without the required parameter
information. We can reproduce such a race very easily.

How should I overcome this issue?
Does anyone have a better solution?
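
For reference, a rough sketch of the workaround described above (the class and the
way the parameter is handled are purely illustrative, and it inherits the
per-instance race described here):

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Illustrative only: consumes the first term of the stream as a pseudo-parameter.
public final class ParamFromQueryFilter extends TokenFilter {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private String param;            // the value smuggled in as the first term
    private boolean paramConsumed;

    public ParamFromQueryFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        if (!paramConsumed) {
            paramConsumed = true;
            param = termAtt.toString(); // remember the parameter, do not emit it
            return incrementToken();    // skip to the next real token
        }
        // ... act differently on the current token depending on 'param' ...
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        paramConsumed = false;
        param = null;
    }
}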


Re: Stop zookeeper from batch

2013-09-17 Thread Furkan KAMACI
Are you looking for this:

https://issues.apache.org/jira/browse/ZOOKEEPER-1122

On Monday, 16 September 2013, Prasi S prasi1...@gmail.com wrote:
 Hi,
 We have set up SolrCloud with ZooKeeper and 2 Tomcats. We are using a batch
 file to start ZooKeeper, upload/link the config files and start the Tomcats.

 Now I need to stop ZooKeeper from the batch file. How is this possible?

 I'm using Windows Server and ZooKeeper version 3.4.5.

 Please help.

 Thanks,
 Prasi



Some text not indexed in solr4.4

2013-09-17 Thread Utkarsh Sengar
I have a copyField called allText with type text_general:
https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68

I have ~100 documents which have the text: dyson and dc44 or dc41 etc.

For example:
title: Dyson DC44 Animal Digital Slim Cordless Vacuum
description: The DC44 Animal is the new Dyson Digital Slim vacuum
cleaner  the cordless machine that doesn’t lose suction. It has been
engineered for floor to ceiling cleaning. DC44 Animal has a detachable
long-reach wand  which is balanced for floor to ceiling cleaning.   The
motorized floor tool has twice the power of the DC35 floor tool  to drive
the bristles deeper into the carpet pile with more force. It attaches to
the wand or directly to the machine for cleaning awkward spaces. The brush
bar has carbon fiber filaments for removing fine dust from hard floors.
DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode.
Powered by the Dyson digital motor  DC44 Animal has a fade-free nickel
manganese cobalt battery and Root Cyclone technology for constant  powerful
suction.,
UPC: 0879957006362

The documents are indexed.

Analysis says it's indexed: http://i.imgur.com/O52ino1.png
But when I search for allText:dyson dc44 I get no results, response:
http://pastie.org/8334220

Any suggestions about the problem? I am out of ideas about how to debug
this.

-- 
Thanks,
-Utkarsh


Re: How to round solr score ?

2013-09-17 Thread Gora Mohanty
On 17 September 2013 18:31, Mamta Thakur mtha...@care.com wrote:

 Hi ,

 As per this post here
 http://grokbase.com/t/lucene/solr-user/131jzcg3q2/how-to-round-solr-score.
 I was able to use my custom fn in
 sort (defType=func&q=socialDegree(id,1)&fl=score,*&sort=score%20asc) - works,
 but can't facet on the
 same (defType=func&q=socialDegree(id,1)&fl=score,*&facet=true&facet.field=score)
 - doesn't work.


'score' is a pseudo-field, i.e., it does not actually exist in
the index, which is probably why it cannot be faceted on.
Faceting on a rounded score seems like an unusual use
case. What requirement are you trying to address?

Regards,
Gora


Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents

2013-09-17 Thread Furkan KAMACI
Currently I have over 50 million documents in my index and, as I mentioned
before in another question, I have some problems while indexing (Jetty EOF
exception). I know that the problem may not be about index size, but I just
want to learn whether there is any limit on document size in Solr beyond
which I can have problems. I am not talking about the theoretical limit.

What are the maximum index sizes people run, and what do they do to handle a
heavy indexing rate with millions of documents? What tuning strategies do
they use?

PS: I have 18 machines, 9 shards, each machine has 48 GB RAM and I use Solr
4.2.1 for my SolrCloud.


Re: Solr node goes down while trying to index records

2013-09-17 Thread Furkan KAMACI
Could you give some information about your jetty.xml, your indexing rate, and
the RAM usage of your machines?

On Tuesday, 17 September 2013, neoman harira...@gmail.com wrote:
 yes. the nodes go down while indexing. if we stop indexing, it does not go
 down.



 --
 View this message in context:
http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610p4090644.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: tlog after commit

2013-09-17 Thread Furkan KAMACI
Did you check here:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

On Tuesday, 17 September 2013, Alejandro Calbazana acalbaz...@gmail.com wrote:
 Quick question...  Should I still see tlog files after a hard commit?

 I'm trying to test soft commits and hard commits, and I was under the
 impression that the tlog would be removed after a hard commit, whereas in the
 case of soft commits I would still see them.

 Thanks,

 Al



Re: Problem indexing windows files

2013-09-17 Thread Furkan KAMACI
Firstly,

this may not be a Solr-related problem. Did you check Solr's log file?
Tika can run into trouble in certain situations; for example, when parsing HTML
that contains a base64-encoded image it may have problems. If you find the
correct logs you can pin it down. Also keep an eye on ManifoldCF; there may be
a problem there too.

On Tuesday, 17 September 2013, Yossi Nachum nachum...@gmail.com wrote:
 Hi,

 I am trying to index my Windows PC files with ManifoldCF version 1.3 and
 Solr version 4.4.

 I created an output connection and a repository connection and started a new job
 that scans my E drive.

 Everything seems to work OK, but after a few minutes Solr stops getting
 new files to index. I can see that through the Tomcat log file.

 In the ManifoldCF crawler UI I see that the job is still running, but after a few
 minutes I get the following error:
 Error: Repeated service interruptions - failure processing document: Read
 timed out

 I can see that the Tomcat process constantly consumes 100% of one CPU (I
 have two CPUs), even after I get the error message from the ManifoldCF crawler
 UI.

 I checked the thread dump in the Solr admin and saw that the following threads
 take the most CPU/user time:
 
 http-8080-3 (32)

- java.io.FileInputStream.readBytes(Native Method)
- java.io.FileInputStream.read(FileInputStream.java:236)
- java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
- java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
- java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
- java.io.FilterInputStream.read(FilterInputStream.java:133)
- org.apache.tika.io.TailStream.read(TailStream.java:117)
- org.apache.tika.io.TailStream.skip(TailStream.java:140)
- org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
- org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
-

 org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
- org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
-
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
-
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
-

 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
-

 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
-

 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
-

 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
-

 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
- org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
-

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
-

 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
-

 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
-

 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
-

 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
-

 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
-

 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
-

 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
-

 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
-

 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
-

 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
-

 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
-

 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
-
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
- java.lang.Thread.run(Thread.java:679)

 

 Does anyone know what I can do? How do I debug this issue? How can I check
 which file causes Tika to work so hard?
 I don't see anything in the log files and I am stuck.
 Thanks,
 Yossi



Re: Some text not indexed in solr4.4

2013-09-17 Thread Utkarsh Sengar
To add to it, I see the exact same problem with the queries: nikon d7100,
nikon d5100, samsung ps-we450, etc.

Thanks,
-Utkarsh


On Tue, Sep 17, 2013 at 2:20 PM, Utkarsh Sengar utkarsh2...@gmail.comwrote:

 I have a copyField called allText with type text_general:
 https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68

 I have ~100 documents which have the text: dyson and dc44 or dc41 etc.

 For example:
 title: Dyson DC44 Animal Digital Slim Cordless Vacuum
 description: The DC44 Animal is the new Dyson Digital Slim vacuum
 cleaner  the cordless machine that doesn’t lose suction. It has been
 engineered for floor to ceiling cleaning. DC44 Animal has a detachable
 long-reach wand  which is balanced for floor to ceiling cleaning.   The
 motorized floor tool has twice the power of the DC35 floor tool  to drive
 the bristles deeper into the carpet pile with more force. It attaches to
 the wand or directly to the machine for cleaning awkward spaces. The brush
 bar has carbon fiber filaments for removing fine dust from hard floors.
 DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode.
 Powered by the Dyson digital motor  DC44 Animal has a fade-free nickel
 manganese cobalt battery and Root Cyclone technology for constant  powerful
 suction.,
 UPC: 0879957006362

 The documents are indexed.

 Analysis says its indexeD: http://i.imgur.com/O52ino1.png
 But when I search for allText:dyson dc44 I get no results, response:
 http://pastie.org/8334220

 Any suggestions about the problem? I am out of ideas about how to debug
 this.

 --
 Thanks,
 -Utkarsh




-- 
Thanks,
-Utkarsh


Re: Some text not indexed in solr4.4

2013-09-17 Thread Furkan KAMACI
On the other hand, did you check here:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

and what it says about MultiPhraseQuery?


On Wednesday, 18 September 2013, Furkan KAMACI furkankam...@gmail.com wrote:
 Hi;

 Did you run commit command?

 On Wednesday, 18 September 2013, Utkarsh Sengar utkarsh2...@gmail.com wrote:
 To add to it, I see the exact problem with the queries: nikon d7100,
 nikon d5100, samsung ps-we450 etc.

 Thanks,
 -Utkarsh


 On Tue, Sep 17, 2013 at 2:20 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:

 I have a copyField called allText with type text_general:
 https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68

 I have ~100 documents which have the text: dyson and dc44 or dc41 etc.

 For example:
 title: Dyson DC44 Animal Digital Slim Cordless Vacuum
 description: The DC44 Animal is the new Dyson Digital Slim vacuum
 cleaner  the cordless machine that doesn't lose suction. It has been
 engineered for floor to ceiling cleaning. DC44 Animal has a detachable
 long-reach wand  which is balanced for floor to ceiling cleaning.   The
 motorized floor tool has twice the power of the DC35 floor tool  to
drive
 the bristles deeper into the carpet pile with more force. It attaches to
 the wand or directly to the machine for cleaning awkward spaces. The
brush
 bar has carbon fiber filaments for removing fine dust from hard floors.
 DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode.
 Powered by the Dyson digital motor  DC44 Animal has a fade-free nickel
 manganese cobalt battery and Root Cyclone technology for constant
 powerful
 suction.,
 UPC: 0879957006362

 The documents are indexed.

 Analysis says its indexeD: http://i.imgur.com/O52ino1.png
 But when I search for allText:dyson dc44 I get no results, response:
 http://pastie.org/8334220

 Any suggestions about the problem? I am out of ideas about how to debug
 this.

 --
 Thanks,
 -Utkarsh




 --
 Thanks,
 -Utkarsh



Re: SPLITSHARD failure right before publishing the new sub-shards

2013-09-17 Thread HaiXin Tie

Never mind. I figured it out. It was due to an NPE caused by the missing
updateLog section in solrconfig.xml. My solrconfig.xml is from an older Solr
release, which doesn't have certain required sections, etc. After adding
them to solrconfig.xml per this official doc, everything started to
work. It'd be great if null checks were there to produce an informative
error in SolrCore.java, to make it easier to find the root cause.

http://wiki.apache.org/solr/SolrCloud#Required_Config

Regards,
HaiXin


On 09/16/2013 06:44 PM, HaiXin Tie wrote:

Hi Solr experts,

I am using Solr 4.4 with ZK 3.4.5, trying to split shard1 of a
collection named body. There is only one core on one machine for
this collection. When I call SPLITSHARD to split this collection, Solr
is able to create two sub-shards, but failed with a NPE in
SolrCore.java while publishing the new shards. It seems that either
the updateHandler or its updateLog is null, though they work fine in
the original shard:

SolrCore.java
      if (cc != null && cc.isZooKeeperAware() &&
          Slice.CONSTRUCTION.equals(cd.getCloudDescriptor().getShardState())) {
        // set update log to buffer before publishing the core
862:    getUpdateHandler().getUpdateLog().bufferUpdates();

        cd.getCloudDescriptor().setShardState(null);
        cd.getCloudDescriptor().setShardRange(null);
      }


Here are the details. Any pointers to aid debugging this issue is
greatly appreciated!

# curl request/response to split the shard:

curl -s "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=body&shard=shard1"

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">500</int><int name="QTime">2688</int></lst>
<lst name="failure"><str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error
CREATEing SolrCore 'body_shard1_0_replica1': Unable to create core:
body_shard1_0_replica1 Caused by: null</str></lst>
<str name="Operation splitshard caused
exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
SPLTSHARD failed to create subshard leaders</str>
<lst name="exception"><str name="msg">SPLTSHARD failed to create subshard
leaders</str><int name="rspCode">500</int></lst>
<lst name="error"><str name="msg">SPLTSHARD failed to create subshard leaders</str><str
name="trace">org.apache.solr.common.SolrException: SPLTSHARD failed to
create subshard leaders
create subshard leaders
at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:171)
at
org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:322)
at
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at

Re: Some text not indexed in solr4.4

2013-09-17 Thread Furkan KAMACI
Hi;

Did you run the commit command?

On Wednesday, 18 September 2013, Utkarsh Sengar utkarsh2...@gmail.com wrote:
 To add to it, I see the exact problem with the queries: nikon d7100,
 nikon d5100, samsung ps-we450 etc.

 Thanks,
 -Utkarsh


 On Tue, Sep 17, 2013 at 2:20 PM, Utkarsh Sengar utkarsh2...@gmail.com
wrote:

 I have a copyField called allText with type text_general:
 https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68

 I have ~100 documents which have the text: dyson and dc44 or dc41 etc.

 For example:
 title: Dyson DC44 Animal Digital Slim Cordless Vacuum
 description: The DC44 Animal is the new Dyson Digital Slim vacuum
 cleaner  the cordless machine that doesn't lose suction. It has been
 engineered for floor to ceiling cleaning. DC44 Animal has a detachable
 long-reach wand  which is balanced for floor to ceiling cleaning.   The
 motorized floor tool has twice the power of the DC35 floor tool  to drive
 the bristles deeper into the carpet pile with more force. It attaches to
 the wand or directly to the machine for cleaning awkward spaces. The
brush
 bar has carbon fiber filaments for removing fine dust from hard floors.
 DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode.
 Powered by the Dyson digital motor  DC44 Animal has a fade-free nickel
 manganese cobalt battery and Root Cyclone technology for constant
 powerful
 suction.,
 UPC: 0879957006362

 The documents are indexed.

 Analysis says its indexeD: http://i.imgur.com/O52ino1.png
 But when I search for allText:dyson dc44 I get no results, response:
 http://pastie.org/8334220

 Any suggestions about the problem? I am out of ideas about how to debug
 this.

 --
 Thanks,
 -Utkarsh




 --
 Thanks,
 -Utkarsh



Re: Updated: CREATEALIAS does not work with more than one collection (Error 503: no servers hosting shard)

2013-09-17 Thread HaiXin Tie

Never mind. I figured it out. It was due to an NPE caused by the missing updateLog section in 
solrconfig.xml. My solrconfig.xml is from an older Solr release, which doesn't 
have certain required sections, etc. After adding them to solrconfig.xml per 
this official doc, everything started to work.

http://wiki.apache.org/solr/SolrCloud#Required_Config


Regards,
HaiXin




On 09/16/2013 04:55 PM, HaiXin Tie wrote:
Sorry but I've fixed some typos, updated text:

Hello Solr experts,

For some strange reason, collection alias does not work in my Solr instance 
when more than one collection is used. I would appreciate your help.

# Here is my setup, which is quite simple:
Zookeeper: 3.4.5 (used to upconfig/linkconfig collections and configs for c1 
and c2)
Solr: version 4.4.0, with two collections c1 and c2 (solr.xml included) created 
using remote core API calls

# Symptoms:
1. Solr queries to each individual collection works fine:
   http://localhost:8983/solr/c1/select?q=*:*
   http://localhost:8983/solr/c2/select?q=*:*
2. CREATEALIAS name=cx for c1 or c2 alone (e.g. 1-1 mapping) works fine:
   http://localhost:8983/solr/cx/select?q=*:*
3. CREATEALIAS name=cx for c1 and c2 does not work:

   # Solr request/response to the collection alias (success):
   curl -s "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=cx&collections=c1,c2"
   <?xml version="1.0" encoding="UTF-8"?>
   <response>
   <lst name="responseHeader"><int name="status">0</int><int name="QTime">134</int></lst>
   </response>

   # Solr query using the alias fails with Error 503: no servers hosting shard
   http://localhost:8983/solr/cx/select?q=*:*
   <response><lst name="responseHeader"><int name="status">503</int><int name="QTime">2</int><lst name="params"><str name="q">*:*</str></lst></lst><lst name="error"><str name="msg">no servers hosting shard: </str><int name="code">503</int></lst></response>


# Solr logs:
3503223 [qtp724646150-11] ERROR org.apache.solr.core.SolrCore  ? 
org.apache.solr.common.SolrException: no servers hosting shard:
   at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
   at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)

3503224 [qtp724646150-11] INFO  org.apache.solr.core.SolrCore  ? [c1] 
webapp=/solr path=/select params={q=*:*} status=503 QTime=2

# solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
 <cores host="${host:}" adminPath="/admin/cores" hostPort="${jetty.port:}" 
hostContext="${hostContext:solr}">
   <core shard="shard1" instanceDir="c1/" name="c1" collection="c1"/>
   <core shard="shard1" instanceDir="c2/" name="c2" collection="c2"/>
 </cores>
</solr>

# zookeeper alias (same from solr/cloud UI):
[zk: localhost:2181(CONNECTED) 10] get /myroot/aliases.json
{collection:{
   cx:c1,c2}}
cZxid = 0x110d
ctime = Fri Sep 13 17:25:18 PDT 2013
mZxid = 0x18d1
mtime = Mon Sep 16 16:31:21 PDT 2013
pZxid = 0x110d
cversion = 0
dataVersion = 19
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 119
numChildren = 0

BTW, I've spent a lot of time figuring out how to make zookeeper and solr work 
together. The commands are not complex, but making them work sometimes requires 
a lot of digging online, to figure out missing jars for zkCli.sh, etc. I know a 
lot of things are changing since Solr 4.0, but I really hope the Solr 
documentation can be better maintained, so that people won't have to spend tons 
of hours figuring out simple steps (albeit complex under the hood) like this. 
Thanks!







Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents

2013-09-17 Thread Otis Gospodnetic
Hi

50m docs across 18 servers 48gb RAM ain't much. I doubt you are hitting any
limits in lucene or solr.

How heavy is your index rate?

Otis
Solr  ElasticSearch Support
http://sematext.com/
On Sep 17, 2013 5:25 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 Currently I hafer over 50+ millions documents at my index and as I mentiod
 before at another question I have some problems while indexing (jetty EOF
 exception) I know that problem may not be about index size but just I want
 to learn that is there any limit for document size at Solr that if I exceed
 it I can have some problems? I am not talking about the theoretical limit.

 What are the maximim index size for folks and what they to handle heavy
 index rate when having millions of documents. What tuning strategies they
 do?

 PS: I have 18 machines, 9 shards, each machine has 48 GB RAM and I use Solr
 4.2.1 for my SolrCloud.



Re: Some text not indexed in solr4.4

2013-09-17 Thread Jason Hellman
Utkarsh,

Check to see if the value is actually indexed into the field by using the Terms 
request handler:

http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=d

(adjust the prefix to whatever you're looking for)
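
A rough SolrJ equivalent (the allText field comes from this thread; the Solr core
URL is an assumption, and it assumes a /terms handler is configured in solrconfig.xml):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TermsCheck {
    public static void main(String[] args) throws Exception {
        // Solr core URL is an assumption for this sketch
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery query = new SolrQuery();
        query.setRequestHandler("/terms");
        query.setTerms(true);
        query.addTermsField("allText");   // the copyField from this thread
        query.setTermsPrefix("dc");       // look for dc44, dc41, ...
        query.setTermsLimit(20);

        QueryResponse response = server.query(query);
        TermsResponse terms = response.getTermsResponse();
        for (TermsResponse.Term t : terms.getTerms("allText")) {
            System.out.println(t.getTerm() + " -> " + t.getFrequency());
        }
        server.shutdown();
    }
}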

This should get you going in the right direction.

Jason


On Sep 17, 2013, at 2:20 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:

 I have a copyField called allText with type text_general:
 https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68
 
 I have ~100 documents which have the text: dyson and dc44 or dc41 etc.
 
 For example:
 title: Dyson DC44 Animal Digital Slim Cordless Vacuum
 description: The DC44 Animal is the new Dyson Digital Slim vacuum
 cleaner  the cordless machine that doesn’t lose suction. It has been
 engineered for floor to ceiling cleaning. DC44 Animal has a detachable
 long-reach wand  which is balanced for floor to ceiling cleaning.   The
 motorized floor tool has twice the power of the DC35 floor tool  to drive
 the bristles deeper into the carpet pile with more force. It attaches to
 the wand or directly to the machine for cleaning awkward spaces. The brush
 bar has carbon fiber filaments for removing fine dust from hard floors.
 DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode.
 Powered by the Dyson digital motor  DC44 Animal has a fade-free nickel
 manganese cobalt battery and Root Cyclone technology for constant  powerful
 suction.,
 UPC: 0879957006362
 
 The documents are indexed.
 
 Analysis says its indexeD: http://i.imgur.com/O52ino1.png
 But when I search for allText:dyson dc44 I get no results, response:
 http://pastie.org/8334220
 
 Any suggestions about the problem? I am out of ideas about how to debug
 this.
 
 -- 
 Thanks,
 -Utkarsh



Querying a non-indexed field?

2013-09-17 Thread Scott Schneider
Hello,

Is it possible to restrict query results using a non-indexed, stored field?  
e.g. I might index fewer fields to reduce the index size.  I query on a few 
indexed fields, getting a small # of results.  I want to restrict this further 
based on values from non-indexed, stored fields.  I can obviously do this 
myself, but it would be nice if Solr could do this for me.
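
(A minimal client-side sketch of the "do this myself" approach, with placeholder
URL and field names, might look like this:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class ClientSideFilter {
    public static void main(String[] args) throws Exception {
        // URL and field names are placeholders for this sketch
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery query = new SolrQuery("indexed_field:foo");
        query.setFields("id", "stored_only_field");

        for (SolrDocument doc : server.query(query).getResults()) {
            // restrict further on the client, since Solr cannot filter on a non-indexed field
            if ("bar".equals(doc.getFieldValue("stored_only_field"))) {
                System.out.println(doc.getFieldValue("id"));
            }
        }
        server.shutdown();
    }
}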

Thanks,
Scott



Re: Querying a non-indexed field?

2013-09-17 Thread Walter Underwood
No.  --wunder

On Sep 17, 2013, at 5:16 PM, Scott Schneider wrote:

 Hello,
 
 Is it possible to restrict query results using a non-indexed, stored field?  
 e.g. I might index fewer fields to reduce the index size.  I query on a few 
 indexed fields, getting a small # of results.  I want to restrict this 
 further based on values from non-indexed, stored fields.  I can obviously do 
 this myself, but it would be nice if Solr could do this for me.
 
 Thanks,
 Scott
 






Re: how to make sure all the index docs flushed to the index files

2013-09-17 Thread YouPeng Yang
Hi Erick and Shawn

   Thanks a lot


2013/9/17 Erick Erickson erickerick...@gmail.com

 Here's a blog about tlogs and commits:

 http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

 And here's Mike's excellent segment merging blog

 http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

 Best,
 Erick


 On Tue, Sep 17, 2013 at 6:36 AM, Shawn Heisey s...@elyograg.org wrote:

  On 9/17/2013 12:32 AM, YouPeng Yang wrote:
   Hi
  Another werid problem.
  When we setup the autocommit properties, we  suppose that the index
   fille will created every commited.So that the size of the index files
  will
   be large enough. We do not want to keep too many small files as [1].
  
  How to control the size of the index files.
 
  An index segment gets created after every hard commit.   In the listing
  that you sent, all the files starting with _28w are a single segment.
  All the files starting with _28x are another segment.
 
  Solr should be merging the segments when you get enough of them, unless
  you have incorrectly set up your merge policy.  The default number of
  segments that get merged is ten.  When you get ten segments, they will
  be merged down to one.  This repeats until you have ten merged segments.
   At that point, those ten merged segments will be merged to make an even
  larger segment.
 
  You can bump up the number of open files allowed by your operating
  system.  On Linux, this is controlled by the /etc/security/limits.conf
  file.  Here are some example config lines for that file:
 
  elyograghardnofile  6144
  elyogragsoftnofile  4096
  roothardnofile  6144
  rootsoftnofile  4096
 
  Alternatively, you can reduce the required number of files if you turn
  on the UseCompoundFile setting, which is in the IndexConfig section.
  This causes Solr to create a single file per index segment instead of
  several files per segment.  The compound file may be slightly less
  efficient, but the difference is likely to be very small.
 
 
 https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
 
 
 



Solr SpellCheckComponent only shows results with certain fields

2013-09-17 Thread jazzy
I'm trying to get the Solr SpellCheckComponent working but am running into
some issues. When I run
.../solr/collection1/select?q=*%3A*&wt=json&indent=true

These results are returned

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": "*:*",
      "_": "1379457032534",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "enterprise_name": "because",
        "name": "doc1",
        "enterprise_id": "100",
        "_version_": 1446463888248799200
      },
      {
        "enterprise_name": "what",
        "name": "RZTEST",
        "enterprise_id": "102",
        "_version_": 1446464432735518700
      }
    ]
  }
}
Those are the values that I have indexed. Now when I want to query for
spelling I get some weird results.

When I run
.../solr/collection1/select?q=name%3Arxtest&wt=json&indent=true&spellcheck=true

The results are accurate and I get

{
  responseHeader:{
status:0,
QTime:4,
params:{
  spellcheck:true,
  indent:true,
  q:name:rxtest,
  wt:json}},
  response:{numFound:0,start:0,docs:[]
  },
  spellcheck:{
suggestions:[
  rxtest,{
numFound:1,
startOffset:5,
endOffset:11,
suggestion:[rztest]}]}}
Any time I run a spellcheck query against a field other than name I get no
suggestions back:
/solr/collection1/select?q=enterprise_name%3Abecaus&wt=json&indent=true&spellcheck=true

{
  responseHeader:{
status:0,
QTime:5,
params:{
  spellcheck:true,
  indent:true,
  q:enterprise_name:becaus,
  wt:json}},
  response:{numFound:0,start:0,docs:[]
  },
  spellcheck:{
suggestions:[]}} 
My guess is that there is something wrong in my schema, but everything looks
fine to me.

Schema.xml

<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="enterprise_id" type="string" indexed="true" stored="true" required="true" />
<field name="enterprise_name" type="text_general" indexed="true" stored="true"/>

<field name="text" type="text_general" indexed="true" stored="false" multiValued="true" />

<dynamicField name="*_t"   type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>

<copyField source="name" dest="text"/>
<copyField source="enterprise_name" dest="text"/>


<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
solrconfig.xml

<requestHandler name="/select" class="solr.SearchHandler">

  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>

    <str name="spellcheck.dictionary">default</str>

    <str name="spellcheck.dictionary">wordbreak</str>

    <str name="spellcheck.onlyMorePopular">false</str>

    <str name="spellcheck.extendedResults">false</str>

    <str name="spellcheck.count">5</str>
  </lst>

  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <lst name="spellchecker">

    <str name="name">default</str>

    <str name="classname">solr.IndexBasedSpellChecker</str>

    <str name="field">name</str>

    <str name="spellcheckIndexDir">./spellchecker</str>

    <str name="accuracy">0.5</str>

    <float name="thresholdTokenFrequency">.0001</float>
    <str name="buildOnCommit">true</str>
  </lst>

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">name</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">3</int>
    <str name="buildOnCommit">true</str>
  </lst>

  <str name="queryAnalyzerFieldType">text_general</str>
</searchComponent>

Any help would be appreciated.
Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-SpellCheckComponent-only-shows-results-with-certain-fields-tp4090727.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud liveness problems

2013-09-17 Thread Mark Miller
SOLR-5243 and SOLR-5240 will likely improve the situation. Both fixes are in 
4.5 - the first RC for 4.5 will likely come tomorrow.

Thanks to yonik for sussing these out.

- Mark

On Sep 17, 2013, at 2:43 PM, Mark Miller markrmil...@gmail.com wrote:

 
 On Sep 17, 2013, at 12:00 PM, Vladimir Veljkovic 
 vladimir.veljko...@boxalino.com wrote:
 
 Hello there,
 
 we have following setup:
 
 SolrCloud 4.4.0 (3 nodes, physical machines)
 Zookeeper 3.4.5 (3 nodes, physical machines)
 
 We have a number of rather small collections (~10K or ~100K of documents), 
 that we would like to load to all Solr instances (numShards=1, 
 replication_factor=3), and access them through local network interface, as 
 the load balancing is done in layers above.
 
 We can live (and we actually do it in the test phase) with updating the 
 entire collections whenever we need it, switching collection aliases and 
 removing the old collections.
 
 We stumbled across following problem: as soon as all three Solr nodes become 
 a leader to at least one collection, restarting any node makes it completely 
 unresponsive (timeout), both though admin interface and for replication. If 
 we restart all solr nodes the cluster end up in some kind of deadlock and 
 only remedy we found is Solr clean installation, removing ZooKeeper data and 
 re-posting collections.
 
 Apparently, leader is waiting for replicas to come up and they try to 
 synchronize but timeout on http requests, so everything ends up in some kind 
 of dead lock, maybe related to:
 
 https://issues.apache.org/jira/browse/SOLR-5240
 
 Yup, that sounds exactly what you would expect with SOLR-5240. A fix for that 
 is coming in 4.5, which is a probably a week or so away.
 
 
 Eventually (after few minutes), leader takes over, mark collections active 
 but remains blocked on http interface, so other nodes can not synchronize.
 
 In further tests, we loaded 4 collections with numShards=1 and 
 replication_factor=2. By chance, one node become the leader for all 4 
 collections. Restarting the node which was not the leader is done without 
 the problem, but when we restarted the leader it happened that:
 - leader shut down, other nodes became leaders of 2 collections each
 - leader starts up, 3 collections on it become active, one collection 
 remains ”down” and node becomes unresponsive and timeouts on http requests.
 
 Hard to say - I'll experiment with 4.5 and see if I can duplicate this.
 
 - Mark
 
 
 As this behavior is completely unexpected for one cluster solution, I wonder 
 if somebody else experienced same problems or we are doing something 
 entirely wrong.
 
 Best regards
 
 -- 
 
 Vladimir Veljkovic
 Senior Java Entwickler
 
 Boxalino AG
 
 vladimir.veljko...@boxalino.com 
 www.boxalino.com 
 
 
 Tuning Kit for your Online Shop
 
 Product Search - Recommendations - Landing Pages - Data intelligence - 
 Mobile Commerce 
 
 
 



Facets with empty values are displayed in the output

2013-09-17 Thread Prasi S
Hi,
I'm using Solr 4.4 for our search. When I query for a keyword, it returns
empty-valued facets in the response:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="Country">
      <int name="">1</int>
      <int name="USA">1</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

I have also tried using the facet.missing parameter, but there is no change. How can we
handle this?


Thanks,
Prasi


how can I use DataImportHandler on multiple MySQL databases with the same schema?

2013-09-17 Thread Liu Bo
Hi all

Our system has distributed MySQL databases: we create a database for every
customer who signs up and distribute it to one of our MySQL hosts.

We currently use Lucene core to perform search on these databases, and we
write Java code to loop through these databases and convert the data into a
Lucene index.

Right now we are planning to move to Solr for distribution, and I am
investigating it.

I tried to use the DataImportHandler (http://wiki.apache.org/solr/DataImportHandler)
described in the wiki page, but I can't figure out a way to use multiple data sources
with the same schema.

The other question is: we have the database connection data in one table;
can I build the data source connection info from it and loop through the
databases using DataImporter?

If DataImporter isn't workable, is there a way to feed data to Solr using a
customized SolrRequestHandler without using SolrJ?

If neither of these two ways works, I think I am going to reuse the
DAO of the old project and feed the data to Solr using SolrJ, probably
using an embedded Solr server.
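
A very rough sketch of that SolrJ loop might look like the following (the
connection-info table, column names, item query and Solr URL are all hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MultiDbIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Hypothetical connection-info table: one row per customer database
        List<String> jdbcUrls = new ArrayList<String>();
        try (Connection meta = DriverManager.getConnection(
                "jdbc:mysql://metahost/meta", "user", "pass");
             Statement st = meta.createStatement();
             ResultSet rs = st.executeQuery("SELECT jdbc_url FROM customer_databases")) {
            while (rs.next()) {
                jdbcUrls.add(rs.getString("jdbc_url"));
            }
        }

        // Loop over every customer database and feed its rows to Solr
        for (String url : jdbcUrls) {
            try (Connection db = DriverManager.getConnection(url, "user", "pass");
                 Statement st = db.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, title FROM items")) {
                while (rs.next()) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", rs.getString("id"));
                    doc.addField("title", rs.getString("title"));
                    solr.add(doc);
                }
            }
        }
        solr.commit();
        solr.shutdown();
    }
}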

Your help will be much appreciated.

http://wiki.apache.org/solr/DataImportHandlerFaq

--
All the best

Liu Bo


Re: how can I use DataImportHandler on multiple MySQL databases with the same schema?

2013-09-17 Thread Alexandre Rafalovitch
You can create multiple entities in the DIH definition and they will all run.
That means duplicating the mapping definition apart from the dataSource name,
but it is doable.

Alternatively, the configuration file is read on every call to DIH. You can
edit the file between invocations, or autogenerate different files
from a common template and pass the name as a parameter.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Sep 18, 2013 at 10:39 AM, Liu Bo diabl...@gmail.com wrote:

 Hi all

 Our system has distributed MySQL databases, we create a database for every
 customer signed up and distributed it to one of our MySQL hosts.

 We currently use lucene core to perform search on these databases, and we
 write java code to loop through these databases and convert the data to
 lucene index.

 Right now we are planning to move to Solr for distribution, and I am doing
 investigation on it.

 I tried to use DataImportHandler
 http://wiki.apache.org/solr/DataImportHandler
 in
 the wiki page, but I can't figured out a way to use multiple datasoures
 with the same schema.

 The other question is, we have the database connection data in one table,
 can I create datasource connections info from it, and loop through the
 databases using DataImporter?

 If DataImporter isn't working, is there a way to feed data to solr using
 customized SolrRequestHandler without using SolrJ?

 If neither of these two ways is working, I think I am going to reuse the
 DAO of the old project and feed the data to solr using SolrJ, probably
 using embedded Solr server.

 Your help will be much of my appreciation.

 http://wiki.apache.org/solr/DataImportHandlerFaq--
 All the best

 Liu Bo