Re: Anybody uses Solr JMX?

2014-05-05 Thread Paul Libbrecht
 Thank you everybody for the links and explanations.
 
 I am still curious whether JMX exposes more details than the Admin UI?
 I am thinking of a troubleshooting context, rather than a long-term
 monitoring one.

JMX is multi-purpose.
So, in principle, it can offer considerably more.
I've seen discussions of quite a few JMX variables covering the activity of the
Garbage Collector (e.g. young generation, …) which I can't remember having seen
in the Admin UI.

The advantage of JMX is the interface and what you can do with it.
For example, plotting values is not possible in the Admin UI, and that can
really make a difference in detecting, say, the cause of sudden bursts.
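
For reference, a minimal sketch of what exposing those MBeans usually takes
(the <jmx/> hook is in the stock solrconfig.xml; the port and flags below are
only illustrative):

  <!-- solrconfig.xml: register Solr's MBeans with the JVM's platform MBean server -->
  <jmx />

and, on the JVM side (e.g. in the Tomcat/Jetty start script), the usual
remote-JMX switches so jconsole or VisualVM can attach:

  -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=18983
  -Dcom.sun.management.jmxremote.ssl=false
  -Dcom.sun.management.jmxremote.authenticate=false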

paul



Re: Solr relevancy tuning

2014-05-05 Thread Jorge Luis Betancourt González
One good thing about kelvin is that it's more of a programmatic task, so you
could execute the scripts after a few changes/deployments and get a general
idea of whether the new changes have impacted the search experience; sure, the
changing catalog is still a problem, but I kind of like being able to execute a
few commands and, presto, get it done. This could become a must-run test in the
test suite of the app. I kind of do this already, but testing from the user
interface, using the test library provided by Symfony2 (the framework I'm
using) and functional tests. It's not test-driven search relevancy per se, but
we make sure not to mess up some basic queries we use to test the search
feature.

- Original Message -
From: Giovanni Bricconi giovanni.bricc...@banzai.it
To: solr-user solr-user@lucene.apache.org
Cc: Ahmet Arslan iori...@yahoo.com
Sent: Friday, April 11, 2014 5:15:56 AM
Subject: Re: Solr relevancy tuning

Hello Doug

I have just watched the Quepid demonstration video, and I strongly agree
with your introduction: it is very hard to involve marketing/business
people in repeated testing sessions, and spreadsheets or other kinds of files
are not the right tool to use.
Currently I'm quite alone in my tuning task, and having a visual approach
could be beneficial for me; you are giving me many good inputs!

I see that kelvin (my scripted tool) and Quepid follow the same path. In
Quepid someone quickly watches the results and applies colours to them; in
kelvin you enter one or more queries (network cable, ethernet cable) and
state that the result must contain ethernet in the title, or must come
from a list of product categories.

I also do diffs of results, before and after changes, to check what is
going on; but I have to do that in a very unix-scripted way.

Have you considered placing a counter of total red/bad results in
Quepid? I use this index to get a quick overview of a change's impact across
all queries. Actually I repeat tests in production from time to time, and
if I see the kelvin temperature rising (the number of errors going up) I
know I have to check what's going on, because new products may be having
a bad impact on the index.

I also keep counters of products with low-quality images/no images at all
or too-short listings; sometimes they are useful to better understand what will
happen if you change some bq/fq in the application.

I also see that after changes in Quepid someone has to check gray
results and assign them a colour; in kelvin's case sometimes the conditions
can do a bit of magic (new product names still contain SM-G900F) but
sometimes they can introduce false errors (the new product name contains only
Galaxy 5 and not the product code SM-G900F). So some checks are needed, but
with Quepid everybody can do the check, whereas with kelvin you have to change
some lines of a script, and not everybody is able/willing to do that.

The idea of a static index is a good suggestion, I will try to have it in
the next round of search engine improvement.

Thank you Doug!




2014-04-09 17:48 GMT+02:00 Doug Turnbull 
dturnb...@opensourceconnections.com:

 Hey Giovanni, nice to meet you.

 I'm the person that did the Test Driven Relevancy talk. We've got a product
 Quepid (http://quepid.com) that lets you gather good/bad results for
 queries and do a sort of test driven development against search relevancy.
 Sounds similar to your existing scripted approach. Have you considered
 keeping a static catalog for testing purposes? We had a project with a lot
 of updates and date-dependent relevancy. This lets you create some test
 scenarios against a static data set. However, one downside is you can't
 recreate problems in production in your test setup exactly-- you have to
 find a similar issue that reflects what you're seeing.

 Cheers,
 -Doug


 On Wed, Apr 9, 2014 at 10:42 AM, Giovanni Bricconi 
 giovanni.bricc...@banzai.it wrote:

  Thank you for the links.
 
  The book is really useful, I will definitively have to spend some time
  reformatting the logs to to access number of result founds, session id
 and
  much more.
 
  I'm also quite happy that my test cases produces similar results to the
  precision reports shown at the beginning of the book.
 
  Giovanni
 
 
  2014-04-09 12:59 GMT+02:00 Ahmet Arslan iori...@yahoo.com:
 
   Hi Giovanni,
  
   Here are some relevant pointers :
  
  
  
 
 http://www.lucenerevolution.org/2013/Test-Driven-Relevancy-How-to-Work-with-Content-Experts-to-Optimize-and-Maintain-Search-Relevancy
  
  
   http://rosenfeldmedia.com/books/search-analytics/
  
   http://www.sematext.com/search-analytics/index.html
  
  
   Ahmet
  
  
   On Wednesday, April 9, 2014 12:17 PM, Giovanni Bricconi 
   giovanni.bricc...@banzai.it wrote:
   It is about one year I'm working on an e-commerce site, and
  unfortunately I
   have no information retrieval background, so probably I am missing
 some
   important practices about relevance tuning and search engines.
   During this period I had 

stats pseudo-field score

2014-05-05 Thread frank shi
Hey everyone. In our application we are using Solr 4.6.
I had the idea to use the stats component on the score pseudo-field.
Does a workaround exist for using …stats=true&stats.field=score… ?
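
To make the intent concrete: I know normal stats usage is against a real
indexed field, along the lines of (price is just a placeholder field name)

  http://localhost:8983/solr/collection1/select?q=*:*&rows=0&stats=true&stats.field=price

but here the field I'm interested in is the computed score.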
thanks a lot!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/stats-pse-udo-field-score-tp4134635.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr does not recognize language

2014-05-05 Thread Victor Pascual
Thank you very much for your help, Ahmet.

However, the language detection is still not working. :(
My solrconfig.xml didn't contain that <lst> section inside the /update
requestHandler.
That's the content I added:

  <requestHandler name="/update"
                  class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>


  <updateRequestProcessorChain name="langid">
    <processor
        class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
      <lst name="defaults">
        <str name="langid.fl">text</str>
        <str name="langid.langField">lang</str>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>


Now, your suggested query
http://localhost:8080/solr/update?commit=true&update.chain=langid returns

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">14</int>
  </lst>
</response>

And there is still no lang field in my documents.
Any idea what I am doing wrong?



On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 solr/update should be used, not /solr/select

 curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'

 By the way, don't you have the following definition in your solrconfig.xml?

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>



 On Tuesday, April 29, 2014 4:50 PM, Victor Pascual 
 vic...@mobilemediacontent.com wrote:
 Hi Ahmet,

 thanks for your reply. Adding update.chain=langid to my query doesn't
  work: IP:8080/solr/select/?q=*%3A*&update.chain=langid
 Regarding defining the chain in an UpdateRequestHandler... sorry for the
 lame question but shall I paste those three lines to solrconfig.xml, or
 shall I add them somewhere else?

 There is not UpdateRequestHandler in my solrconfig.

 Thanks!



 On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi,
 
  Did you attach your chain to a UpdateRequestHandler?
 
  You can do it by adding update.chain=langid to the URL or defining it in
  a defaults section as follows
 
  lst name=defaults
   str name=update.chainlangid/str
 /lst
 
 
 
  On Tuesday, April 29, 2014 3:18 PM, Victor Pascual 
  vic...@mobilemediacontent.com wrote:
  Dear all,
 
  I'm a new user of Solr. I've managed to index a bunch of documents (in
  fact, they are tweets) and everything works quite smoothly.
 
  Nevertheless it looks like Solr doesn't detect the language of my
 documents
  nor remove stopwords accordingly so I can extract the most frequent
 terms.
 
  I've added this piece of XML to my solrconfig.xml as well as the Tika lib
  jars.
 
  updateRequestProcessorChain name=langid
 processor
 
 
 class=org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory
lst name=defaults
  str name=langid.fltext/str
  str name=langid.langFieldlang/str
/lst
  /processor
  processor class=solr.LogUpdateProcessorFactory /
 processor class=solr.RunUpdateProcessorFactory /
   /updateRequestProcessorChain
 
  There is no error in the tomcat log file, so I have no clue of why this
  isn't working.
  Any hint on how to solve this problem will be much appreciated!
 




Re: Solr does not recognize language

2014-05-05 Thread Frankcis
I think you should check that your schema.xml and solrconfig.xml encoding
format is UTF-8.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134643.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr does not recognize language

2014-05-05 Thread Victor Pascual
Why should this be a problem?
Both files start with <?xml version="1.0" encoding="UTF-8" ?>


On Mon, May 5, 2014 at 11:44 AM, Frankcis finalxc...@gmail.com wrote:

 i think you should check your scheme.xml and solrconfig.xml encoding
 format =
 utf-8。



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134643.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr does not recognize language

2014-05-05 Thread Frankcis
Because if the encoding format isn't UTF-8 in both, building the index will
produce mangled text, and of course you will not get the expected results.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134647.html
Sent from the Solr - User mailing list archive at Nabble.com.


Wildcard malfunctioning

2014-05-05 Thread Román González
Hi all!

 

Sorry in advance if this question was already posted, but I was unable to find
it with search engines.

 

Filter SpanishLightStemFilterFactory is not working properly with wildcards
or I’m misunderstanding something. I have the field

 

   <field name="cultivo_es" type="text_es" indexed="true" stored="true" />

 

With this type:

 

<fieldType name="text_es" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_es.txt" format="snowball" />
    <filter class="solr.SpanishLightStemFilterFactory"/>
    <!-- more aggressive: <filter
         class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
  </analyzer>
</fieldType>

 

But I’m getting these results:

 

q = cultivo_es:uva

Getting 50 correct results

 

q = cultivo_es:uva*

Getting the same 50 correct results

 

q = cultivo_es:naranja

Getting the 50 correct results of “naranja”

 

q = cultivo_es:naranja*

Getting 0 results!

 

It works fine if I remove SpanishLightStemFilterFactory filter, but I need
it in order to filter diacritics according to Spanish rules.

 

Thank you!!

 



interpretation of cat_rank in http://people.apache.org/~hossman/ac2012eu/

2014-05-05 Thread Matteo Grolla
Hi everybody
can anyone give me a suitable interpretation for cat_rank in
http://people.apache.org/~hossman/ac2012eu/ slide 15

thanks

Re: Solr does not recognize language

2014-05-05 Thread Ahmet Arslan
Hi Victor,

How do you index your documents? Your last config looks correct. However, for
example, if you use the data import handler you need to add update.chain there
too. The same goes for the extracting request handler if you are using Solr Cell.

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/home/username/data-config.xml</str>
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>

By the way, the URL
http://localhost:8080/solr/update?commit=true&update.chain=langid was just an
example, meant for feeding XML update messages via POST. It is not meant to be
used in a browser.
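
For illustration, posting a document through that handler could look roughly
like this (the fields here are made up):

  curl 'http://localhost:8080/solr/update?commit=true&update.chain=langid' \
       -H 'Content-Type: text/xml' \
       --data-binary '<add><doc><field name="id">1</field><field name="text">some example text</field></doc></add>'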

Ahmet

On Monday, May 5, 2014 11:04 AM, Victor Pascual vic...@mobilemediacontent.com 
wrote:

Thank you very much for you help Ahmet.

However the language detection is still not workin. :(
My solrconfig.xml didn't contain that lst section inside the update 
requestHandler.
That's the content I added:

  requestHandler name=/update
                  class=solr.XmlUpdateRequestHandler
       lst name=defaults
         str name=update.chainlangid/str
       /lst
    /requestHandler


   updateRequestProcessorChain name=langid
       processor 
class=org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory
          lst name=defaults
            str name=langid.fltext/str
            str name=langid.langFieldlang/str
          /lst
        /processor
        processor class=solr.LogUpdateProcessorFactory /
       processor class=solr.RunUpdateProcessorFactory /
     /updateRequestProcessorChain

Now, your suggested query 
http://localhost:8080/solr/update?commit=trueupdate.chain=langid returns

response
lst name=responseHeader
int name=status0/int
int name=QTime14/int
/lst
/response
And there is still no lang field in my documents.
Any idea what am I doing wrong?




On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi,

solr/update should be used, not /solr/select

curl 'http://localhost:8983/solr/update?commit=trueupdate.chain=langid' 

By the way don't you have following definition in your solrconfig.xml?

 requestHandler name=/update class=solr.UpdateRequestHandler  

       lst name=defaults
         str name=update.chainlangid/str
       /lst      
  /requestHandler




On Tuesday, April 29, 2014 4:50 PM, Victor Pascual 
vic...@mobilemediacontent.com wrote:
Hi Ahmet,

thanks for your reply. Adding update.chain=langid to my query doesn't
work: IP:8080/solr/select/?q=*%3A*update.chain=langid
Regarding defining the chain in an UpdateRequestHandler... sorry for the
lame question but shall I paste those three lines to solrconfig.xml, or
shall I add them somewhere else?

There is not UpdateRequestHandler in my solrconfig.

Thanks!



On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 Did you attach your chain to a UpdateRequestHandler?

 You can do it by adding update.chain=langid to the URL or defining it in
 a defaults section as follows

 lst name=defaults
      str name=update.chainlangid/str
    /lst



 On Tuesday, April 29, 2014 3:18 PM, Victor Pascual 
 vic...@mobilemediacontent.com wrote:
 Dear all,

 I'm a new user of Solr. I've managed to index a bunch of documents (in
 fact, they are tweets) and everything works quite smoothly.

 Nevertheless it looks like Solr doesn't detect the language of my documents
 nor remove stopwords accordingly so I can extract the most frequent terms.

 I've added this piece of XML to my solrconfig.xml as well as the Tika lib
 jars.

     updateRequestProcessorChain name=langid
        processor

 class=org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory
           lst name=defaults
             str name=langid.fltext/str
             str name=langid.langFieldlang/str
           /lst
         /processor
         processor class=solr.LogUpdateProcessorFactory /
        processor class=solr.RunUpdateProcessorFactory /
      /updateRequestProcessorChain

 There is no error in the tomcat log file, so I have no clue of why this
 isn't working.
 Any hint on how to solve this problem will be much appreciated!





Re: Wildcard malfunctioning

2014-05-05 Thread Ahmet Arslan


Hi Roman,

What you are experiencing is OK and known. Stemming and wildcard searches
can be counter-intuitive sometimes. But luckily a remedy is available. Use the
following filters, and your wildcard searches will be happy. Please note that
this change will require a Solr restart and a re-index.

 <filter class="solr.KeywordRepeatFilterFactory"/>
 <filter class="solr.SpanishLightStemFilterFactory"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
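
Placed in your text_es type, the index-time chain would look roughly like this
(just a sketch based on the field type you posted; order matters:
KeywordRepeatFilter goes before the stemmer, RemoveDuplicates after it):

 <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="lang/stopwords_es.txt" format="snowball"/>
     <filter class="solr.KeywordRepeatFilterFactory"/>
     <filter class="solr.SpanishLightStemFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>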

Regarding diacritics, please see 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
 
and http://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet


On Monday, May 5, 2014 2:01 PM, Román González rgonza...@normagricola.com 
wrote:
Hi all!



Sorry in advance if this question was posted but I were unable to find it
with search engines.



Filter SpanishLightStemFilterFactory is not working properly with wildcards
or I’m misunderstanding something. I have the field



   field name=cultivo_es type=text_es indexed=true stored=true /



With this type:



    fieldType name=text_es class=solr.TextField
positionIncrementGap=100

      analyzer 

        tokenizer class=solr.StandardTokenizerFactory/

        filter class=solr.LowerCaseFilterFactory/

        filter class=solr.StopFilterFactory ignoreCase=true
words=lang/stopwords_es.txt format=snowball /

        filter class=solr.SpanishLightStemFilterFactory/

        !-- more aggressive: filter
class=solr.SnowballPorterFilterFactory language=Spanish/ --

      /analyzer

    /fieldType



But I’m getting these results:



q = cultivo_es:uva

Getting 50 correct results



q = cultivo_es:uva*

Getting the same 50 correct results



q = cultivo_es:naranja

Getting the 50 correct results of “naranja”



q = cultivo_es:naranja*

Getting the 0 results !



It works fine if I remove SpanishLightStemFilterFactory filter, but I need
it in order to filter diacritics according to Spanish rules.



Thank you!!


Re: Wildcard malfunctioning

2014-05-05 Thread Jack Krupansky
Generally, stemming filters are not supported when wildcards are present. 
Only a small subset of filters work with wildcards, such as the case 
conversion filters.


But you say that you are using the stemmer to remove diacritical marks...
you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory instead.
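
For instance (a sketch only; mapping-ISOLatin1Accent.txt ships with the Solr
examples):

<filter class="solr.ASCIIFoldingFilterFactory"/>

<!-- or, applied before the tokenizer: -->
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>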


-- Jack Krupansky

-Original Message- 
From: Román González

Sent: Monday, May 5, 2014 7:00 AM
To: solr-user@lucene.apache.org
Subject: Wildcard malfunctioning

Hi all!



Sorry in advance if this question was posted but I were unable to find it
with search engines.



Filter SpanishLightStemFilterFactory is not working properly with wildcards
or I’m misunderstanding something. I have the field



  field name=cultivo_es type=text_es indexed=true stored=true /



With this type:



   fieldType name=text_es class=solr.TextField
positionIncrementGap=100

 analyzer

   tokenizer class=solr.StandardTokenizerFactory/

   filter class=solr.LowerCaseFilterFactory/

   filter class=solr.StopFilterFactory ignoreCase=true
words=lang/stopwords_es.txt format=snowball /

   filter class=solr.SpanishLightStemFilterFactory/

   !-- more aggressive: filter
class=solr.SnowballPorterFilterFactory language=Spanish/ --

 /analyzer

   /fieldType



But I’m getting these results:



q = cultivo_es:uva

Getting 50 correct results



q = cultivo_es:uva*

Getting the same 50 correct results



q = cultivo_es:naranja

Getting the 50 correct results of “naranja”



q = cultivo_es:naranja*

Getting the 0 results !



It works fine if I remove SpanishLightStemFilterFactory filter, but I need
it in order to filter diacritics according to Spanish rules.



Thank you!!





RE: Wildcard malfunctioning

2014-05-05 Thread Román González
SOLVED!

The first solution I tried (Ahmet's one) worked fine!

Thank you!

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, May 5, 2014 13:19
To: solr-user@lucene.apache.org; rgonza...@normagricola.com
Subject: Re: Wildcard malfunctioning

Generally, stemming filters are not supported when wildcards are present. 
Only a small subset of filters work with wildcards, such as the case conversion 
filters.

But, you say that you are using the stemmer to remove diacritical marks... 
you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory.

-- Jack Krupansky

-Original Message-
From: Román González
Sent: Monday, May 5, 2014 7:00 AM
To: solr-user@lucene.apache.org
Subject: Wildcard malfunctioning

Hi all!



Sorry in advance if this question was posted but I were unable to find it with 
search engines.



Filter SpanishLightStemFilterFactory is not working properly with wildcards or 
I’m misunderstanding something. I have the field



   field name=cultivo_es type=text_es indexed=true stored=true /



With this type:



fieldType name=text_es class=solr.TextField
positionIncrementGap=100

  analyzer

tokenizer class=solr.StandardTokenizerFactory/

filter class=solr.LowerCaseFilterFactory/

filter class=solr.StopFilterFactory ignoreCase=true
words=lang/stopwords_es.txt format=snowball /

filter class=solr.SpanishLightStemFilterFactory/

!-- more aggressive: filter
class=solr.SnowballPorterFilterFactory language=Spanish/ --

  /analyzer

/fieldType



But I’m getting these results:



q = cultivo_es:uva

Getting 50 correct results



q = cultivo_es:uva*

Getting the same 50 correct results



q = cultivo_es:naranja

Getting the 50 correct results of “naranja”



q = cultivo_es:naranja*

Getting the 0 results !



It works fine if I remove SpanishLightStemFilterFactory filter, but I need it 
in order to filter diacritics according to Spanish rules.



Thank you!!





Solr Not Searching while INDEXING the DATA

2014-05-05 Thread Sohan Kalsariya
I am not able to search the data while indexing.
Indexing is done via the dataimport handler.
While searching for documents (while indexing is in progress), it
gives a broken pipe exception and won't return anything.
What should be the proper solution for this problem?
Am I missing something?
Help me!

-- 
Regards,
*Sohan Kalsariya*


Explain Solr Query Execution

2014-05-05 Thread nativecoder
How will a query like the one below get executed, and in which order?

I understand that when this query is executed fields mentioned in fieldList
will be returned. What I don't understand is how the samplestring1 and
samplestring2 will get searched with the query fields specified

I think I will be able to understand how the search happens if this can be
illustrated in SQL ( Just to understand what happens behind the scene)

Following is the query. Please have a look at it and let me know how this
works internally.
query=samplestring1 AND samplestring2
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2
resultRows: 10
startRow: 0

P.S samplestring1 AND samplestring2  are some test strings in the query

Sample of Schema for fields

<fieldType name="sampletype1" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="5" maxGramSize="10"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldtype name="sampletype2" class="solr.TextField" sortMissingLast="true"
           omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

<field name="Field1" compressed="true" type="sampletype1"
       multiValued="false" indexed="true" stored="true" required="true"
       omitNorms="true"/>

<field name="Field2" compressed="true" type="sampletype1"
       multiValued="false" indexed="true" stored="true" required="true"
       omitNorms="true"/>

<field name="Exact_Field1" omitPositions="true" termVectors="false"
       omitTermFreqAndPositions="true" compressed="true" type="sampletype2"
       multiValued="false" indexed="true" stored="true" required="true"
       omitNorms="true"/>

<field name="Exact_Field2" omitPositions="true" termVectors="false"
       omitTermFreqAndPositions="true" compressed="true" type="sampletype2"
       multiValued="false" indexed="true" stored="true" required="false"
       omitNorms="true"/>

<copyField source="Field1" dest="Exact_Field1"/>
<copyField source="Field2" dest="Exact_Field2"/>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Explain-Solr-Query-Execution-tp4134681.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr does not recognize language

2014-05-05 Thread Victor Pascual
Hi there,

I'm indexing my documents using mysolr. I mainly generate a list of JSON
objects and then run: solr.update(documents_array, 'json')


On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Victor,

 How do you index your documents? Your last config looks correct. However
 for example if you use data import handler you need to add update.chain
 there too. Same as extraction request hadler if you are using sole-cell.

 requestHandler name=/dataimport
 class=org.apache.solr.handler.dataimport.DataImportHandler
 lst name=defaults
   str name=config/home/username/data-config.xml/str
   str name=update.chainlangid/str
 /lst
   /requestHandler

 By the way The URL
 http://localhost:8080/solr/update?commit=trueupdate.chain=langid was
 just an example and meant to feed xml update messages by POST method. Not
 to use in a browser.

 Ahmet

 On Monday, May 5, 2014 11:04 AM, Victor Pascual 
 vic...@mobilemediacontent.com wrote:

 Thank you very much for you help Ahmet.

 However the language detection is still not workin. :(
 My solrconfig.xml didn't contain that lst section inside the update
 requestHandler.
 That's the content I added:

   requestHandler name=/update
   class=solr.XmlUpdateRequestHandler
lst name=defaults
  str name=update.chainlangid/str
/lst
 /requestHandler
 

updateRequestProcessorChain name=langid
processor
 class=org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory
   lst name=defaults
 str name=langid.fltext/str
 str name=langid.langFieldlang/str
   /lst
 /processor
 processor class=solr.LogUpdateProcessorFactory /
processor class=solr.RunUpdateProcessorFactory /
  /updateRequestProcessorChain

 Now, your suggested query
 http://localhost:8080/solr/update?commit=trueupdate.chain=langid returns

 response
 lst name=responseHeader
 int name=status0/int
 int name=QTime14/int
 /lst
 /response
 And there is still no lang field in my documents.
 Any idea what am I doing wrong?




 On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,
 
 solr/update should be used, not /solr/select
 
 curl 'http://localhost:8983/solr/update?commit=trueupdate.chain=langid'
 
 By the way don't you have following definition in your solrconfig.xml?
 
  requestHandler name=/update class=solr.UpdateRequestHandler
 
lst name=defaults
  str name=update.chainlangid/str
/lst
   /requestHandler
 
 
 
 
 On Tuesday, April 29, 2014 4:50 PM, Victor Pascual 
 vic...@mobilemediacontent.com wrote:
 Hi Ahmet,
 
 thanks for your reply. Adding update.chain=langid to my query doesn't
 work: IP:8080/solr/select/?q=*%3A*update.chain=langid
 Regarding defining the chain in an UpdateRequestHandler... sorry for the
 lame question but shall I paste those three lines to solrconfig.xml, or
 shall I add them somewhere else?
 
 There is not UpdateRequestHandler in my solrconfig.
 
 Thanks!
 
 
 
 On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi,
 
  Did you attach your chain to a UpdateRequestHandler?
 
  You can do it by adding update.chain=langid to the URL or defining it
 in
  a defaults section as follows
 
  lst name=defaults
   str name=update.chainlangid/str
 /lst
 
 
 
  On Tuesday, April 29, 2014 3:18 PM, Victor Pascual 
  vic...@mobilemediacontent.com wrote:
  Dear all,
 
  I'm a new user of Solr. I've managed to index a bunch of documents (in
  fact, they are tweets) and everything works quite smoothly.
 
  Nevertheless it looks like Solr doesn't detect the language of my
 documents
  nor remove stopwords accordingly so I can extract the most frequent
 terms.
 
  I've added this piece of XML to my solrconfig.xml as well as the Tika
 lib
  jars.
 
  updateRequestProcessorChain name=langid
 processor
 
 
 class=org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory
lst name=defaults
  str name=langid.fltext/str
  str name=langid.langFieldlang/str
/lst
  /processor
  processor class=solr.LogUpdateProcessorFactory /
 processor class=solr.RunUpdateProcessorFactory /
   /updateRequestProcessorChain
 
  There is no error in the tomcat log file, so I have no clue of why this
  isn't working.
  Any hint on how to solve this problem will be much appreciated!
 
 
 



Help to Understand a Solr Query

2014-05-05 Thread nativecoder
Hi All

I am completely new to Solr and hoping to understand the basics. Can one of
you help me understand what the following query does and in which order it
is executed?

I understand that when this query is executed fields mentioned in fieldList
will be returned. What I don't understand is how the samplestring1 and
samplestring2 will get searched with the query fields specified

I think I will be able to understand how the search happens if this can be
illustrated in SQL ( Just to understand what happens behind the scene)

Following is the query. Please have a look at it and let me know how this
works internally.
query=samplestring1 AND samplestring2
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2
resultRows: 10
startRow: 0

P.S samplestring1 AND samplestring2  are some test strings in the query

Sample of Schema for fields

<fieldType name="sampletype1" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="5" maxGramSize="10"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldtype name="sampletype2" class="solr.TextField" sortMissingLast="true"
           omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

<field name="Field1" compressed="true" type="sampletype1"
       multiValued="false" indexed="true" stored="true" required="true"
       omitNorms="true"/>

<field name="Field2" compressed="true" type="sampletype1"
       multiValued="false" indexed="true" stored="true" required="true"
       omitNorms="true"/>

<field name="Exact_Field1" omitPositions="true" termVectors="false"
       omitTermFreqAndPositions="true" compressed="true" type="sampletype2"
       multiValued="false" indexed="true" stored="true" required="true"
       omitNorms="true"/>

<field name="Exact_Field2" omitPositions="true" termVectors="false"
       omitTermFreqAndPositions="true" compressed="true" type="sampletype2"
       multiValued="false" indexed="true" stored="true" required="false"
       omitNorms="true"/>

<copyField source="Field1" dest="Exact_Field1"/>
<copyField source="Field2" dest="Exact_Field2"/>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help to Understand a Solr Query

2014-05-05 Thread Jack Krupansky

Read up on the edismax query parser first:
http://wiki.apache.org/solr/ExtendedDisMax

The ^ operator is known as boosting or field boosting and is used to 
influence document scores for relevancy.


It has no analog in SQL.

-- Jack Krupansky

-Original Message- 
From: nativecoder

Sent: Monday, May 5, 2014 9:11 AM
To: solr-user@lucene.apache.org
Subject: Help to Understand a Solr Query

Hi All

I am completely new to solr and hoping to understand the basics. Can one of
you help me to understand what the following query does, in which order it
is getting executed

I understand that when this query is executed fields mentioned in fieldList
will be returned. What I don't understand is how the samplestring1 and
samplestring2 will get searched with the query fields specified

I think I will be able to understand how the search happens if this can be
illustrated in SQL ( Just to understand what happens behind the scene)

Following is the query. Please have a look at it and let me know how this
works internally.
query=samplestring1 AND samplestring2
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2
resultRows: 10
startRow: 0

P.S samplestring1 AND samplestring2  are some test strings in the query

Sample of Schema for fields

fieldType name=sampletype1 class=solr.TextField
positionIncrementGap=100analyzer type=indextokenizer
class=solr.KeywordTokenizerFactory/filter
class=solr.LowerCaseFilterFactory/filter class=solr.NGramFilterFactory
minGramSize=5 maxGramSize=10//analyzeranalyzer
type=querytokenizer class=solr.KeywordTokenizerFactory/filter
class=solr.LowerCaseFilterFactory//analyzer/fieldType

fieldtype name=sampletype2 class=solr.TextField sortMissingLast=true
omitNorms=trueanalyzertokenizer
class=solr.KeywordTokenizerFactory/filter
class=solr.LowerCaseFilterFactory//analyzer/fieldtype

field name=Field1 compressed=true type=sampletype1
multiValued=false indexed=true stored=true required=true
omitNorms=true/

field name=Field2 compressed=true type=sampletype1
multiValued=false indexed=true stored=true required=true
omitNorms=true/

field name=Exact_Field1 omitPositions=true termVectors=false
omitTermFreqAndPositions=true compressed=true type=sampletype2
multiValued=false indexed=true stored=true required=true
omitNorms=true/

field name=Exact_Field2 omitPositions=true termVectors=false
omitTermFreqAndPositions=true compressed=true type=sampletype2
multiValued=false indexed=true stored=true required=false
omitNorms=true/

copyField source=Field1 dest=Exact_Field1/
copyField source=Field2 dest=Exact_Field2/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Block Join Score Highlighting

2014-05-05 Thread StrW_dev
I changed the hardcoded BlockJoinChildQParser setting to use the parent
scoring and that seems to work. So I think I got rid of the scoring issue
:).
I also voted for the issue!


I didn't find a solution for the highlighting issue yet, but I am considering
omitting highlighting for now, as it also causes the index to grow big quickly
since the fields need to be stored to support highlighting.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Block-Join-Score-Highlighting-tp4134045p4134702.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: can't make GET request to solr in android app

2014-05-05 Thread blach
thanks,

Basically I'm running Solr on my localhost (computer) and trying to access it
through the emulator in Eclipse, NOT on a physical phone.

Can it be done?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134706.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Core failure when a lot of processes are indexing

2014-05-05 Thread Hakim Benoudjit
The index is made with the same version of Solr that is searching (4.6.0);
the config file (solrconfig.xml) & schema.xml are the same too.
The only way for me to solve this issue is to let only one process index
at a time. Wouldn't a layer of message queuing resolve this issue?


2014-05-04 18:33 GMT+01:00 Shawn Heisey s...@elyograg.org:

 On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
  Ok. These files contain what you've requested:
 
  First (the xml error): http://pastebin.com/ZcagK3T7
  Second (java params): http://pastebin.com/JtWQpp6s
  Third (Solr version): http://pastebin.com/wYdpdsAW

 Are you running with an index originally built by an earlier version of
 Solr?  If you are, you may be running into a known bug.  The last
 caused by section of the java stacktrace looks similar to the one in
 this issue -- which is indeed index corruption:

 https://issues.apache.org/jira/browse/LUCENE-5377

 If that's the problem you're experiencing, upgrading your Solr version
 will hopefully fix it.  Simply dropping in the 4.6.1 war file and any
 contrib jars should cause zero problems for your 4.6.0 install.
 Upgrading to 4.7.2 or 4.8.0 should be done with more care.

 Thanks,
 Shawn




-- 
Hakim Benoudjit.


Re: can't make GET request to solr in android app

2014-05-05 Thread blach
Hi, 
It's not an error; if you look at my code, there is a catch statement which
contains the FAIL message, and it always shows it.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134709.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Wildcard malfunctioning

2014-05-05 Thread Shawn Heisey
On 5/5/2014 5:19 AM, Jack Krupansky wrote:
 But, you stay that you are using the stemmer to remove diacritical
 marks... you can/should use ASCIIFoldingFilterFactory or
 MappingCharFilterFactory.

I like ICUFoldingFilterFactory for this, but it does require additional
contrib jars (included in the Solr download).  It lowercases too.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory

Thanks,
Shawn



Re: Core failure when a lot of processes are indexing

2014-05-05 Thread Hakim Benoudjit
Is there an option in Solr (solrconfig.xml or somewhere else) to regulate
commits to the index?
I mean, to do a 'sleep' between each commit to the index while data
to be indexed is waiting in a queue.


2014-05-05 15:58 GMT+01:00 Hakim Benoudjit h.benoud...@gmail.com:

 The index is made with the same version of solr, that is searching
 (4.6.0), the config file (solrconfig.xml)  schema.xml is the same too.
 The only way for me to solve this issue is to let only one process to
 index at the same time. Wouldnt a layer of message queue resolve this issue?


 2014-05-04 18:33 GMT+01:00 Shawn Heisey s...@elyograg.org:

 On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
  Ok. These files contain what you've requested:
 
  First (the xml error): http://pastebin.com/ZcagK3T7
  Second (java params): http://pastebin.com/JtWQpp6s
  Third (Solr version): http://pastebin.com/wYdpdsAW

 Are you running with an index originally built by an earlier version of
 Solr?  If you are, you may be running into a known bug.  The last
 caused by section of the java stacktrace looks similar to the one in
 this issue -- which is indeed index corruption:

 https://issues.apache.org/jira/browse/LUCENE-5377

 If that's the problem you're experiencing, upgrading your Solr version
 will hopefully fix it.  Simply dropping in the 4.6.1 war file and any
 contrib jars should cause zero problems for your 4.6.0 install.
 Upgrading to 4.7.2 or 4.8.0 should be done with more care.

 Thanks,
 Shawn




 --
 Hakim Benoudjit.




-- 
Hakim Benoudjit.


Re: Solr does not recognize language

2014-05-05 Thread Ahmet Arslan
Hi Victor,

I don't know mysolr; I assume you are using /update/json, so let's add your
chain to the defaults section.

  <requestHandler name="/update/json" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="stream.contentType">application/json</str>
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>
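
Then an update posted to that handler, shown here with curl purely for
illustration, should run through the langid chain:

  curl 'http://localhost:8080/solr/update/json?commit=true' \
       -H 'Content-Type: application/json' \
       --data-binary '[{"id":"1","text":"this is an english tweet"}]'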




On Monday, May 5, 2014 4:06 PM, Victor Pascual vic...@mobilemediacontent.com 
wrote:
Hi there,

I'm indexing my documents using mysolr. I mainly generate a lost of json
objects and the run: solr.update(documents_array,'json')



On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Victor,

 How do you index your documents? Your last config looks correct. However
 for example if you use data import handler you need to add update.chain
 there too. Same as extraction request hadler if you are using sole-cell.

 requestHandler name=/dataimport
 class=org.apache.solr.handler.dataimport.DataImportHandler
     lst name=defaults
       str name=config/home/username/data-config.xml/str
       str name=update.chainlangid/str
     /lst
   /requestHandler

 By the way The URL
 http://localhost:8080/solr/update?commit=trueupdate.chain=langid was
 just an example and meant to feed xml update messages by POST method. Not
 to use in a browser.

 Ahmet

 On Monday, May 5, 2014 11:04 AM, Victor Pascual 
 vic...@mobilemediacontent.com wrote:

 Thank you very much for you help Ahmet.

 However the language detection is still not workin. :(
 My solrconfig.xml didn't contain that lst section inside the update
 requestHandler.
 That's the content I added:

   requestHandler name=/update
                   class=solr.XmlUpdateRequestHandler
        lst name=defaults
          str name=update.chainlangid/str
        /lst
     /requestHandler
 

    updateRequestProcessorChain name=langid
        processor
 class=org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory
           lst name=defaults
             str name=langid.fltext/str
             str name=langid.langFieldlang/str
           /lst
         /processor
         processor class=solr.LogUpdateProcessorFactory /
        processor class=solr.RunUpdateProcessorFactory /
      /updateRequestProcessorChain

 Now, your suggested query
 http://localhost:8080/solr/update?commit=trueupdate.chain=langid returns

 response
 lst name=responseHeader
 int name=status0/int
 int name=QTime14/int
 /lst
 /response
 And there is still no lang field in my documents.
 Any idea what am I doing wrong?




 On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,
 
 solr/update should be used, not /solr/select
 
 curl 'http://localhost:8983/solr/update?commit=trueupdate.chain=langid'
 
 By the way don't you have following definition in your solrconfig.xml?
 
  requestHandler name=/update class=solr.UpdateRequestHandler
 
        lst name=defaults
          str name=update.chainlangid/str
        /lst
   /requestHandler
 
 
 
 
 On Tuesday, April 29, 2014 4:50 PM, Victor Pascual 
 vic...@mobilemediacontent.com wrote:
 Hi Ahmet,
 
 thanks for your reply. Adding update.chain=langid to my query doesn't
 work: IP:8080/solr/select/?q=*%3A*update.chain=langid
 Regarding defining the chain in an UpdateRequestHandler... sorry for the
 lame question but shall I paste those three lines to solrconfig.xml, or
 shall I add them somewhere else?
 
 There is not UpdateRequestHandler in my solrconfig.
 
 Thanks!
 
 
 
 On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
  Hi,
 
  Did you attach your chain to a UpdateRequestHandler?
 
  You can do it by adding update.chain=langid to the URL or defining it
 in
  a defaults section as follows
 
  lst name=defaults
       str name=update.chainlangid/str
     /lst
 
 
 
  On Tuesday, April 29, 2014 3:18 PM, Victor Pascual 
  vic...@mobilemediacontent.com wrote:
  Dear all,
 
  I'm a new user of Solr. I've managed to index a bunch of documents (in
  fact, they are tweets) and everything works quite smoothly.
 
  Nevertheless it looks like Solr doesn't detect the language of my
 documents
  nor remove stopwords accordingly so I can extract the most frequent
 terms.
 
  I've added this piece of XML to my solrconfig.xml as well as the Tika
 lib
  jars.
 
      updateRequestProcessorChain name=langid
         processor
 
 
 class=org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory
            lst name=defaults
              str name=langid.fltext/str
              str name=langid.langFieldlang/str
            /lst
          /processor
          processor class=solr.LogUpdateProcessorFactory /
         processor class=solr.RunUpdateProcessorFactory /
       /updateRequestProcessorChain
 
  There is no error in the tomcat log file, so I have no clue of why this
  isn't working.
  Any hint on how to solve this problem will be much appreciated!
 
 
 




Re: Help to Understand a Solr Query

2014-05-05 Thread nativecoder
I already went through the link. I understand the boosting factor for
relevancy.

query=samplestring1 AND samplestring2
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2 

I need to understand whether samplestring1 and samplestring2 will both
be searched in each field mentioned in queryFields. What I meant was:

e.g (Exact_Field1:samplestring1 AND Exact_Field1:samplestring2) AND
(Exact_Field2:samplestring1 AND Exact_Field2:samplestring2) AND
(Field1:samplestring1 AND Field1:samplestring2) AND (Field2:samplestring1
AND Field2:samplestring2)

Is the above correct ?
 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134714.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Erick Erickson
I don't think so. Solr excels at getting the score of single
documents, not aggregation.

It's not at all clear to me, though, that the sum of documents' scores
is a reasonable thing to sort by. Consider grouping on a very common
term. You'd never do this, but group on the elements of a text field.
Then the group 'a' would sort to the top almost always (or maybe 'the'
or...).

This sounds like an XY problem, what use-case are you trying to solve?

Best,
Erick

On Sun, May 4, 2014 at 9:31 PM, frank shi finalxc...@gmail.com wrote:
 Currently, solr grouping (http://wiki.apache.org/solr/FieldCollapsing) sorts
 groups by the score of the top document within each group. E.g.
 [...]
 groups:[{
 groupValue:81cb63020d0339adb019a924b2a9e0c2,
 doclist:{numFound:9,start:0,maxScore:4.729042,docs:[
 {
   id:7481df771afe39fab368ce19dfeeb528,
   [...],
   score:4.729042},
 {
   id:c879e95b5f16343dad8b1248133727c2,
   [...],
   score:4.6635237},
 {
   id:485b9aec90fd3ef381f013c51ab6a4df,
   [...],
   score:4.347174}]
 }},
 [...]
 Is there an out-of-the-box way to sort groups by the sum of the scores of
 the documents within each group? E.g.
 [...]
 groups:[{
 groupValue:81cb63020d0339adb019a924b2a9e0c2,
 doclist:{numFound:9,start:0,scoreSum:13.739738,docs:[
 {
   id:7481df771afe39fab368ce19dfeeb528,
   [...],
   score:4.729042},
 {
   id:c879e95b5f16343dad8b1248133727c2,
   [...],
   score:4.6635237},
 {
   id:485b9aec90fd3ef381f013c51ab6a4df,
   [...],
   score:4.347174}]
 }},
 [...]
 With the release of sorting by Function Query
 (https://issues.apache.org/jira/browse/SOLR-1297), it seems that there
 should be a way to use the sum() function
 (http://wiki.apache.org/solr/FunctionQuery). But it's not quite close enough
 since the score field is not part of the documents.

 I feel like I'm close but I'm missing some obvious piece. I'm using Solr
 4.6.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134607.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Not Searching while INDEXING the DATA

2014-05-05 Thread Shawn Heisey
On 5/5/2014 5:39 AM, Sohan Kalsariya wrote:
 I am not able to search for the data while indexing.
 Indexing is done via the dataimport handler.
 While searching for the documents (in between indexing is happening), it
 gives the broken pipe exception and wont search anything.
 What should be the proper solution for this problem?

A broken pipe exception means that your client gave up and timed out
before Solr could respond, so it closed the TCP connection.  When Solr
finally was able to respond, the connection was gone, so the servlet
container logged that exception.

The most common reason for underlying performance issues that causes
problems like this is that you don't have enough RAM.  It could be
something else, of course.  A number of possible options are covered on
this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

I see that you asked the same question on the IRC channel early this
morning (in my timezone), but you were gone before I was awake to see that.

Thanks,
Shawn



Re: Core failure when a lot of processes are indexing

2014-05-05 Thread Erick Erickson
You should not be committing from the client by and large, use the
autoCommit and autoSoftCommit options in solrconfig.xml.

See: 
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
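
A typical starting point in solrconfig.xml looks something like this (the
intervals are only examples; tune them to your indexing pattern):

<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit (durability), at most every 60s -->
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>5000</maxTime>            <!-- soft commit (visibility), at most every 5s -->
</autoSoftCommit>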

Best,
Erick

On Mon, May 5, 2014 at 8:12 AM, Hakim Benoudjit h.benoud...@gmail.com wrote:
 Is there an option in Solr (solrconfig.xml or somewhere else) to regularize
 commits to the index.
 I meant to do a 'sleep' between each commit to the index, when data
 to-be-indexed is waiting inside a stack.


 2014-05-05 15:58 GMT+01:00 Hakim Benoudjit h.benoud...@gmail.com:

 The index is made with the same version of solr, that is searching
 (4.6.0), the config file (solrconfig.xml)  schema.xml is the same too.
 The only way for me to solve this issue is to let only one process to
 index at the same time. Wouldnt a layer of message queue resolve this issue?


 2014-05-04 18:33 GMT+01:00 Shawn Heisey s...@elyograg.org:

 On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
  Ok. These files contain what you've requested:
 
  First (the xml error): http://pastebin.com/ZcagK3T7
  Second (java params): http://pastebin.com/JtWQpp6s
  Third (Solr version): http://pastebin.com/wYdpdsAW

 Are you running with an index originally built by an earlier version of
 Solr?  If you are, you may be running into a known bug.  The last
 caused by section of the java stacktrace looks similar to the one in
 this issue -- which is indeed index corruption:

 https://issues.apache.org/jira/browse/LUCENE-5377

 If that's the problem you're experiencing, upgrading your Solr version
 will hopefully fix it.  Simply dropping in the 4.6.1 war file and any
 contrib jars should cause zero problems for your 4.6.0 install.
 Upgrading to 4.7.2 or 4.8.0 should be done with more care.

 Thanks,
 Shawn




 --
 Hakim Benoudjit.




 --
 Hakim Benoudjit.


Re: can't make GET request to solr in android app

2014-05-05 Thread Shawn Heisey
On 5/5/2014 9:02 AM, blach wrote:
 It's not an error if you see my code, there is a catch statement, which
 contains the FAIL message, it does always show it.

In your code, you are not printing the stack trace or throwing the
exception.  If you want to see it in your own code, you'll need to
include code to write out the stacktrace from the exception.  If you
don't want to do that, you can look on the server log to see what the
exception is.

Since you are basically writing Java code (I'm aware that Dalvik is not
*actually* Java, but I've never written code for android), can you use
SolrJ instead of HttpClient?

Thanks,
Shawn



Re: Help to Understand a Solr Query

2014-05-05 Thread Jack Krupansky
dismax means Disjunction Maximum, which means Lucene takes the highest 
scoring clause (field), for each search term. This is effectively an OR of 
the clauses.
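
With your qf of Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7, a
query like samplestring1 AND samplestring2 is parsed into roughly the
following (simplified; add debugQuery=true to the request to see the exact
parsed query):

+DisjunctionMax( Exact_Field1:samplestring1^1.0 | Exact_Field2:samplestring1^0.9
               | Field1:samplestring1^0.8       | Field2:samplestring1^0.7 )
+DisjunctionMax( Exact_Field1:samplestring2^1.0 | Exact_Field2:samplestring2^0.9
               | Field1:samplestring2^0.8       | Field2:samplestring2^0.7 )

So each term is matched against all of the qf fields and the best field score
is taken, and the AND then requires both terms to match somewhere; it is not
an AND of every field/term pair.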



-- Jack Krupansky
-Original Message- 
From: nativecoder

Sent: Monday, May 5, 2014 11:21 AM
To: solr-user@lucene.apache.org
Subject: Re: Help to Understand a Solr Query

I already went through the link. I understand about the boosting factor for
the relevancy

query=samplestring1 AND samplestring2
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2

I need to understand whether the samplestring1 and samplestring 2 both will
be searched in each field mentioned in queryFields. What I meant was ;

e.g (Exact_Field1:samplestring1 AND Exact_Field1:samplestring2) AND
(Exact_Field2:samplestring1 AND Exact_Field2:samplestring2) AND
(Field1:samplestring1 AND Field1:samplestring2) AND (Field2:samplestring1
AND Field2:samplestring2)

Is the above correct ?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134714.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: can't make GET request to solr in android app

2014-05-05 Thread blach
Yes, I'm reading about SolrJ now.

I wrote this code for it, but it's the same problem; in this case the whole app
is stopping. This is the code:
  String urlString = "http://localhost:8983/solr";
  SolrServer solr = new HttpSolrServer(urlString);

  SolrQuery query = new SolrQuery();
  query.set("q", "mem");

  QueryResponse response = null;

  try {
      response = solr.query(query);
  } catch (SolrServerException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
  }

  SolrDocumentList results = response.getResults();
  for (int i = 0; i < results.size(); ++i) {
      etxt2.setText((CharSequence) results.get(i));
  }




--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134735.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: can't make GET request to solr in android app

2014-05-05 Thread Shawn Heisey
On 5/5/2014 11:05 AM, blach wrote:
 I wrote this code for it, but its the same problem, in this case all
 the app is stopping, this is the code String urlString =
 http://localhost:8983/solr;; SolrServer solr = new
 HttpSolrServer(urlString);

 SolrQuery query = new SolrQuery(); query.set(q, mem); 
 QueryResponse response = null;  try { response = solr.query(query); }
 catch (SolrServerException e) { // TODO Auto-generated catch block
 e.printStackTrace(); }  SolrDocumentList results =
 response.getResults(); for (int i = 0; i  results.size(); ++i) { 
 etxt2.setText((CharSequence) results.get(i)); }

Do you get any output to stderr?  Have you looked in the solr logfile to
see if there's an error logged there?

Note that you should add the core name to the URL -- using a path of
just /solr is deprecated in the newest Solr versions.

http://localhost:8983/solr/corename

Thanks,
Shawn



Odd XSLT behavior

2014-05-05 Thread Christopher Gross
Solr 4.7.2 (and 4.6.1)
Tomcat 7.0.52
Java 1.7.0_45 (and _55)

I'm getting some really odd behavior with some XSLT documents.  I've been
doing some upgrades to Java & Solr and I'm trying to narrow down where the
problems are happening.

I have a few XSLT docs that I put into the conf/xslt directory for my
indexes & I haven't changed them in a while; they were working fine for a
3.x Solr, and seemed to work fine on an earlier 4.x release.

The problem is that sometimes I get an error saying that a field can't be
found.   Here's a slice of the XSLT:
  <xsl:template match="doc">
    <xsl:variable name="id" select="str[@name='id']"/>
    <xsl:variable name="url" select="str[@name='url']"/>
    <xsl:variable name="title" select="str[@name='title']"/>
    <xsl:variable name="description" select="str[@name='description']"/>

    <entry xmlns="http://www.w3.org/2005/Atom">
      <title><xsl:value-of select="str[@name='title']"/></title>
      <link>
        <xsl:attribute name="href"><xsl:value-of select="str[@name='url']"/></xsl:attribute>
      </link>
      <summary>
        <xsl:choose>
          <xsl:when test="string-length($description) &gt; 255">
            <xsl:value-of select="concat(substring($description, 1, 255), '...')"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="$description"/>
          </xsl:otherwise>
        </xsl:choose>
      </summary>
      ...
  </xsl:template>

   I get messages saying that it can't find the description variable.
This was working perfectly well, but I can't seem to narrow down a specific
change that caused this.

Caused by: javax.xml.transform.TransformerConfigurationException:
solrres:/xslt/osatom.xsl: line 115: Variable or parameter 'description' is
undefined.
at
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:964)
at
org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)

Has anyone run into a problem like this?  Thanks!

-- Chris


Re: What are the best practices on Multiple Language support in Solr Cloud ?

2014-05-05 Thread shamik
Thanks Nicole. Leveraging dynamic field definitions is a great idea. It will
probably work for me, as I have a bunch of fields which are indexed as String.
Just curious about the sharding: are you using SolrCloud? I thought of taking
the dedicated shard/core route, but since I'm using a composite key (for
dedup), managing a dedicated core can cause issues at times.

As far as single field representation, thanks for validating my concern.
Probably its best to use when you've to address a multi-lingual search.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-are-the-best-practices-on-Multiple-Language-support-in-Solr-Cloud-tp4134006p4134743.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Core failure when a lot of processes are indexing

2014-05-05 Thread Hakim Benoudjit
I've tried it & it worked, by letting Solr do the commit instead of my Solr
client.
In solrconfig.xml:
autoCommit maxTime has been set to 5 minutes & autoSoftCommit maxTime to
something bigger.

Thanks a lot guys!


2014-05-05 16:30 GMT+01:00 Erick Erickson erickerick...@gmail.com:

 You should not be committing from the client by and large, use the
 autoCommit and autoSoftCommit options in solrconfig.xml.

 See:
 http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

 Best,
 Erick

 On Mon, May 5, 2014 at 8:12 AM, Hakim Benoudjit h.benoud...@gmail.com
 wrote:
  Is there an option in Solr (solrconfig.xml or somewhere else) to
 regularize
  commits to the index.
  I meant to do a 'sleep' between each commit to the index, when data
  to-be-indexed is waiting inside a stack.
 
 
  2014-05-05 15:58 GMT+01:00 Hakim Benoudjit h.benoud...@gmail.com:
 
  The index is made with the same version of solr, that is searching
  (4.6.0), the config file (solrconfig.xml)  schema.xml is the same too.
  The only way for me to solve this issue is to let only one process to
  index at the same time. Wouldnt a layer of message queue resolve this
 issue?
 
 
  2014-05-04 18:33 GMT+01:00 Shawn Heisey s...@elyograg.org:
 
  On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
   Ok. These files contain what you've requested:
  
   First (the xml error): http://pastebin.com/ZcagK3T7
   Second (java params): http://pastebin.com/JtWQpp6s
   Third (Solr version): http://pastebin.com/wYdpdsAW
 
  Are you running with an index originally built by an earlier version of
  Solr?  If you are, you may be running into a known bug.  The last
  caused by section of the java stacktrace looks similar to the one in
  this issue -- which is indeed index corruption:
 
  https://issues.apache.org/jira/browse/LUCENE-5377
 
  If that's the problem you're experiencing, upgrading your Solr version
  will hopefully fix it.  Simply dropping in the 4.6.1 war file and any
  contrib jars should cause zero problems for your 4.6.0 install.
  Upgrading to 4.7.2 or 4.8.0 should be done with more care.
 
  Thanks,
  Shawn
 
 
 
 
  --
  Hakim Benoudjit.
 
 
 
 
  --
  Hakim Benoudjit.




-- 
Hakim Benoudjit.


Re: Help to Understand a Solr Query

2014-05-05 Thread nativecoder
That answer helps a lot

Where would the OR clause be ? 

(Exact_Field1:samplestring1 *OR* Exact_Field1:samplestring2) AND
(Exact_Field2:samplestring1 *OR* Exact_Field2:samplestring2) AND
(Field1:samplestring1 *OR* Field1:samplestring2) AND (Field2:samplestring1
*OR* Field2:samplestring2)

Please note that in my query it is an AND clause. I am trying to understand
where the AND fits in.

*query=samplestring1 AND samplestring2*
defType: edismax
queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fieldList: Column1, Column2 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134763.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Odd XSLT behavior

2014-05-05 Thread Chris Hostetter

Shot in the dark: perhaps you have a doc w/o a value in the description 
field, which means the xsl:variable's select doesn't match anything; which 
perhaps means that your XSLT engine then leaves the variable undefined.


: Solr 4.7.2 (and 4.6.1)
: Tomcat 7.0.52
: Java 1.7.0_45 (and _55)
: 
: I'm getting some really odd behavior with some XSLT documents.  I've been
: doing some upgrades to Java  Solr and I'm trying to narrow down where the
: problems are happening.
: 
: I have a few XSLT docs that I put into the conf/xslt directory for my
: indexes  I haven't changed the in a while, and they were working fine for a
: 3.X Solr, and seemed to work fine on an earlier 4.X release.
: 
: The problem is that sometimes I get an error saying that a field can't be
: found.   Here's a slice of the XSLT:
:   xsl:template match=doc
: xsl:variable name=id select=str[@name='id']/
: xsl:variable name=url select=str[@name='url']/
: xsl:variable name=title select=str[@name='title']/
: xsl:variable name=description select=str[@name='description']/
: 
: entry xmlns=http://www.w3.org/2005/Atom;
:   titlexsl:value-of select=str[@name='title']//title
:   link
: xsl:attribute name=hrefxsl:value-of select=str[@name='url']
: //xsl:attribute
:   /link
:   summary
: xsl:choose
:   xsl:when test=string-length($description) gt; 255
: xsl:value-of select=concat(substring($description, 1, 255),
: '...')/
:   /xsl:when
:   xsl:otherwise
: xsl:value-of select=$description/
:   /xsl:otherwise
: /xsl:choose
:/summary
:.
: /xsl:template
: 
:I get messages saying that it can't find the description variable.
: This was working perfectly well, but I can't seem to narrow down a specific
: change that caused this.
: 
: Caused by: javax.xml.transform.TransformerConfigurationException:
: solrres:/xslt/osatom.xsl: line 115: Variable or parameter 'description' is
: undefined.
: at
: 
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:964)
: at
: 
org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)
: 
: Has anyone run into a problem like this?  Thanks!
: 
: -- Chris
: 

-Hoss
http://www.lucidworks.com/


Re: Stored vs non-stored very large text fields

2014-05-05 Thread Jochen Barth
I've found out that storing documents as separate docs+id does not
help either.

You must have a completely separate collection/core to get things working fast.

Kind regards,
Jochen


Zitat von Jochen Barth ba...@ub.uni-heidelberg.de:


Ok, https://wiki.apache.org/solr/SolrPerformanceFactors

states that: Retrieving the stored fields of a query result can be  
a significant expense. This cost is affected largely by the number  
of bytes stored per document--the higher byte count, the sparser the  
documents will be distributed on disk and more I/O is necessary to  
retrieve the fields (usually this is a concern when storing large  
fields, like the entire contents of a document).


But in my case (with docValues=true) there should be no reason to  
access *.fdt.


Kind regards,
Jochen

Zitat von Jochen Barth ba...@ub.uni-heidelberg.de:


Something is really strange here:

even when configuring fields id + sort_... to docValues=true --  
so there's nothing to get from stored documents file --  
performance is still terrible with ocr stored=true _even_ with my  
patch which stores uncompressed like solr4.0.0 (checked with  
strings -a on *.fdt).


Just reading
http://lucene.472066.n3.nabble.com/Can-Solr-handle-large-text-files-td3439504.html ... perhaps things will clear up soon (will check if splitting to index+non-stored and non-indexed+stored could help
here)



Kind regards,
J. Barth


Zitat von Shawn Heisey s...@elyograg.org:


On 4/29/2014 4:20 AM, Jochen Barth wrote:

BTW: stored field compression:
are all stored fields within a document are put into one  
compressed chunk,

or by per-field basis?


Here's the issue that added the compression to Lucene:

https://issues.apache.org/jira/browse/LUCENE-4226

It was made the default stored field format for Lucene, which also made
it the default for Solr.  At this time, there is no way to remove
compression on Solr without writing custom code.  I filed an issue to
make it configurable, but I don't know how to do it.  Nobody else has
offered a solution either.  One day I might find some time to take a
look at the issue and see if I can solve it myself.

https://issues.apache.org/jira/browse/SOLR-4375

Here's the author's blog post that goes into more detail than the LUCENE
issue:

http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene

Thanks,
Shawn





Re: can't make GET request to solr in android app

2014-05-05 Thread blach
Thank you Shawn 

I did what you told me. now this is my code:


import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.*;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

import java.io.InputStream;

@Override
public void onClick(View v) {
    // TODO Auto-generated method stub
    //etxt2.setText(etxt1.getText());

    //ALERT MESSAGE
    //Toast.makeText(getBaseContext(), "Please wait, connecting to server.", Toast.LENGTH_LONG).show();
    SolrServer solr;
    String urlString = "http://localhost:8983/solr/collection1";
    solr = new HttpSolrServer(urlString);

    SolrQuery query = new SolrQuery();
    query.set("qt", "/select");
    query.set("q", "mem");

    QueryResponse response = null;

    try {
        response = solr.query(query);
        SolrDocumentList results = response.getResults();
        for (int i = 0; i < results.size(); ++i) {
            //System.out.println(results.get(i));
            etxt2.setText((CharSequence) results.get(i));
        }
    } catch (SolrServerException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
  }});   }




It gives me an error that org.apache.solr.client.solrj is not found.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134769.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: can't make GET request to solr in android app

2014-05-05 Thread Shawn Heisey
On 5/5/2014 12:17 PM, blach wrote:
 Thank you Shawn 

 I did what you told me. now this is my code:

snip

 it gives me error that org.apache.solr.client.solrj is not found 

I don't know how to do classpath management in the Android environment.
You'll need to add the solrj jar to your application classpath.  In the
download that I have extracted on my computer, this is named
dist/solr-solrj-4.7.2.jar ... the version number is usually in the
filename.  A number of other jars are also required.  You can find these
in the dist/solrj-lib directory.  If you need a newer or slightly older
version of one of the dependent jars for your own code, it is usually OK
to use a slightly different version.

Thanks,
Shawn



Re: interpretation of cat_rank in http://people.apache.org/~hossman/ac2012eu/

2014-05-05 Thread Chris Hostetter
: Hi everybody
:   can anyone give me a suitable interpretation for cat_rank in
: http://people.apache.org/~hossman/ac2012eu/ slide 15

Have you seen the video?  

http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630

That slide starts ~ 23:00 and i go through a description of this example.

TL;DW: cat_rank in this example would be a numeric ranking of the category
the product is in - so cat_rank==N means the product is in the Nth most
popular category on the site (so lower is better, but the number is always
a positive integer)




-Hoss
http://www.lucidworks.com/


Re: Help to Understand a Solr Query

2014-05-05 Thread nativecoder
That answer helps a lot

Where would the OR clause be ?

(Exact_Field1:samplestring1 OR Exact_Field1:samplestring2) AND
(Exact_Field2:samplestring1 OR Exact_Field2:samplestring2) AND
(Field1:samplestring1 OR Field1:samplestring2) AND (Field2:samplestring1
OR Field2:samplestring2)

Please note that in my query it is an AND clause. I am trying to understand
where the AND fits in. To be more precise my query is as below

q=samplestring1 AND samplestring2defType: edismaxqf: Exact_Field1^1.0
Exact_Field2^0.9 Field1^0.8 Field2^0.7fl= Column1, Column2 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134775.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Odd XSLT behavior

2014-05-05 Thread Christopher Gross
Checked that first -- it's a test site with a small sample size.  The field
is set in all of the items.  And refreshing the query a few times can yield
either result (with/without the error).

I'm reverting back to an old version of my stack (my code, plus Tomcat &
Solr), and I'll step through my previous work slowly to see if I can pinpoint
what breaks it.  If I can (ever) determine what caused it, then I'll post it.

Thanks!

-- Chris


On Mon, May 5, 2014 at 2:05 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 Shot in the dark: perhaps you have a doc w/o a value in the description
 field, which means the xsl:variable's select doesn't match anything; which
 perhaps means that your XSLT engine then leaves the variable undefined.


 : Solr 4.7.2 (and 4.6.1)
 : Tomcat 7.0.52
 : Java 1.7.0_45 (and _55)
 :
 : I'm getting some really odd behavior with some XSLT documents.  I've been
 : doing some upgrades to Java  Solr and I'm trying to narrow down where
 the
 : problems are happening.
 :
 : I have a few XSLT docs that I put into the conf/xslt directory for my
 : indexes  I haven't changed the in a while, and they were working fine
 for a
 : 3.X Solr, and seemed to work fine on an earlier 4.X release.
 :
 : The problem is that sometimes I get an error saying that a field can't be
 : found.   Here's a slice of the XSLT:
 :   xsl:template match=doc
 : xsl:variable name=id select=str[@name='id']/
 : xsl:variable name=url select=str[@name='url']/
 : xsl:variable name=title select=str[@name='title']/
 : xsl:variable name=description select=str[@name='description']/
 :
 : entry xmlns=http://www.w3.org/2005/Atom;
 :   titlexsl:value-of select=str[@name='title']//title
 :   link
 : xsl:attribute name=hrefxsl:value-of
 select=str[@name='url']
 : //xsl:attribute
 :   /link
 :   summary
 : xsl:choose
 :   xsl:when test=string-length($description) gt; 255
 : xsl:value-of select=concat(substring($description, 1, 255),
 : '...')/
 :   /xsl:when
 :   xsl:otherwise
 : xsl:value-of select=$description/
 :   /xsl:otherwise
 : /xsl:choose
 :/summary
 :.
 : /xsl:template
 :
 :I get messages saying that it can't find the description variable.
 : This was working perfectly well, but I can't seem to narrow down a
 specific
 : change that caused this.
 :
 : Caused by: javax.xml.transform.TransformerConfigurationException:
 : solrres:/xslt/osatom.xsl: line 115: Variable or parameter 'description'
 is
 : undefined.
 : at
 :
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:964)
 : at
 :
 org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)
 :
 : Has anyone run into a problem like this?  Thanks!
 :
 : -- Chris
 :

 -Hoss
 http://www.lucidworks.com/



Re: can't make GET request to solr in android app

2014-05-05 Thread blach
I have included the reference to this library properly, but it still gives
me the same error.

feeling 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134785.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Core failure when a lot of processes are indexing

2014-05-05 Thread Erick Erickson
Take a look through the article I linked, 5 minutes may be an issue
since the transaction log will hold all 5 minutes worth of input. In
batch processes this can be quite a bit of data. Worse, when a Solr
instance terminates unexpectedly, the entire transaction log can be
replayed.

Consider setting your autoCommit max time to something much shorter, say
30 seconds, or even less. NOTE: openSearcher should be false.

Then set your soft commit time to the latency you can stand, i.e. if
the users don't need to be able to search for a long time you can set
this to hours.
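
For instance, a solrconfig.xml sketch along those lines (the exact values
are placeholders you'd tune for your own latency needs):

    <autoCommit>
      <maxTime>30000</maxTime>          <!-- hard commit every 30s: flush to disk, keep the tlog small -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>600000</maxTime>         <!-- soft commit every 10 min: new docs become searchable -->
    </autoSoftCommit>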

FWIW,
Erick

On Mon, May 5, 2014 at 11:03 AM, Hakim Benoudjit h.benoud...@gmail.com wrote:
 I've tried it  it worked by letting solr do the commit instead of my solr
 client.
 In solrconfig.xml:
 autocommit max_time has been set to 5 minutes  autosoftcommit max_time to
 something bigger.

 Thanks a lot guys!


 2014-05-05 16:30 GMT+01:00 Erick Erickson erickerick...@gmail.com:

 You should not be committing from the client by and large, use the
 autoCommit and autoSoftCommit options in solrconfig.xml.

 See:
 http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

 Best,
 Erick

 On Mon, May 5, 2014 at 8:12 AM, Hakim Benoudjit h.benoud...@gmail.com
 wrote:
  Is there an option in Solr (solrconfig.xml or somewhere else) to
 regularize
  commits to the index.
  I meant to do a 'sleep' between each commit to the index, when data
  to-be-indexed is waiting inside a stack.
 
 
  2014-05-05 15:58 GMT+01:00 Hakim Benoudjit h.benoud...@gmail.com:
 
  The index is made with the same version of solr, that is searching
  (4.6.0), the config file (solrconfig.xml)  schema.xml is the same too.
  The only way for me to solve this issue is to let only one process to
  index at the same time. Wouldnt a layer of message queue resolve this
 issue?
 
 
  2014-05-04 18:33 GMT+01:00 Shawn Heisey s...@elyograg.org:
 
  On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
   Ok. These files contain what you've requested:
  
   First (the xml error): http://pastebin.com/ZcagK3T7
   Second (java params): http://pastebin.com/JtWQpp6s
   Third (Solr version): http://pastebin.com/wYdpdsAW
 
  Are you running with an index originally built by an earlier version of
  Solr?  If you are, you may be running into a known bug.  The last
  caused by section of the java stacktrace looks similar to the one in
  this issue -- which is indeed index corruption:
 
  https://issues.apache.org/jira/browse/LUCENE-5377
 
  If that's the problem you're experiencing, upgrading your Solr version
  will hopefully fix it.  Simply dropping in the 4.6.1 war file and any
  contrib jars should cause zero problems for your 4.6.0 install.
  Upgrading to 4.7.2 or 4.8.0 should be done with more care.
 
  Thanks,
  Shawn
 
 
 
 
  --
  Hakim Benoudjit.
 
 
 
 
  --
  Hakim Benoudjit.




 --
 Hakim Benoudjit.


Turning on KeywordRepeat and RemoveDups on an existing fieldType.

2014-05-05 Thread Michael Tracey
As per the stemming docs ( 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming ), I want 
to score the original term higher than the stemmed version by adding:

   <filter class="solr.KeywordRepeatFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

to a field type that is already created (with Stemming). I have 100M documents 
in this index, and it gets slowly reindexed every month as records change.  My 
question is, can I add this to the existing fieldType, or do I need to make a 
new fieldType, and copyField the data over to it, and after it's all reindexed 
switch my code?  I'd rather be able to just add the lines to my fieldType 
because I don't think I have enough disk space on my cloud members to hold my 
primary fulltext field twice.

Just in case it helps, I'm running 4.4.0 and the field I'm wanting to mod looks 
like this:

<fieldType name="keywordText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="keyword_stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="keyword_stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks,

M.


Re: Error initializing QueryElevationComponent

2014-05-05 Thread Chris Hostetter

The full details are farther down in the stack...

: null:org.apache.solr.common.SolrException: SolrCore 'master' is not
: available due to init failure: Error initializing QueryElevationComponent.
...
: Caused by: org.apache.solr.common.SolrException: Error initializing
: QueryElevationComponent.
...
: Caused by: org.apache.solr.common.SolrException:
: org.xml.sax.SAXParseException; systemId: solrres:/elevate.xml; lineNumber:
: 28; columnNumber: 80; The reference to entity ver must end with the ';'
: delimiter.

The problem is that your elevate.xml is not a valid XML file at all -- you
have a bare '&' character in there (as part of your id) which is not
valid in XML -- you are confusing the parser into thinking that you intend
for "ver" to be an XML entity, but you are missing the ';' at the end (and
even if you had that, then you'd get an error that the entity "&ver;" is
not defined) ...

: id="sitecore://master/{137f5eb3-eb84-4165-bef0-5be1fbbc3201}?lang=en&ver=1"/>


you need to use valid XML, so that id attribute should be something 
like...

id="sitecore://master/{137f5eb3-eb84-4165-bef0-5be1fbbc3201}?lang=en&amp;ver=1"
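
In context, a minimal elevate.xml entry with the ampersand escaped would look
roughly like this (the query text here is just a placeholder):

    <elevate>
      <query text="some query">
        <doc id="sitecore://master/{137f5eb3-eb84-4165-bef0-5be1fbbc3201}?lang=en&amp;ver=1"/>
      </query>
    </elevate>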


-Hoss
http://www.lucidworks.com/


Strict Search in Apache Solr

2014-05-05 Thread Reyes, Mark
How could Solr accomplish an end-user behavior like a strict search?

Let’s say an end-user decides to use quotation marks in their keywords to 
provide specificity in their search results.

Current:
If you were to query: your future, then 10 results would return and print to 
the page.

Expected:
I’d like to query: “your future”, then less than 10 results would return and 
print to the page.

Regards,
Mark

IMPORTANT NOTICE: This e-mail message is intended to be received only by 
persons entitled to receive the confidential information it may contain. E-mail 
messages sent from Bridgepoint Education may contain information that is 
confidential and may be legally privileged. Please do not read, copy, forward 
or store this message unless you are an intended recipient of it. If you 
received this transmission in error, please notify the sender by reply e-mail 
and delete the message and any attachments.

Relevancy help

2014-05-05 Thread Ravi Solr
Hello,
I have a weird relevancy requirement. We search news content hence
chronology is very important and also relevancy, although both are mutually
exclusive. For example, if the search terms are -  malaysia airline crash
blackbox - my requirements are as follows

docs containing all words should be on top, but the editorial also wants
them sorted in reverse chronological order without losing relevancy. Why
?? If on day 1 there is an article about search for blackbox but on day 2
the blackbox is found and day 3 there is an article about blackbox being
unusable...from the user's standpoint it makes sense that we show most
recent content on top.

I already boost recency of docs with
boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of
3 months

However when I do the boost the chronology is messed up. I know relevancy
and sorting are mutually exclusive concepts. Is there any magic that we can
do in SOLR which can achieve both ???


Thanks,

Ravi Kiran bhaskar


Re: Turning on KeywordRepeat and RemoveDups on an existing fieldType.

2014-05-05 Thread Jack Krupansky
I haven't personally used this technique, but I gather that the intent is
that the unstemmed term will have a lower term frequency (be more unique) than
the stemmed term, since the same stemmed term may be generated from a number
of different source terms.


To answer your question, no, you don't need a separate field or type for 
this feature, but it will tend to generate a lot more terms in your index 
since it will index a stemmed term as two terms.


Only use the repeat/remove filters for the index analyzer.

You will need to reindex to see the full effect immediately, but you can do 
the reindex incrementally (as you replace existing documents) as well if you 
don't mind if the difference in relevancy takes an extended time to become 
apparent.
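
For illustration, the index analyzer of your fieldType would then look roughly
like this (a sketch only: KeywordRepeat goes before the stemmer, RemoveDuplicates
after it; the other filters are as in your existing definition):

    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="keyword_stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.KeywordRepeatFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>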


-- Jack Krupansky

-Original Message- 
From: Michael Tracey

Sent: Monday, May 5, 2014 4:52 PM
To: solr-user@lucene.apache.org
Subject: Turning on KeywordRepeat and RemoveDups on an existing fieldType.

As per the stemming docs ( 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming ), I 
want to score the original term higher than the stemmed version by adding:


  filter class=solr.KeywordRepeatFilterFactory/
  filter class=solr.RemoveDuplicatesTokenFilterFactory/

to a field type that is already created (with Stemming). I have 100M 
documents in this index, and it gets slowly reindexed every month as records 
change.  My question is, can I add this to the existing fieldType, or do I 
need to make a new fieldType, and copyField the data over to it, and after 
it's all reindexed switch my code?  I'd rather be able to just add the lines 
to my fieldType because I don't think I have enough disk space on my cloud 
members to hold my primary fulltext field twice.


Just in case it helps, I'm running 4.4.0 and the field I'm wanting to mod 
looks like this:


   fieldType name=keywordText class=solr.TextField 
positionIncrementGap=100

 analyzer type=index
   tokenizer class=solr.WhitespaceTokenizerFactory/
   filter class=solr.WordDelimiterFilterFactory 
generateWordParts=1 generateNumberParts=1 catenateWords=1 
catenateNumbers=1 catenateAll=0 splitOnCaseChange=0/

   filter class=solr.LowerCaseFilterFactory/
   filter class=solr.StopFilterFactory ignoreCase=true 
words=keyword_stopwords.txt enablePositionIncrements=true /
   filter class=solr.SnowballPorterFilterFactory language=English 
protected=protwords.txt/

 /analyzer
 analyzer type=query
   tokenizer class=solr.WhitespaceTokenizerFactory/
   filter class=solr.SynonymFilterFactory synonyms=synonyms.txt 
ignoreCase=true expand=true/
   filter class=solr.StopFilterFactory ignoreCase=true 
words=keyword_stopwords.txt enablePositionIncrements=true /
   filter class=solr.WordDelimiterFilterFactory 
generateWordParts=1 generateNumberParts=1 catenateWords=0 
catenateNumbers=0 catenateAll=0 splitOnCaseChange=0/

   filter class=solr.LowerCaseFilterFactory/
   filter class=solr.SnowballPorterFilterFactory language=English 
protected=protwords.txt/

 /analyzer
   /fieldType

Thanks,

M. 



Re: Relevancy help

2014-05-05 Thread Jack Krupansky
The recip function query is the proper way to boost by reverse chronological 
order, but you may have to play around with the boost factor so that date 
does not completely overwhelm the natural relevancy.


Use the debugQuery=true parameter and look at the explain section to see 
what the document scores look like.
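
For example, something along these lines (the qf fields are placeholders for
your own schema; the boost is the one you already use):

    .../select?q=malaysia airline crash blackbox
        &defType=edismax&qf=headline^2 body
        &boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1)
        &debugQuery=true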


-- Jack Krupansky

-Original Message- 
From: Ravi Solr

Sent: Monday, May 5, 2014 5:41 PM
To: solr-user@lucene.apache.org
Subject: Relevancy help

Hello,
   I have a weird relevancy requirement. We search news content hence
chronology is very important and also relevancy, although both are mutually
exclusive. For example, if the search terms are -  malaysia airline crash
blackbox - my requirements are as follows

docs containing all words should be on top, but the editorial also wants
them sorted reverse by chronological order without loosing relevancy. Why
?? If on day 1 there is an article about search for blackbox but on day 2
the blackbox is found and day 3 there is an article about blackbox being
unusable...from the user's standpoint it makes sense that we show most
recent content on top.

I already boost recency of docs with
boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of
3 months

However when I do the boost the chronology is messed up. I know relevancy
and sorting are mutually exclusive concepts. Is there any magic that we can
do in SOLR which can achieve both ???


Thanks,

Ravi Kiran bhaskar 



Re: Relevancy help

2014-05-05 Thread Ahmet Arslan
Hi Ravi,

Regarding recency please see : 
http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr

Regarding "docs containing all words": there is a function query approach that
elevates those docs to the top. Search the existing mailing list archives for
past posts.
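
One crude boost-query variant of that idea (not the function query mentioned
above, and the terms here are just the ones from your example): add a bq that
requires all the terms, so documents matching every word get an extra bump,
e.g.

    bq=(+malaysia +airline +crash +blackbox)^10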

Ahmet


On Tuesday, May 6, 2014 12:42 AM, Ravi Solr ravis...@gmail.com wrote:

Hello,
        I have a weird relevancy requirement. We search news content hence
chronology is very important and also relevancy, although both are mutually
exclusive. For example, if the search terms are -  malaysia airline crash
blackbox - my requirements are as follows

docs containing all words should be on top, but the editorial also wants
them sorted reverse by chronological order without loosing relevancy. Why
?? If on day 1 there is an article about search for blackbox but on day 2
the blackbox is found and day 3 there is an article about blackbox being
unusable...from the user's standpoint it makes sense that we show most
recent content on top.

I already boost recency of docs with
boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of
3 months

However when I do the boost the chronology is messed up. I know relevancy
and sorting are mutually exclusive concepts. Is there any magic that we can
do in SOLR which can achieve both ???


Thanks,

Ravi Kiran bhaskar


Re: Strict Search in Apache Solr

2014-05-05 Thread Ahmet Arslan
Hi Reyes,

I think it is not clear your question. 
Please see : https://wiki.apache.org/solr/UsingMailingLists

Ahmet

On Tuesday, May 6, 2014 12:23 AM, Reyes, Mark mark.re...@bpiedu.com wrote:
How could Solr accomplish an end-user behavior like a strict search?

Let’s say an end-user decides to use quotation marks in their keywords to 
provide specificity in their search results.

Current:
If you were to query: your future, then 10 results would return and print to 
the page.

Expected:
I’d like to query: “your future”, then less than 10 results would return and 
print to the page.

Regards,
Mark



Histogram facet?

2014-05-05 Thread Romain
Hi,

I am trying to plot a non-date field by time in order to draw a histogram
showing its evolution during the week.

For example, if I have a tweet index:

Tweet:
  date
  retweetCount

3 tweets indexed:
Tweet | Date  | Retweet
A     | 01/01 | 100
B     | 01/01 | 100
C     | 01/02 | 100

If I want to plot the number of tweets by day: easy with a date range facet:
Day 1: 2
Day 2: 1

But now counting the number of retweet by day is not possible natively:
Day 1: 200
Day 2: 100

One current workaround would be to do a date range facet to get the date
slots, then ask only for the retweet field and compute the sums in the
client. We could compute other stats like average, etc. too.

The closest I could see was
https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be
slightly different.

Basically I am trying to do something very similar to the Date Histogram
Facet (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet)
in ES.

Is there a way to move the counting logic to the Solr server?

Thanks!

Romain


Re: Strict Search in Apache Solr

2014-05-05 Thread Reyes, Mark
Okay, let's try it this way…

CURRENTLY:
Step 1: Type, your future into the search bar.
Step 2: 10 search results return.

I'D LIKE TO SEE THIS:
Step 1: Type, "your future" into the search bar.
Step 2: 1 search result returns.

Can this be accomplished through the Solr UI?

Thanks,

Mark

On 5/5/14, 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Reyes,

I think it is not clear your question.
Please see : https://wiki.apache.org/solr/UsingMailingLists

Ahmet

On Tuesday, May 6, 2014 12:23 AM, Reyes, Mark mark.re...@bpiedu.com
wrote:
How could Solr accomplish an end-user behavior like a strict search?

Let's say an end-user decides to use quotation marks in their keywords to
provide specificity in their search results.

Current:
If you were to query: your future, then 10 results would return and print
to the page.

Expected:
I'd like to query: "your future", then less than 10 results would return
and print to the page.

Regards,
Mark


Re: Strict Search in Apache Solr

2014-05-05 Thread Jack Krupansky
The term strict search is not in the Lucene/Solr nomenclature - it could 
mean any number of things.


It sounds as if maybe you want to do a phrase search, looking for an exact 
phrase - yes, you can do that by enclosing the phrase in quotes.
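
For example:

    q=your future      -> matches documents containing the individual terms
                          (how they combine depends on q.op / mm)
    q="your future"    -> matches only documents containing the exact phrase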


-- Jack Krupansky

-Original Message- 
From: Reyes, Mark

Sent: Monday, May 5, 2014 5:23 PM
To: solr-user@lucene.apache.org
Subject: Strict Search in Apache Solr

How could Solr accomplish an end-user behavior like a strict search?

Let’s say an end-user decides to use quotation marks in their keywords to 
provide specificity in their search results.


Current:
If you were to query: your future, then 10 results would return and print to 
the page.


Expected:
I’d like to query: “your future”, then less than 10 results would return and 
print to the page.


Regards,
Mark




Linking Two Fields Together

2014-05-05 Thread Steve Edwards
I'm using Solr to create an image search functionality that allows users to
search for an existing image in the site to add to new content.  A given piece 
of content has a field that can store multiple images, so I will need to use a 
multi-value Solr field to store image data. Currently, I'm storing the path and 
file name in a tom_* field, since I want to be able to search on file name. 
However, another piece of data that I need to store and retrieve is the file id 
used to identify the file in the database (in the same table as the image 
path). What is the best way to store this data so that the file id and path 
values are properly synced, since there can be multiple images for each piece 
of content?  I could just store the file path/name (I need that data to be 
searchable, so it has to be stored in Solr), and then query the db for the fid 
once I get the results back, but I'd rather not do that if I don't have to.

Searching around, it doesn't appear that I can store multiple pieces of data in 
one field without doing some sort of concatenation and then splitting at query 
time.  If I just use two separate fields in each document, is it safe to assume 
that the values will be synchronized in the search results? In other words, if 
I put two values each into tom_image_path and im_image_file_id, when I query 
and the document is returned, can I assume the values in the two fields are 
synchronized?

Or, is there a way to store multiple pieces of data in one field so that they
can be indexed together and then retrieved together?

Thanks.

Steve

Re: dynamic field assignments

2014-05-05 Thread Chris Hostetter

: My understanding is that DynamicField can do something like
: FOO_BAR_TEXT_* but what I really need is *_TEXT_* as I might have
: FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2.  Both of those
: field names need to map to a field type of 'fullText'.

I'm pretty sure you can get what you are after with the new Managed Schema
functionality...

https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig

Assuming you have managed schema enabled in solrconfig.xml, and you define 
both of your fieldTypes using names like text and select then 
something like this should work in your processor chain... 

 <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
   <str name="fieldRegex">.*_TEXT_.*</str>
   <str name="defaultFieldType">text</str>
 </processor>
 <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
   <str name="fieldRegex">.*_SELECT_.*</str>
   <str name="defaultFieldType">select</str>
 </processor>


(Normally that processor is used once with multiple value-type mappings
-- but in your case you don't care about the run-time value, just the run-time
field name regex, which should also be configurable according
to the various FieldNameSelector rules...)

https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html
https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html


-Hoss
http://www.lucidworks.com/


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
my schema.xml:
<schema name="example core one" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
    <fieldtype name="textComplex" class="solr.TextField" positionIncrementGap="100"
               omitNorms="false" autoGeneratePhraseQueries="false">
      <analyzer type="query">
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true"/>
      </analyzer>
      <analyzer type="index">
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true"/>
      </analyzer>
    </fieldtype>
  </types>

  <fields>
    <field name="id"        type="uuid"        indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="name"      type="textComplex" indexed="true" stored="true" multiValued="false"/>
    <field name="type"      type="string"      indexed="true" stored="true" multiValued="false"/>
    <field name="price"     type="long"        indexed="true" stored="true"/>
    <field name="_version_" type="long"        indexed="true" stored="true"/>
  </fields>

  <uniqueKey>id</uniqueKey>

  <defaultSearchField>name</defaultSearchField>

  <solrQueryParser defaultOperator="OR"/>
</schema>

update docs:
docs: [
  {
name: 苹果4s,
type: 手机,
price: 2000,
id: 4017e35a-6b19-45b6-b945-382340ca1eec,
_version_: 1466799722505175000
  },
  {
name: 苹果5,
type: 手机,
price: 5000,
id: 4052d9f3-f6d9-458f-8bb0-477b17852f37,
_version_: 1466799735745544200
  },
  {
name: 三星,
type: 手机,
price: 3000,
id: 468abce8-8bb9-4f51-9900-8d4d6abc02ac,
_version_: 1466799747596550100
  },
  {
name: 摩托罗拉i3,
type: 电脑,
price: 1000,
id: db66bb02-3d6a-4ab0-9133-2e6e38b3d4dd,
_version_: 1466799757491961900
  },
  {
name: 摩托罗拉i5,
type: 电脑,
price: 1500,
id: f211525f-bc3c-4ea7-aded-1c46a94ecd1c,
_version_: 1466799766311534600
  }
]
Thank you, Erick.
I want to sort groups based on the sum of the documents' scores within each
group. As you said, Solr excels at getting the score of single documents; in
Solr 4.6 the default ordering of the groups relative to each other depends on
the maxScore of the documents within each group, not the sum of the documents'
scores. Though I can compute the sum of the documents' scores in the client
program, that's not a good idea. I know that the stats component of Solr can
compute statistics on the long field, so I had the idea of using stats on the
score field, but score is a pseudo-field and stats.field doesn't support it.
In addition, as the schema.xml shows, I group on the values of a string field
(type) without tokenization.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134830.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Anybody uses Solr JMX?

2014-05-05 Thread Otis Gospodnetic
Alexandre, you could use something like
http://blog.sematext.com/2012/09/25/new-tool-jmxc-jmx-console/ to quickly
dump everything out of JMX and see if there is anything there Solr Admin UI
doesn't expose.  I think you'll find there is more in JMX than Solr Admin
UI shows.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr  Elasticsearch Support * http://sematext.com/


On Mon, May 5, 2014 at 1:56 AM, Alexandre Rafalovitch arafa...@gmail.comwrote:

 Thank you everybody for the links and explanations.

 I am still curious whether JMX exposes more details than the Admin UI?
 I am thinking of a troubleshooting context, rather than long-term
 monitoring one.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Mon, May 5, 2014 at 12:21 PM, Gora Mohanty g...@mimirtech.com wrote:
  On May 5, 2014 7:09 AM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
  I have religiously kept jmx statement in my solrconfig.xml, thinking
  it was enabling the web interface statistics output.
 
  But looking at the server logs really closely, I can see that JMX is
  actually disabled without server present. And the Admin UI does not
  actually seem to care after a quick test.
 
  Does anybody have a real experience with Solr JMX? Does it expose more
  information than Admin UI's Plugins/Stats page? Is it good for
 
 
  Have not been using JMX lately, but we were using it in the past. It does
  allow monitoring many useful details. As others have commented, it also
  integrates well with other monitoring  tools as JMX is a standard.
 
  Regards,
  Gora



Re: Anybody uses Solr JMX?

2014-05-05 Thread Alexandre Rafalovitch
Thanks Otis,

JMXC looks interesting, though I cannot seem to find the Open Source
section on your website it used to link to.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, May 6, 2014 at 9:43 AM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:
 Alexandre, you could use something like
 http://blog.sematext.com/2012/09/25/new-tool-jmxc-jmx-console/ to quickly
 dump everything out of JMX and see if there is anything there Solr Admin UI
 doesn't expose.  I think you'll find there is more in JMX than Solr Admin
 UI shows.

 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr  Elasticsearch Support * http://sematext.com/


 On Mon, May 5, 2014 at 1:56 AM, Alexandre Rafalovitch 
 arafa...@gmail.comwrote:

 Thank you everybody for the links and explanations.

 I am still curious whether JMX exposes more details than the Admin UI?
 I am thinking of a troubleshooting context, rather than long-term
 monitoring one.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Mon, May 5, 2014 at 12:21 PM, Gora Mohanty g...@mimirtech.com wrote:
  On May 5, 2014 7:09 AM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:
 
  I have religiously kept jmx statement in my solrconfig.xml, thinking
  it was enabling the web interface statistics output.
 
  But looking at the server logs really closely, I can see that JMX is
  actually disabled without server present. And the Admin UI does not
  actually seem to care after a quick test.
 
  Does anybody have a real experience with Solr JMX? Does it expose more
  information than Admin UI's Plugins/Stats page? Is it good for
 
 
  Have not been using JMX lately, but we were using it in the past. It does
  allow monitoring many useful details. As others have commented, it also
  integrates well with other monitoring  tools as JMX is a standard.
 
  Regards,
  Gora



Re: Help to Understand a Solr Query

2014-05-05 Thread Alexandre Rafalovitch
If you are looking for that level of understanding, you are best
enabling the debug flag. Then you will get a full breakdown of what
matched which field and why. Including scores, preferences, etc.
Possibly with debug.explain.structured enabled:
http://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured

Most people do not want to deep dive into debug info. But I am getting
the feeling this would be right where you want to go.
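
With the query from your message that would be something like (a sketch only):

    q=samplestring1 AND samplestring2&defType=edismax
      &qf=Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
      &fl=Column1,Column2
      &debugQuery=true&debug.explain.structured=true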

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, May 6, 2014 at 1:47 AM, nativecoder romrom...@gmail.com wrote:
 That answer helps a lot

 Where would the OR clause be ?

 (Exact_Field1:samplestring1 OR Exact_Field1:samplestring2) AND
 (Exact_Field2:samplestring1 OR Exact_Field2:samplestring2) AND
 (Field1:samplestring1 OR Field1:samplestring2) AND (Field2:samplestring1
 OR Field2:samplestring2)

 Please note that in my query it is an AND clause. I am trying to understand
 where the AND fits in. To be more precise my query is as below

 q=samplestring1 AND samplestring2defType: edismaxqf: Exact_Field1^1.0
 Exact_Field2^0.9 Field1^0.8 Field2^0.7fl= Column1, Column2




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134775.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Linking Two Fields Together

2014-05-05 Thread Alexandre Rafalovitch
You can have two parallel multi-valued fields, and as long as you don't
introduce null/empty values they will be kept in the same order. However, for
recent Solr (4.7? certainly 4.8), you may want to look at parent/child
entries and join/parent/child queries.
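
A sketch of the parallel-fields approach with your field names (the id and
path values here are made up):

    {
      "id": "node-42",
      "tom_image_path":   ["files/sunset.jpg", "files/beach.jpg"],
      "im_image_file_id": [101, 102]
    }

The positions line up (sunset.jpg <-> 101, beach.jpg <-> 102), so when the
document comes back you can zip the two lists together client-side.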

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, May 6, 2014 at 7:20 AM, Steve Edwards killsho...@gmail.com wrote:
 I'm using Sorl to create an image search functionality that allows users to 
 search for an existing image in the site to add to new content.  A given 
 piece of content has a field that can store multiple images, so I will need 
 to use a multi-value Solr field to store image data. Currently, I'm storing 
 the path and file name in a tom_* field, since I want to be able to search on 
 file name. However, another piece of data that I need to store and retrieve 
 is the file id used to identify the file in the database (in the same table 
 as the image path). What is the best way to store this data so that the file 
 id and path values are properly synced, since there can be multiple images 
 for each piece of content?  I could just store the file path/name (I need 
 that data to be searchable, so it has to be stored in Solr), and then query 
 the db for the fid once I get the results back, but I'd rather not do that if 
 I don't have to.

 Searching around, it doesn't appear that I can store multiple pieces of data 
 in one field without doing some sort of concatenation and then splitting at 
 query time.  If I just use two separate fields in each document, is it safe 
 to assume that the values will be synchronized in the search results? In 
 other words, if I put two values each into tom_image_path and 
 im_image_file_id, when I query and the document is returned, can I assume the 
 values in the two fields are synchronized?

 Or, is there a way to store multiple pieces of data in one field so that they 
 can be indexed together and then retrived together?

 Thanks.

 Steve


Re: Solr does not recognize language

2014-05-05 Thread Frankcis
Hi iorixxx, I'm Frankcis, not Victor. Did you reply to the wrong email?


2014-05-05 23:20 GMT+08:00 iorixxx [via Lucene] 
ml-node+s472066n4134713...@n3.nabble.com:

 Hi Victor,

 I don't know mysolr; I assume you are using /update/json, so let's add your
 chain to the defaults section.

   <requestHandler name="/update/json" class="solr.UpdateRequestHandler">
     <lst name="defaults">
       <str name="stream.contentType">application/json</str>
       <str name="update.chain">langid</str>
     </lst>
   </requestHandler>




 On Monday, May 5, 2014 4:06 PM, Victor Pascual [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4134713i=0
 wrote:
 Hi there,

 I'm indexing my documents using mysolr. I mainly generate a lost of json
 objects and the run: solr.update(documents_array,'json')



 On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4134713i=1
 wrote:

  Hi Victor,
 
  How do you index your documents? Your last config looks correct. However
  for example if you use data import handler you need to add update.chain
  there too. Same as extraction request hadler if you are using sole-cell.
 
  requestHandler name=/dataimport
  class=org.apache.solr.handler.dataimport.DataImportHandler
  lst name=defaults
str name=config/home/username/data-config.xml/str
str name=update.chainlangid/str
  /lst
/requestHandler
 
  By the way The URL
  http://localhost:8080/solr/update?commit=trueupdate.chain=langid was
  just an example and meant to feed xml update messages by POST method.
 Not
  to use in a browser.
 
  Ahmet
 
  On Monday, May 5, 2014 11:04 AM, Victor Pascual 
  [hidden email] http://user/SendEmail.jtp?type=nodenode=4134713i=2
 wrote:
 
  Thank you very much for you help Ahmet.
 
  However the language detection is still not workin. :(
  My solrconfig.xml didn't contain that lst section inside the update
  requestHandler.
  That's the content I added:
 
   <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
     <lst name="defaults">
       <str name="update.chain">langid</str>
     </lst>
   </requestHandler>

   <updateRequestProcessorChain name="langid">
     <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
       <lst name="defaults">
         <str name="langid.fl">text</str>
         <str name="langid.langField">lang</str>
       </lst>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory"/>
     <processor class="solr.RunUpdateProcessorFactory"/>
   </updateRequestProcessorChain>
 
  Now, your suggested query
  http://localhost:8080/solr/update?commit=trueupdate.chain=langid returns

 
  response
  lst name=responseHeader
  int name=status0/int
  int name=QTime14/int
  /lst
  /response
  And there is still no lang field in my documents.
  Any idea what am I doing wrong?
 
 
 
 
  On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan [hidden 
  email]http://user/SendEmail.jtp?type=nodenode=4134713i=3
 wrote:
 
  Hi,
  
  solr/update should be used, not /solr/select
  
  curl '
 http://localhost:8983/solr/update?commit=trueupdate.chain=langid'
  
  By the way don't you have following definition in your solrconfig.xml?
  
   requestHandler name=/update class=solr.UpdateRequestHandler
  
 lst name=defaults
   str name=update.chainlangid/str
 /lst
/requestHandler
  
  
  
  
  On Tuesday, April 29, 2014 4:50 PM, Victor Pascual 
  [hidden email] http://user/SendEmail.jtp?type=nodenode=4134713i=4
 wrote:
  Hi Ahmet,
  
  thanks for your reply. Adding update.chain=langid to my query doesn't
  work: IP:8080/solr/select/?q=*%3A*update.chain=langid
  Regarding defining the chain in an UpdateRequestHandler... sorry for
 the
  lame question but shall I paste those three lines to solrconfig.xml, or
  shall I add them somewhere else?
  
  There is not UpdateRequestHandler in my solrconfig.
  
  Thanks!
  
  
  
  On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan [hidden 
  email]http://user/SendEmail.jtp?type=nodenode=4134713i=5
 wrote:
  
   Hi,
  
   Did you attach your chain to a UpdateRequestHandler?
  
   You can do it by adding update.chain=langid to the URL or defining
 it
  in
   a defaults section as follows
  
   lst name=defaults
str name=update.chainlangid/str
  /lst
  
  
  
   On Tuesday, April 29, 2014 3:18 PM, Victor Pascual 
   [hidden email] http://user/SendEmail.jtp?type=nodenode=4134713i=6
 wrote:
   Dear all,
  
   I'm a new user of Solr. I've managed to index a bunch of documents
 (in
   fact, they are tweets) and everything works quite smoothly.
  
   Nevertheless it looks like Solr doesn't detect the language of my
  documents
   nor remove stopwords accordingly so I can extract the most frequent
  terms.
  
   I've added this piece of XML to my solrconfig.xml as well as the Tika
  lib
   jars.
  
   updateRequestProcessorChain name=langid
  processor
  
  
 
 

Re: Strict Search in Apache Solr

2014-05-05 Thread Alexandre Rafalovitch
You can do phrase search explicitly with quotes. Or you could look at
something like Term query parser:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser

You can also enable autoGeneratePhraseQueries on the field type to try
the phrase queries, but that's in addition to trying individual terms:
https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, May 6, 2014 at 5:35 AM, Jack Krupansky j...@basetechnology.com wrote:
 The term strict search is not in the Lucene/Solr nomenclature - it could
 mean any number of things.

 It sounds as if maybe you want to do a phrase search, looking for an exact
 phrase - yes, you can do that by enclosing the phrase in quotes.

 -- Jack Krupansky

 -Original Message- From: Reyes, Mark
 Sent: Monday, May 5, 2014 5:23 PM
 To: solr-user@lucene.apache.org
 Subject: Strict Search in Apache Solr


 How could Solr accomplish an end-user behavior like a strict search?

 Let’s say an end-user decides to use quotation marks in their keywords to
 provide specificity in their search results.

 Current:
 If you were to query: your future, then 10 results would return and print to
 the page.

 Expected:
 I’d like to query: “your future”, then less than 10 results would return and
 print to the page.

 Regards,
 Mark



Re: Relevancy help

2014-05-05 Thread Alexandre Rafalovitch
Can you sort by score, then date? Assuming similar articles will get the
same score (you may need to discount frequency/length).
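
In Solr parameter terms that would be something like (the date field name is
the one from Ravi's boost):

    sort=score desc, displaydatetime desc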

There is also QueryRescore API introduced in Lucene 4.8 that might be
relevant. Though I have no idea how that would get exposed in Solr.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, May 6, 2014 at 5:12 AM, Ahmet Arslan iori...@yahoo.com wrote:
 Hi Ravi,

 Regarding recency please see : 
 http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr

 Regarding docs containing all words there is function query that elevates 
 those docs to top. Search existing mailing list past posts.
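
 A simpler variant of the same idea, as an untested sketch (adjust field names
 to your schema), is a boost query that only fires when all terms are present:

   q=malaysia airline crash blackbox&defType=edismax&qf=body
   &bq=body:(malaysia AND airline AND crash AND blackbox)^10

 Documents matching all four terms get the extra boost and float above
 partial matches.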

 Ahmet


 On Tuesday, May 6, 2014 12:42 AM, Ravi Solr ravis...@gmail.com wrote:

 Hello,
  I have a weird relevancy requirement. We search news content, so chronology
  is very important and so is relevancy, even though the two tend to pull in
  opposite directions. For example, if the search terms are - malaysia airline
  crash blackbox - my requirements are as follows:

  docs containing all words should be on top, but the editorial team also wants
  them sorted in reverse chronological order without losing relevancy. Why
 ?? If on day 1 there is an article about search for blackbox but on day 2
 the blackbox is found and day 3 there is an article about blackbox being
 unusable...from the user's standpoint it makes sense that we show most
 recent content on top.

 I already boost recency of docs with
 boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of
 3 months

 However when I do the boost the chronology is messed up. I know relevancy
 and sorting are mutually exclusive concepts. Is there any magic that we can
 do in SOLR which can achieve both ???


 Thanks,

 Ravi Kiran bhaskar


Re: Histogram facet?

2014-05-05 Thread Erick Erickson
Hmmm, I _think_ pivot faceting works here. One dimension would be day
and the other retweet count. The response will have the number of
retweets per day, you'd have to sum them up I suppose.
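
Something like this, as an untested sketch (it assumes a day-truncated
field, say date_day, exists in the index):

  q=*:*&rows=0&facet=true&facet.pivot=date_day,retweetCount

Each date_day bucket then lists the distinct retweetCount values with their
document counts, which you would multiply and sum on the client.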

Best,
Erick

On Mon, May 5, 2014 at 3:18 PM, Romain romain@gmail.com wrote:
 Hi,

 I am trying to plot a non-date field over time in order to draw a histogram
 showing its evolution during the week.

 For example, if I have a tweet index:

 Tweet:
   date
   retweetCount

 3 tweets indexed:
  Tweet | Date  | Retweet
  A     | 01/01 | 100
  B     | 01/01 | 100
  C     | 01/02 | 100

 If I want to plot the number of tweets by day: easy with a date range facet:
 Day 1: 2
 Day 2: 1

 But now counting the number of retweet by day is not possible natively:
 Day 1: 200
 Day 2: 100

  One current workaround would be to do a date range facet to get the date
  slots, ask only for the retweet field, and compute the sums in the
  client. We could compute other stats like average, etc. too.

  The closest I could see was
  https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be
  slightly different.

  Basically I am trying to do something very similar to the Date Histogram
  Facet (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet)
  in ES.

 Is there a way to move the counting logic to the Solr server?

 Thanks!

 Romain


Re: Wildcard malfunctioning

2014-05-05 Thread Alexandre Rafalovitch
I mark all the filters that support wildcards with (multi) on my list:
http://www.solr-start.com/info/analyzers/ . I use actual interface
markers to derive that list, so it should be up to date.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Mon, May 5, 2014 at 6:19 PM, Jack Krupansky j...@basetechnology.com wrote:
 Generally, stemming filters are not supported when wildcards are present.
 Only a small subset of filters work with wildcards, such as the case
 conversion filters.

 But, you say that you are using the stemmer to remove diacritical marks...
 you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory.
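
 For example, a rough, untested sketch of the field type with the stemmer
 swapped out for ASCII folding (keep whatever else you need in the chain):

   <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="lang/stopwords_es.txt" format="snowball"/>
       <!-- folds á to a, ñ to n, etc.; it is multi-term aware,
            so naranja*-style wildcard queries keep working -->
       <filter class="solr.ASCIIFoldingFilterFactory"/>
     </analyzer>
   </fieldType>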

 -- Jack Krupansky

 -Original Message- From: Román González
 Sent: Monday, May 5, 2014 7:00 AM
 To: solr-user@lucene.apache.org
 Subject: Wildcard malfunctioning


 Hi all!



 Sorry in advance if this question was posted but I were unable to find it
 with search engines.



 Filter SpanishLightStemFilterFactory is not working properly with wildcards
 or I’m misunderstanding something. I have the field



    <field name="cultivo_es" type="text_es" indexed="true" stored="true" />



 With this type:



  <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="lang/stopwords_es.txt" format="snowball"/>
      <filter class="solr.SpanishLightStemFilterFactory"/>
      <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
    </analyzer>
  </fieldType>



 But I’m getting these results:



 q = cultivo_es:uva

 Getting 50 correct results



 q = cultivo_es:uva*

 Getting the same 50 correct results



 q = cultivo_es:naranja

 Getting the 50 correct results of “naranja”



 q = cultivo_es:naranja*

  Getting 0 results!



 It works fine if I remove SpanishLightStemFilterFactory filter, but I need
 it in order to filter diacritics according to Spanish rules.



 Thank you!!





Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Erick Erickson
You haven't answered _why_ this is a good idea. I'm having a hard
time understanding what would be _useful_ about sorting this way. Just
because the sum of scores in a group is greater than the sum of scores
in another says _nothing_ about how relevant any of the docs in the group
are relative to each other.

I mean group 1 could have 10M documents all with a score of .01 and group
2 could have 1 document with a score of 1,000 and group 1 would sort
first.

So unless you have some unusual use-case which you haven't yet articulated,
this seems like a bad idea.

Best,
Erick

On Mon, May 5, 2014 at 7:20 PM, Frankcis finalxc...@gmail.com wrote:
 my schema.xml:
 <schema name="example core one" version="1.1">
   <types>
     <fieldtype name="string" class="solr.StrField" sortMissingLast="true"
                omitNorms="true"/>
     <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
                positionIncrementGap="0"/>
     <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
     <fieldtype name="textComplex" class="solr.TextField"
                positionIncrementGap="100" omitNorms="false"
                autoGeneratePhraseQueries="false">
       <analyzer type="query">
         <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
                    mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords.txt"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                 ignoreCase="false" expand="true"/>
       </analyzer>
       <analyzer type="index">
         <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory"
                    mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords.txt"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                 ignoreCase="false" expand="true"/>
       </analyzer>
     </fieldtype>
   </types>

   <fields>
     <field name="id" type="uuid" indexed="true" stored="true"
            multiValued="false" required="true"/>
     <field name="name" type="textComplex" indexed="true" stored="true"
            multiValued="false"/>
     <field name="type" type="string" indexed="true" stored="true"
            multiValued="false"/>
     <field name="price" type="long" indexed="true" stored="true"/>

     <field name="_version_" type="long" indexed="true" stored="true"/>
   </fields>

   <uniqueKey>id</uniqueKey>

   <defaultSearchField>name</defaultSearchField>

   <solrQueryParser defaultOperator="OR"/>
 </schema>

 update docs:
 "docs": [
   {
     "name": "苹果4s",
     "type": "手机",
     "price": 2000,
     "id": "4017e35a-6b19-45b6-b945-382340ca1eec",
     "_version_": 1466799722505175000
   },
   {
     "name": "苹果5",
     "type": "手机",
     "price": 5000,
     "id": "4052d9f3-f6d9-458f-8bb0-477b17852f37",
     "_version_": 1466799735745544200
   },
   {
     "name": "三星",
     "type": "手机",
     "price": 3000,
     "id": "468abce8-8bb9-4f51-9900-8d4d6abc02ac",
     "_version_": 1466799747596550100
   },
   {
     "name": "摩托罗拉i3",
     "type": "电脑",
     "price": 1000,
     "id": "db66bb02-3d6a-4ab0-9133-2e6e38b3d4dd",
     "_version_": 1466799757491961900
   },
   {
     "name": "摩托罗拉i5",
     "type": "电脑",
     "price": 1500,
     "id": "f211525f-bc3c-4ea7-aded-1c46a94ecd1c",
     "_version_": 1466799766311534600
   }
 ]
 Thank you, Erick.
 I want to sort groups based on the sum of the documents' scores within each
 group. As you said, Solr excels at scoring single documents; in Solr 4.6 the
 default ordering of groups relative to each other depends on the maxScore of
 the documents within each group, not on the sum of their scores. I could
 compute the sums in the client program, but that is not a good idea. I know
 that the stats component of Solr can compute statistics over a long field, so
 I had the idea of using that on the score field, but score is a pseudo-field
 and stats.field does not support it. In addition, as the schema.xml shows, I
 group on the values of a string field (type), which is not tokenized.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134830.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
Thank you, Erick, you're right: the maxScore of a document within each group is
more meaningful than the sum of scores in a group, especially in a use-case
like the one in your assumption (group 1 could have 10M documents all with a
score of .01 and group 2 could have 1 document with a score of 1,000, and
group 1 would sort first). But this behaviour is what the client requires, so
can you tell me how to achieve it?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexing scanned PDFs

2014-05-05 Thread Chandan Tamrakar
We are using Solr to index PDF documents, but there are cases where the PDFs
are scanned documents with no text to extract and index.

Is there a plugin or module in Solr that we can integrate so that it would
actually run OCR, extract the text and then index it?


Thanks in advance

Chandan Tamrakar


Re: sort groups by the sum of the scores of the documents within each group

2014-05-05 Thread Frankcis
thank you, Erick, you're good man,
this is the client requirement:
In the forum, there is a lot of discussion of the content under different
subjects, search for a keyword,
which will lead to a result that the word of content or subject match the
query, group these document based on every subject, sort these groups based
on the sum score of every subject.

my pleasure to listen your suggestions.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134869.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing scanned PDFs

2014-05-05 Thread Alexandre Rafalovitch
Nothing I am aware of for Solr directly. You may have better luck
chasing this on the Tika mailing list, as that's what Solr uses under
the covers to index PDFs otherwise. Doing a quick search for Tika and OCR
brings up a number of links.

Regards,
  Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, May 6, 2014 at 12:15 PM, Chandan Tamrakar
chandan.tamra...@nepasoft.com wrote:
 we are using SOLr to index pdf documents but there are cases where PDFs
 are usually a scanned document  with no text to extract and index .

 Is there a plugin or module in SOLR that we can integrate so that it would
 actually extract a text / OCR and then index?


 Thanks in advance

 Chandan Tamrakar


Re: Histogram facet?

2014-05-05 Thread Romain Rigaux
The dates won't match unless you truncate all of them to the day. But then if
you want slots of 15 minutes it won't work, as you would need to truncate the
dates to 15-minute buckets in the index.

In ES, they have 1 field to make the slots and 1 field to insert into the
bucket, e.g.:

{
  "query": {
    "match_all": {}
  },
  "facets": {
    "histo1": {
      "date_histogram": {
        "key_field": "timestamp",
        "value_field": "price",
        "interval": "day"
      }
    }
  }
}
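
One thing that might come close on the Solr side is the StatsComponent broken
down by a facet field - a rough, untested sketch, assuming a date_day field
pre-truncated to the day at index time:

  q=*:*&rows=0
  &stats=true
  &stats.field=retweetCount
  &stats.facet=date_day

That returns sum/min/max/mean of retweetCount per date_day value, but the
bucket size still has to be baked into the index.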

Romain


On Mon, May 5, 2014 at 9:05 PM, Erick Erickson erickerick...@gmail.comwrote:

 Hmmm, I _think_ pivot faceting works here. One dimension would be day
 and the other retweet count. The response will have the number of
 retweets per day, you'd have to sum them up I suppose.

 Best,
 Erick

 On Mon, May 5, 2014 at 3:18 PM, Romain romain@gmail.com wrote:
  Hi,
 
  I am trying to plot a non date field by time in order to draw an
 histogram
  showing its evolution during the week.
 
  For example, if I have a tweet index:
 
  Tweet:
date
retweetCount
 
  3 tweets indexed:
   Tweet | Date  | Retweet
   A     | 01/01 | 100
   B     | 01/01 | 100
   C     | 01/02 | 100
 
  If I want to plot the number of tweets by day: easy with a date range
 facet:
  Day 1: 2
  Day 2: 1
 
  But now counting the number of retweet by day is not possible natively:
  Day 1: 200
  Day 2: 100
 
   One current workaround would be to do a date range facet to get the date
   slots, ask only for the retweet field, and compute the sums in the
   client. We could compute other stats like average, etc. too.
 
   The closest I could see was
   https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be
   slightly different.
 
  Basically I am trying to do something very similar to the Date Histogram
  Facet
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet
 in
  ES.
 
  Is there a way to move the counting logic to the Solr server?
 
  Thanks!
 
  Romain