Re: Number of fields in schema.xml and impact on Solr

2015-04-22 Thread Steven White
Thanks Shawn.  This is good to know.

Steve

On Wed, Apr 22, 2015 at 9:00 AM, Shawn Heisey elyog...@elyograg.org wrote:

 On 4/22/2015 6:33 AM, Steven White wrote:
  Is there anything I should be taking into consideration if I have a large
  number of fields in my Solr's schema.xml file?
 
  I will be indexing records into Solr and as I create documents, each
  document will have between 20-200 fields.  However, due to the nature of
  my data source, the combined flattened list of fields that I need to
  include in schema.xml will be upwards of 2000 and may reach 3000.
 
  My questions are as follows, comparing a schema with 300 fields vs. 3000:
 
  1) Will indexing be slower?  Require more memory?  CPU?
  2) Will the index size be larger?  If so any idea by what factor?
  3) Will searches be slower?  Require more memory?  CPU?
  4) Will the field type (float, boolean, date, string, etc.) have any
  factor?
  5) Anything else I should know that I didn't ask?
 
  I should make it clear that only about 5 fields will be stored, while
  everything else will only be indexed.

 The number of fields in your schema is likely not a significant
 contributor to performance.  I'm sure it can have an impact because
 there is code that validates everything against the schema, but even
 with a few thousand entries, that code should execute quickly.  The
 amount of data you are actually indexing is MUCH more relevant.

 The Lucene index itself is only aware of the fields that actually
 contain data.  The entire Solr schema is not used or recorded by Lucene
 code at all.  It is only used within code specific to Solr.

 Thanks,
 Shawn




Number of fields in schema.xml and impact on Solr

2015-04-22 Thread Steven White
Hi Everyone

Is there anything I should be taking into consideration if I have a large
number of fields in my Solr's schema.xml file?

I will be indexing records into Solr and as I create documents, each
document will have between 20-200 fields.  However, due to the nature of
my data source, the combined flattened list of fields that I need to include
in schema.xml will be upwards of 2000 and may reach 3000.

My questions are as follows, comparing a schema with 300 fields vs. 3000:

1) Will indexing be slower?  Require more memory?  CPU?
2) Will the index size be larger?  If so any idea by what factor?
3) Will searches be slower?  Require more memory?  CPU?
4) Will the field type (float, boolean, date, string, etc.) have any
factor?
5) Anything else I should know that I didn't ask?

I should make it clear that only about 5 fields will be stored, while
everything else will only be indexed.

Thanks

Steve


Re: Checking of Solr Memory and Disk usage

2015-04-22 Thread Zheng Lin Edwin Yeo
I see. I'm running SolrCloud with 2 replicas, so I guess mine will
probably use much more when my system reaches millions of documents.

Regards,
Edwin


On 22 April 2015 at 20:47, Shawn Heisey apa...@elyograg.org wrote:

 On 4/22/2015 12:11 AM, Zheng Lin Edwin Yeo wrote:
  Roughly how many collections and how many records do you have in your
  Solr?
 
  I have 8 collections with a total of roughly 227000 records, most of
  which are CSV records. One of my collections has 142000 records.

 The core that shows 82MB for heap usage has 16 million documents and is
 hit with an average of 1 or 2 queries per second.  The entire Solr
 instance on this machine has about 55 million documents and a 6GB max heap.

 This is NOT running SolrCloud, though the indexes are distributed.
 There are 24 cores defined, but during normal operation, only four of
 them contain documents.  All four of those cores show heap memory values
 less than 100MB, but the overall heap usage on that machine is measured
 in gigabytes.

 Thanks,
 Shawn




Odp.: Suggester

2015-04-22 Thread LAFK
For the sake of others who would look for the solution and stumble upon this
thread, consider sharing.

I'd expect Solr to return the whole field; if it's a text block, then that's it.

@LAFK_PL
  Original message
From: Martin Keller
Sent: Wednesday, 22 April 2015 16:36
To: solr-user@lucene.apache.org
Reply-to: solr-user@lucene.apache.org
Subject: Re: Suggester

OK, I found the problem and as so often it was sitting in front of the display. 

Now the next problem:
The suggestions returned consist always of a complete text block where the 
match was found. I would have expected a single word or a small phrase.

Thanks in advance
Martin


 Am 22.04.2015 um 12:50 schrieb Martin Keller martin.kel...@unitedplanet.com:
 
 Unfortunately, setting suggestAnalyzerFieldType to text_suggest didn’t 
 change anything.
 The suggest dictionary is freshly built.
 As I mentioned before, only words or phrases of the source field „content“ 
 are not matched.
 When querying the index, the response only contains „suggestions“ field data 
 not coming from the „content“ field.
 The complete schema is a slightly modified techproducts schema.
 „Normal“ searching for words which I would expect coming from „content“ works.
 
 Any more ideas?
 
 Thanks 
 Martin
 
 
 Am 21.04.2015 um 17:39 schrieb Erick Erickson erickerick...@gmail.com:
 
 Did you build your suggest dictionary after indexing? Kind of a shot in the
 dark but worth a try.
 
 Note that the suggest field of your suggester isn't using your text_suggest
 field type to make suggestions, it's using text_general. IOW, the text may
 not be analyzed as you expect.
 
 Best,
 Erick
 
 On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller
 martin.kel...@unitedplanet.com wrote:
 Hello together,
 
 I have some problems with the Solr 5.1.0 suggester.
 I followed the instructions in 
 https://cwiki.apache.org/confluence/display/solr/Suggester and also tried 
 the techproducts example delivered with the binary package, which is 
 working well.
 
 I added a suggestions field to the schema:
 
 <field name="suggestions" type="text_suggest" indexed="true" stored="true" multiValued="true"/>
 
 
 And added some copies to the field:
 
 <copyField source="content" dest="suggestions"/>
 <copyField source="title" dest="suggestions"/>
 <copyField source="author" dest="suggestions"/>
 <copyField source="description" dest="suggestions"/>
 <copyField source="keywords" dest="suggestions"/>
 
 
 The field type definition for „text_suggest“ is pretty simple:
 
 <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 I also changed the solrconfig.xml to use the suggestions field:
 
 <searchComponent class="solr.SuggestComponent" name="suggest">
   <lst name="suggester">
     <str name="name">mySuggester</str>
     <str name="lookupImpl">FuzzyLookupFactory</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">suggestions</str>
     <str name="suggestAnalyzerFieldType">text_general</str>
     <str name="buildOnStartup">false</str>
   </lst>
 </searchComponent>
 
 
 For tokens originally coming from „title“ or „author“, I get suggestions, but
 not any from the content field.
 So, what do I have to do?
 
 Any help is appreciated.
 
 
 Martin
 
 



Re: MLT causing Problems

2015-04-22 Thread Erick Erickson
Anything more informative in the Solr logs?

Best,
Erick



On Wed, Apr 22, 2015 at 2:45 AM, Srinivas Rishindra
sririshin...@gmail.com wrote:
 Hello,

 I am working on a project in which I have to find similar documents.
 While implementing it, the following error occurs. Please let me know
 what to do.

 Exception in thread main
 org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
 from server at http://localhost:8980/solr/rishi: Expected mime type
 application/octet-stream but got text/html. <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
 <title>Error 404 Not Found</title>
 </head>
 <body><h2>HTTP ERROR 404</h2>
 <p>Problem accessing /solr/rishi/mlt. Reason:
 <pre>Not Found</pre></p><hr /><i><small>Powered by
 Jetty://</small></i><br/>

 </body>
 </html>

 at
 org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:525)
 at
 org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:233)
 at
 org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:225)
 at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
 at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
 at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)
 at MoreLikeThis.main(MoreLikeThis.java:31)


After language detection is enabled, SOLR (5.1) isn't indexing anything

2015-04-22 Thread Angel Todorov
Hi guys,

I've enabled language detection in solrconfig.xml:

  <updateRequestProcessorChain name="langid">
    <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
      <lst name="defaults">
        <str name="langid.fl">content,title</str>
        <str name="langid.fallback">en</str>
        <str name="langid.langField">language_s</str>
        <str name="langid.lcmap">en_GB:en en_US:en</str>
        <str name="langid.map.lcmap">en_GB:en en_US:en</str>
      </lst>
    </processor>
  </updateRequestProcessorChain>


Then I have:


  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <!-- See below for information on defining
         updateRequestProcessorChains that can be used by name
         on each Update Request
      -->
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>


When I try to index a document, it's not added to the SOLR index. If I
remove the above code, everything works fine.


Do I need to make any specific changes to the schema.xml? Here is an
excerpt of it:


 <field name="title" type="string" indexed="true" stored="true" required="false" multiValued="false"/>

 <field name="title_en" type="string" indexed="true" stored="true" required="false" multiValued="false"/>

 <field name="content" type="multilang_text_exact" indexed="true" stored="true"/>

 <fieldType name="multilang_text_exact" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <tokenizer class="solr.LetterTokenizerFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.LetterTokenizerFactory"/>
   </analyzer>
 </fieldType>


I don't get any errors in the SOLR console output.


Do I need to add _en and other language-code suffixes to all fields in my
schema for the above to work? I mean, do I need to have title, title_en,
title_jp, and so on manually defined in the schema? I still don't understand
why a document isn't added at all, without any error being thrown.


Thank you,

Angel


Re: Boolean filter query not working as expected

2015-04-22 Thread Jack Krupansky
A purely negative sub-query is not supported by Lucene - you need to have
at least one positive term, such as *:*, at each level of sub-query. Try:

((*:* -(field:V1) AND -(field:V2)) AND -(field:V3))

-- Jack Krupansky

On Wed, Apr 22, 2015 at 10:56 AM, Dhutia, Devansh ddhu...@gannett.com
wrote:

 I have an automated filter query builder that uses the SolrNet nuget
 package to build out boolean filters. I have a scenario where it is
 generating a fq in the following format:

 ((-(field:V1) AND -(field:V2)) AND -(field:V3))
 The filter looks legal to me (albeit with extra parentheses), but the
 above yields 0 total results, even though I know eligible data exists.

 If I manually re-write the above filter as

 (-(field:V1) AND -(field:V2) AND -(field:V3))
 I get the expected results.

 I realize the auto generated filter could be rewritten in a different way,
 but the question still remains, why is the first version not returning any
 results?

  Solr does not report any errors & returns successfully, just with 0
  results.

 Thanks
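Jack's fix can also be automated in the query-builder layer. A minimal sketch of a hypothetical helper (not part of SolrNet or Solr itself) that prepends the match-all query *:* whenever a group consists solely of prohibited clauses:

```python
def harden_negative_group(clauses):
    """Join sub-clauses with AND; if every clause is prohibited
    (purely negative), prepend the match-all query *:* so the
    Lucene query parser has a positive anchor to subtract from.
    Hypothetical helper, not part of SolrNet."""
    body = " AND ".join(clauses)
    if clauses and all(c.lstrip().startswith("-") for c in clauses):
        return "(*:* " + body + ")"
    return "(" + body + ")"

fq = harden_negative_group(["-(field:V1)", "-(field:V2)", "-(field:V3)"])
# fq is "(*:* -(field:V1) AND -(field:V2) AND -(field:V3))"
```

Groups containing at least one positive clause are left untouched, so the helper only changes the cases that the default lucene parser mishandles.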



Re: Odp.: solr issue with pdf forms

2015-04-22 Thread Erick Erickson
Are they not _indexed_ correctly or not being displayed correctly?
Take a look at admin UI >> schema browser >> (your field) and press the
"load terms" button. That'll show you what is _in_ the index as
opposed to what the raw data looked like.

When you return the field in a Solr search, you get a verbatim,
un-analyzed copy of your original input. My guess is that your browser
isn't using the compatible character encoding for display.

Best,
Erick

On Wed, Apr 22, 2015 at 7:08 AM,  steve.sch...@t-systems.com wrote:
 Thanks for your answer. Maybe my English is not good enough, what are you 
 trying to say? Sorry I didn't get the point.
 :-(


 -Original Message-
 From: LAFK [mailto:tomasz.bo...@gmail.com]
 Sent: Wednesday, 22 April 2015 14:01
 To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
 Subject: Odp.: solr issue with pdf forms

 Out of my head I'd follow how are writable PDFs created and encoded.

 @LAFK_PL
   Original message
 From: steve.sch...@t-systems.com
 Sent: Wednesday, 22 April 2015 12:41
 To: solr-user@lucene.apache.org
 Reply-to: solr-user@lucene.apache.org
 Subject: solr issue with pdf forms

 Hi guys,

 hopefully you can help me with my issue. We are using a solr setup and have 
 the following issue:
 - usual pdf files are indexed just fine
 - pdf files with writable form-fields look like this:
 Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind

 Somehow the blank space character is not indexed correctly.

 Is this a known issue? Does anybody have an idea?

 Thanks a lot
 Best
 Steve


Re: Document Created Date

2015-04-22 Thread Eric Meisler
Sorry if my question was too vague.  In my mind it wasn't, but you led me in the
right direction, which gave me a new issue.

I added the following to my schema.xml to bring back the Created Date:
<field name="created" type="date" indexed="false" stored="true"/>
but now I am getting back the created date for PDF files but not for Word
documents (specifically .doc and .docx).
 
Has anyone run into this issue?  If I look at the properties for all three
types of files, the Create Date is called "created", so I am not sure what I am
doing wrong.

Thanks for the help in advance.
 
Eric
 


 Erick Erickson erickerick...@gmail.com 4/21/2015 11:45 AM 
Not really sure what you're asking here, I must be missing something.

The mapping is through the field name supplied, so as long as your input
XML has something like

<add>
  <doc>
    <field name="CreatedDate">your date here</field>
  </doc>
</add>

it should be fine.

You can use date math here as well, as:
    <field name="CreatedDate">NOW</field>

Best,
Erick

On Tue, Apr 21, 2015 at 7:57 AM, Eric Meisler
eric.meis...@veritablelp.com wrote:
 I am a newbie and just started using Solr 4.10.3.  We have successfully 
 indexed a network drive and are running searches.  We now have a request to 
 show the Created Date for all documents (PDF/WORD/TXT/XLS) that come back 
 in our search results.  I have successfully filtered on the last_modified 
 date but I cannot figure out or find out how to add a document's Created Date 
 to the schema.xml.  We do not want to search on the created date since 
 last_modified date handles this but just want to display it.  To my 
 understanding I need to add indexed="false" and stored="true" to the xml
 field but I don't know how or understand how the xml name will map to the
 document's created date property.

 This is my guess:
 <field name="CreatedDate" type="date" indexed="false" stored="true"/>

 Can someone please supply the correct syntax for the xml and maybe a brief
 comment on how Solr maps to the actual document's property?  Also, will I
 need to re-index the drive to make this change apply?

 Thanks,
   Eric


Re: Number of fields in schema.xml and impact on Solr

2015-04-22 Thread Shawn Heisey
On 4/22/2015 6:33 AM, Steven White wrote:
 Is there anything I should be taking into consideration if I have a large
 number of fields in my Solr's schema.xml file?
 
 I will be indexing records into Solr and as I create documents, each
 document will have between 20-200 fields.  However, due to the nature of
 my data source, the combined flattened list of fields that I need to include
 in schema.xml will be upwards of 2000 and may reach 3000.
 
 My questions are as follows, comparing a schema with 300 fields vs. 3000:
 
 1) Will indexing be slower?  Require more memory?  CPU?
 2) Will the index size be larger?  If so any idea by what factor?
 3) Will searches be slower?  Require more memory?  CPU?
 4) Will the field type (float, boolean, date, string, etc.) have any
 factor?
 5) Anything else I should know that I didn't ask?
 
 I should make it clear that only about 5 fields will be stored, while
 everything else will only be indexed.

The number of fields in your schema is likely not a significant
contributor to performance.  I'm sure it can have an impact because
there is code that validates everything against the schema, but even
with a few thousand entries, that code should execute quickly.  The
amount of data you are actually indexing is MUCH more relevant.

The Lucene index itself is only aware of the fields that actually
contain data.  The entire Solr schema is not used or recorded by Lucene
code at all.  It is only used within code specific to Solr.

Thanks,
Shawn
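Shawn's point can be illustrated with a toy model (an illustration only, not Solr's actual code path): validating an incoming document amounts to roughly one hash lookup per document field, so the per-document cost tracks the ~200 fields in each document, not the 3000 declared in schema.xml.

```python
# Toy model of schema validation: one set-membership test per
# document field. The schema size (300 vs 3000 entries) does not
# change the per-document work; only the document's own field
# count does. Illustration only -- not Solr's actual code.
def validate(doc, schema_fields):
    return all(name in schema_fields for name in doc)

small_schema = {f"field_{i}" for i in range(300)}
large_schema = {f"field_{i}" for i in range(3000)}
doc = {f"field_{i}": "value" for i in range(200)}  # a 200-field document

assert validate(doc, small_schema) and validate(doc, large_schema)
```

Set membership is O(1) regardless of set size, which is why the schema's entry count barely registers next to the cost of analyzing and writing the actual field data.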



Solr Error Message ShutDown

2015-04-22 Thread EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS)
Hi, we are having an issue with our PROD environment and it says the below
message when we access Solr using a browser.


HTTP Status 503 - Server is shutting down or failed to initialize

type Status report
message Server is shutting down or failed to initialize
description The requested service is not currently available.

Apache Tomcat/7.0.59

Any suggestions or similar experiences will help us. Note: this happens after a
Microsoft patch; Solr is in a Windows (2012) environment.


Thanks

Ravi





Re: Checking of Solr Memory and Disk usage

2015-04-22 Thread Shawn Heisey
On 4/22/2015 12:11 AM, Zheng Lin Edwin Yeo wrote:
 Roughly how many collections and how many records do you have in your Solr?
 
 I have 8 collections with a total of roughly 227000 records, most of which
 are CSV records. One of my collections has 142000 records.

The core that shows 82MB for heap usage has 16 million documents and is
hit with an average of 1 or 2 queries per second.  The entire Solr
instance on this machine has about 55 million documents and a 6GB max heap.

This is NOT running SolrCloud, though the indexes are distributed.
There are 24 cores defined, but during normal operation, only four of
them contain documents.  All four of those cores show heap memory values
less than 100MB, but the overall heap usage on that machine is measured
in gigabytes.

Thanks,
Shawn



Getting error while searching meaningless words

2015-04-22 Thread Eray Ince
Hello There,

We are using hybris with SOLR (4.6.1). I checked
https://issues.apache.org/jira/browse/SOLR-6563 and saw that the problem has
been solved. However, we are still getting the same problem on a standalone
server. There is no problem on the embedded server.
Does anyone have an idea? You can find the log file below.

org.springframework.web.util.NestedServletException: Request processing
failed; nested exception is org.apache.solr.common.SolrException:
org.apache.http.ParseException: Invalid content type:

org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:948)

org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:827)
javax.servlet.http.HttpServlet.service(HttpServlet.java:620)

org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:812)
javax.servlet.http.HttpServlet.service(HttpServlet.java:727)

de.hybris.platform.servicelayer.web.AbstractPlatformFilterChain$InternalFilterChain.doFilter(AbstractPlatformFilterChain.java:256)

de.hybris.platform.servicelayer.web.AbstractPlatformFilterChain$StatisticsGatewayFilter.doFilter(AbstractPlatformFilterChain.java:345)

de.hybris.platform.servicelayer.web.AbstractPlatformFilterChain$InternalFilterChain.doFilter(AbstractPlatformFilterChain.java:226)


Re: Odp.: phraseFreq vs sloppyFreq

2015-04-22 Thread Dmitry Kan
LAFK,

Yes, or even more than 1k. Based on the sloppyFreq component (hopefully the
same as phraseFreq), we get documents where keywords occur near each other
ranked higher, as if we used slop=10 or something.

On Wed, Apr 22, 2015 at 2:59 PM, LAFK tomasz.bo...@gmail.com wrote:

 Out of curiosity, why proximity 1k?

 @LAFK_PL
    Original message
  From: Dmitry Kan
  Sent: Wednesday, 22 April 2015 09:26
  To: solr-user@lucene.apache.org
  Reply-to: solr-user@lucene.apache.org
  Subject: phraseFreq vs sloppyFreq

  Hi guys. I'm executing the following proximity query: "leader the"~1000. In
  the debugQuery output I see phraseFreq=0.032258064. Is phraseFreq the same
  thing as sloppyFreq from
 
  https://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html
  ?
 
  Does a higher phraseFreq increase the final similarity score?

 --
 Dmitry Kan
 Luke Toolbox: http://github.com/DmitryKey/luke
 Blog: http://dmitrykan.blogspot.com
 Twitter: http://twitter.com/dmitrykan
 SemanticAnalyzer: www.semanticanalyzer.info




-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
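For reference, DefaultSimilarity.sloppyFreq(distance) in Lucene returns 1/(distance + 1), and phraseFreq accumulates that value over matching positions before being passed through tf(). Assuming a single sloppy match, the observed 0.032258064 is exactly 1/31, i.e. the two terms were 30 positions apart:

```python
def sloppy_freq(distance):
    # Lucene DefaultSimilarity.sloppyFreq: 1 / (distance + 1)
    return 1.0 / (distance + 1)

# phraseFreq=0.032258064 from debugQuery corresponds to the terms
# sitting 30 positions apart (1/31), assuming exactly one matching
# position in the document.
assert abs(sloppy_freq(30) - 0.032258064) < 1e-7
```

So closer terms (or more matching positions) yield a larger phraseFreq, which does increase the final similarity score.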


Highlighting in Solr

2015-04-22 Thread Zheng Lin Edwin Yeo
Hi,

I'm currently implementing highlighting on my Solr-5.0.0. When I issue the
following command:
http://localhost:8983/solr/collection1/select?q=conducted&hl=true&hl.fl=Content,Summary&wt=json&indent=true&rows=10,
the highlighting result is listed at the bottom of the output, instead of
together with the rest of the response above. The result is shown below:

  "response":{"numFound":10,"start":0,"docs":[
      {
        "id":"1-1",
        "Summary":"i} Trial conducted",
        "Content":"Completed",
        "_version_":1498407036159787020},


  "highlighting":{
    "1-1":{
      "Summary":["i) Trial <em>conducted</em>"]}


Is there any way to get the highlighted output to be displayed
together with the rest of the response, instead of having it displayed
separately at the bottom? Something like this:


  "response":{"numFound":10,"start":0,"docs":[
      {
        "id":"1-1",
        "Summary":"i} Trial <em>conducted</em>",
        "Content":"Completed",
        "_version_":1498407036159787020},


Regards,
Edwin
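Solr always returns highlighting as a separate top-level section keyed by the uniqueKey, so the merge has to happen client-side. A minimal sketch in Python, assuming the wt=json response shape shown above:

```python
def merge_highlights(resp, id_field="id"):
    """Fold the top-level "highlighting" section back into the docs,
    replacing each stored field value with its first highlight snippet."""
    highlights = resp.get("highlighting", {})
    for doc in resp["response"]["docs"]:
        for field, snippets in highlights.get(doc[id_field], {}).items():
            if snippets:
                doc[field] = snippets[0]
    return resp

# Example using the response shape from the question above.
resp = {
    "response": {"numFound": 10, "start": 0, "docs": [
        {"id": "1-1", "Summary": "i} Trial conducted", "Content": "Completed"},
    ]},
    "highlighting": {"1-1": {"Summary": ["i) Trial <em>conducted</em>"]}},
}
merged = merge_highlights(resp)
# docs[0]["Summary"] is now "i) Trial <em>conducted</em>"
```

Fields without a highlight snippet (Content here) keep their stored value, so the merged docs can be rendered directly.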


Boolean filter query not working as expected

2015-04-22 Thread Dhutia, Devansh
I have an automated filter query builder that uses the SolrNet nuget package to 
build out boolean filters. I have a scenario where it is generating a fq in the 
following format:

((-(field:V1) AND -(field:V2)) AND -(field:V3))
The filter looks legal to me (albeit with extra parentheses), but the above 
yields 0 total results, even though I know eligible data exists.

If I manually re-write the above filter as

(-(field:V1) AND -(field:V2) AND -(field:V3))
I get the expected results.

I realize the auto generated filter could be rewritten in a different way, but 
the question still remains, why is the first version not returning any results?

Solr does not report any errors & returns successfully, just with 0 results.

Thanks


Re: Boolean filter query not working as expected

2015-04-22 Thread Dhutia, Devansh
If I upgrade to using the edismax parser in my fq, I get the desired results.
The default lucene parser on fq must not be able to parse the more complex
nested clauses.

q=*:*&fq={!type=edismax}((-(field:V1) AND -(field:V2)) AND -(field:V3)) -
Works





On 4/22/15, 3:27 PM, Dhutia, Devansh ddhu...@gannett.com wrote:

I don’t know if that’s completely true, or maybe I’m misunderstanding 
something. 

If it doesn’t support purely negative subqueries, this shouldn't work, but
does:
q=*:*&fq=(-(field:V1))

However, for me, the following is a summary of what works & what doesn't.
q=*:*&fq=(-(field:V1))   - Works

q=*:*&fq=((-(field:V1) AND -(field:V2)) AND -(field:V3)) - Doesn't work
q=*:*&fq=(-(field:V1) AND -(field:V2) AND -(field:V3))   - Works
q=*:*&fq=((*:* -(field:V1) AND -(field:V2)) AND -(field:V3)) - Works




On 4/22/15, 3:02 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

A purely negative sub-query is not supported by Lucene - you need to have
at least one positive term, such as *:*, at each level of sub-query. 
Try:

((*:* -(field:V1) AND -(field:V2)) AND -(field:V3))

-- Jack Krupansky

On Wed, Apr 22, 2015 at 10:56 AM, Dhutia, Devansh ddhu...@gannett.com
wrote:

 I have an automated filter query builder that uses the SolrNet nuget
 package to build out boolean filters. I have a scenario where it is
 generating a fq in the following format:

 ((-(field:V1) AND -(field:V2)) AND -(field:V3))
 The filter looks legal to me (albeit with extra parentheses), but the
 above yields 0 total results, even though I know eligible data exists.

 If I manually re-write the above filter as

 (-(field:V1) AND -(field:V2) AND -(field:V3))
 I get the expected results.

 I realize the auto generated filter could be rewritten in a different 
way,
 but the question still remains, why is the first version not returning 
any
 results?

  Solr does not report any errors & returns successfully, just with 0
  results.

 Thanks



no subject

2015-04-22 Thread Bill Tsay


On 4/22/15, 7:36 AM, Martin Keller martin.kel...@unitedplanet.com
wrote:

OK, I found the problem and as so often it was sitting in front of the
display. 

Now the next problem:
The suggestions returned consist always of a complete text block where
the match was found. I would have expected a single word or a small
phrase.

Thanks in advance
Martin


 Am 22.04.2015 um 12:50 schrieb Martin Keller
martin.kel...@unitedplanet.com:
 
 Unfortunately, setting suggestAnalyzerFieldType to text_suggest
didn’t change anything.
 The suggest dictionary is freshly built.
 As I mentioned before, only words or phrases of the source field
„content“ are not matched.
 When querying the index, the response only contains „suggestions“ field
data not coming from the „content“ field.
 The complete schema is a slightly modified techproducts schema.
 „Normal“ searching for words which I would expect coming from „content“
works.
 
 Any more ideas?
 
 Thanks 
 Martin
 
 
 Am 21.04.2015 um 17:39 schrieb Erick Erickson
erickerick...@gmail.com:
 
 Did you build your suggest dictionary after indexing? Kind of a shot
in the
 dark but worth a try.
 
 Note that the suggest field of your suggester isn't using your
text_suggest
 field type to make suggestions, it's using text_general. IOW, the
text may
 not be analyzed as you expect.
 
 Best,
 Erick
 
 On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller
 martin.kel...@unitedplanet.com wrote:
 Hello together,
 
 I have some problems with the Solr 5.1.0 suggester.
 I followed the instructions in
https://cwiki.apache.org/confluence/display/solr/Suggester and also
tried the techproducts example delivered with the binary package,
which is working well.
 
 I added a suggestions field to the schema:
 
 <field name="suggestions" type="text_suggest" indexed="true" stored="true" multiValued="true"/>
 
 
 And added some copies to the field:
 
 <copyField source="content" dest="suggestions"/>
 <copyField source="title" dest="suggestions"/>
 <copyField source="author" dest="suggestions"/>
 <copyField source="description" dest="suggestions"/>
 <copyField source="keywords" dest="suggestions"/>
 
 
 The field type definition for „text_suggest“ is pretty simple:
 
 <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 I also changed the solrconfig.xml to use the suggestions field:
 
 <searchComponent class="solr.SuggestComponent" name="suggest">
   <lst name="suggester">
     <str name="name">mySuggester</str>
     <str name="lookupImpl">FuzzyLookupFactory</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">suggestions</str>
     <str name="suggestAnalyzerFieldType">text_general</str>
     <str name="buildOnStartup">false</str>
   </lst>
 </searchComponent>
 
 
 For tokens originally coming from „title“ or „author“, I get
 suggestions, but not any from the content field.
 So, what do I have to do?
 
 Any help is appreciated.
 
 
 Martin
 
 




Re: Bad contentType for search handler :text/xml; charset=UTF-8

2015-04-22 Thread Walter Underwood
text/xml is not a safe content-type, because of the way that HTTP handles 
charsets. Always use application/xml.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Apr 22, 2015, at 3:01 AM, bengates benga...@aliceadsl.fr wrote:

 Looks like Solarium hardcodes a default header Content-Type: text/xml;
 charset=utf-8 if none provided.
 Removing it solves the problem.
 
 It seems that Solr 5.1 doesn't support this content-type.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314p4201579.html
 Sent from the Solr - User mailing list archive at Nabble.com.
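The difference Walter describes comes from HTTP's charset rules: for text/* media types without an explicit charset parameter, RFC 2616 lets the receiver default to ISO-8859-1, ignoring the XML document's own (usually UTF-8) encoding. A small illustration of what that default does to a UTF-8 payload:

```python
payload = "<msg>café</msg>".encode("utf-8")

# application/xml lets the XML parser honor the document's own encoding
# declaration; a text/xml body with no charset parameter may instead be
# decoded with HTTP's historical ISO-8859-1 default, mangling UTF-8.
as_utf8 = payload.decode("utf-8")
as_latin1 = payload.decode("iso-8859-1")  # the text/* fallback

# The two-byte UTF-8 sequence for "é" (0xC3 0xA9) becomes "Ã©".
assert as_utf8 == "<msg>café</msg>"
assert as_latin1 == "<msg>cafÃ©</msg>"
```

That ambiguity is why application/xml (or text/xml with an explicit, correct charset parameter) is the safer choice.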



Re: Boolean filter query not working as expected

2015-04-22 Thread Dhutia, Devansh
I don’t know if that’s completely true, or maybe I’m misunderstanding 
something. 

If it doesn’t support purely negative subqueries, this shouldn't work, but
does:
q=*:*&fq=(-(field:V1))

However, for me, the following is a summary of what works & what doesn't.
q=*:*&fq=(-(field:V1))   - Works

q=*:*&fq=((-(field:V1) AND -(field:V2)) AND -(field:V3)) - Doesn't work
q=*:*&fq=(-(field:V1) AND -(field:V2) AND -(field:V3))   - Works
q=*:*&fq=((*:* -(field:V1) AND -(field:V2)) AND -(field:V3)) - Works




On 4/22/15, 3:02 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

A purely negative sub-query is not supported by Lucene - you need to have
at least one positive term, such as *:*, at each level of sub-query. Try:

((*:* -(field:V1) AND -(field:V2)) AND -(field:V3))

-- Jack Krupansky

On Wed, Apr 22, 2015 at 10:56 AM, Dhutia, Devansh ddhu...@gannett.com
wrote:

 I have an automated filter query builder that uses the SolrNet nuget
 package to build out boolean filters. I have a scenario where it is
 generating a fq in the following format:

 ((-(field:V1) AND -(field:V2)) AND -(field:V3))
 The filter looks legal to me (albeit with extra parentheses), but the
 above yields 0 total results, even though I know eligible data exists.

 If I manually re-write the above filter as

 (-(field:V1) AND -(field:V2) AND -(field:V3))
 I get the expected results.

 I realize the auto generated filter could be rewritten in a different 
way,
 but the question still remains, why is the first version not returning 
any
 results?

  Solr does not report any errors & returns successfully, just with 0
  results.

 Thanks



Re: solr issue with pdf forms

2015-04-22 Thread Dan Davis
Steve,

Are you using ExtractingRequestHandler / DataImportHandler or extracting
the text content from the PDF outside of Solr?

On Wed, Apr 22, 2015 at 6:40 AM, steve.sch...@t-systems.com wrote:

 Hi guys,

 hopefully you can help me with my issue. We are using a solr setup and
 have the following issue:
 - usual pdf files are indexed just fine
 - pdf files with writable form-fields look like this:

 Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind

 Somehow the blank space character is not indexed correctly.

  Is this a known issue? Does anybody have an idea?

 Thanks a lot
 Best
 Steve
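One plausible mechanism for the � characters, offered as an assumption rather than a confirmed diagnosis: PDF form fields often separate words with non-breaking spaces, and if those come through as a raw 0xA0 byte inside otherwise valid UTF-8 text, a UTF-8 decode replaces each one with U+FFFD:

```python
# Hypothetical reproduction: UTF-8 text with a bare Latin-1 0xA0
# (non-breaking space) byte between words. 0xA0 is not a valid UTF-8
# sequence on its own, so decoding with errors="replace" yields
# U+FFFD ("�"), matching the symptom in the indexed PDF form text.
raw = "Ich".encode("utf-8") + b"\xa0" + "bestätige".encode("utf-8")
decoded = raw.decode("utf-8", errors="replace")

assert decoded == "Ich\ufffdbestätige"
```

Note the surrounding words (including the umlaut) survive intact, exactly as in the example from the original report, which is what makes a mixed-encoding extraction step a likely suspect.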



Re: Odp.: solr issue with pdf forms

2015-04-22 Thread Dan Davis
+1 - I like Erick's answer.  Let me know if that turns out to be the
problem - I'm interested in this problem and would be happy to help.

On Wed, Apr 22, 2015 at 11:11 AM, Erick Erickson erickerick...@gmail.com
wrote:

  Are they not _indexed_ correctly or not being displayed correctly?
  Take a look at admin UI >> schema browser >> (your field) and press the
  "load terms" button. That'll show you what is _in_ the index as
  opposed to what the raw data looked like.

 When you return the field in a Solr search, you get a verbatim,
 un-analyzed copy of your original input. My guess is that your browser
 isn't using the compatible character encoding for display.

 Best,
 Erick

 On Wed, Apr 22, 2015 at 7:08 AM,  steve.sch...@t-systems.com wrote:
  Thanks for your answer. Maybe my English is not good enough, what are
 you trying to say? Sorry I didn't get the point.
  :-(
 
 
  -----Original Message-----
  From: LAFK [mailto:tomasz.bo...@gmail.com]
  Sent: Wednesday, 22 April 2015 14:01
  To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
  Subject: Odp.: solr issue with pdf forms
 
  Out of my head I'd follow how are writable PDFs created and encoded.
 
  @LAFK_PL
    ----- Original message -----
  From: steve.sch...@t-systems.com
  Sent: Wednesday, 22 April 2015 12:41
  To: solr-user@lucene.apache.org
  Reply-To: solr-user@lucene.apache.org
  Subject: solr issue with pdf forms
 
  Hi guys,
 
  hopefully you can help me with my issue. We are using a solr setup and
 have the following issue:
  - usual pdf files are indexed just fine
  - pdf files with writable form-fields look like this:
 
 Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind
 
  Somehow the blank space character is not indexed correctly.
 
   Is this a known issue? Does anybody have an idea?
 
  Thanks a lot
  Best
  Steve



Re: Suggester

2015-04-22 Thread Martin Keller
OK, I found the problem and as so often it was sitting in front of the display. 

Now the next problem:
The suggestions returned always consist of the complete text block where the 
match was found. I would have expected a single word or a short phrase.

Thanks in advance
Martin


 On 22.04.2015 at 12:50, Martin Keller martin.kel...@unitedplanet.com wrote:
 
 Unfortunately, setting suggestAnalyzerFieldType to text_suggest didn’t 
 change anything.
 The suggest dictionary is freshly built.
 As I mentioned before, only words or phrases of the source field „content“ 
 are not matched.
 When querying the index, the response only contains „suggestions“ field data 
 not coming from the „content“ field.
 The complete schema is a slightly modified techproducts schema.
 „Normal“ searching for words which I would expect coming from „content“ works.
 
 Any more ideas?
 
 Thanks 
 Martin
 
 
 On 21.04.2015 at 17:39, Erick Erickson erickerick...@gmail.com wrote:
 
 Did you build your suggest dictionary after indexing? Kind of a shot in the
 dark but worth a try.
 
 Note that the suggest field of your suggester isn't using your text_suggest
 field type to make suggestions, it's using text_general. IOW, the text may
 not be analyzed as you expect.
 
 Best,
 Erick
 
 On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller
 martin.kel...@unitedplanet.com wrote:
 Hello together,
 
 I have some problems with the Solr 5.1.0 suggester.
 I followed the instructions in 
 https://cwiki.apache.org/confluence/display/solr/Suggester and also tried 
 the techproducts example delivered with the binary package, which is 
 working well.
 
 I added a "suggestions" field to the schema:
 
 <field name="suggestions" type="text_suggest" indexed="true" stored="true"
 multiValued="true"/>
 
 
 And added some copies to the field:
 
 <copyField source="content" dest="suggestions"/>
 <copyField source="title" dest="suggestions"/>
 <copyField source="author" dest="suggestions"/>
 <copyField source="description" dest="suggestions"/>
 <copyField source="keywords" dest="suggestions"/>
 
 
 The field type definition for „text_suggest“ is pretty simple:
 
 <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 I Also changed the solrconfig.xml to use the suggestions field:
 
 <searchComponent class="solr.SuggestComponent" name="suggest">
   <lst name="suggester">
     <str name="name">mySuggester</str>
     <str name="lookupImpl">FuzzyLookupFactory</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">suggestions</str>
     <str name="suggestAnalyzerFieldType">text_general</str>
     <str name="buildOnStartup">false</str>
   </lst>
 </searchComponent>
 
 
 For tokens originally coming from „title“ or „author“, I get suggestions, but 
 not any from the „content“ field.
 So, what do I have to do?
 
 Any help is appreciated.
 
 
 Martin
 
 



Re: Bad contentType for search handler :text/xml; charset=UTF-8

2015-04-22 Thread didier deshommes
A similar problem seems to happen when sending application/json to the
search handler. Solr returns a NullPointerException for some reason:

vagrant@precise64:~/solr-5.1.0$ curl \
  "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation" \
  -H "Content-type: application/json"
{
  "responseHeader":{
    "status":500,
    "QTime":2,
    "params":{
      "indent":"true",
      "json":"",
      "q":"foundation",
      "wt":"json"}},
  "error":{
    "trace":"java.lang.NullPointerException\n\tat
org.apache.solr.request.json.ObjectUtil$ConflictHandler.mergeMap(ObjectUtil.java:60)\n\tat
org.apache.solr.request.json.ObjectUtil.mergeObjects(ObjectUtil.java:114)\n\tat
org.apache.solr.request.json.RequestUtil.mergeJSON(RequestUtil.java:259)\n\tat
org.apache.solr.request.json.RequestUtil.processParams(RequestUtil.java:176)\n\tat
org.apache.solr.util.SolrPluginUtils.setDefaults(SolrPluginUtils.java:166)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:140)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
java.lang.Thread.run(Thread.java:745)\n",
    "code":500}}

On Wed, Apr 22, 2015 at 9:41 AM, Walter Underwood wun...@wunderwood.org
wrote:

 text/xml is not a safe content-type, because of the way that HTTP handles
 charsets. Always use application/xml.

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)


 On Apr 22, 2015, at 3:01 AM, bengates benga...@aliceadsl.fr wrote:

  Looks like Solarium hardcodes a default header Content-Type: text/xml;
  charset=utf-8 if none provided.
  Removing it solves the problem.
 
  It seems that Solr 5.1 doesn't support this content-type.
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314p4201579.html
  Sent from the Solr - User mailing list archive at Nabble.com.




Re: Document Created Date

2015-04-22 Thread Erick Erickson
The generic problem with all the semi-structured documents is that
the meta-data has no consistent naming. Making up names here, but Word
might have "created_on", PDF "created", etc. It's really frustrating,
but each type has to be investigated to figure out which field you
want to map to "created". Tika and Solr Cell just map what they find.

One way to go about this is to map the dynamic glob pattern to a stored
field, then look at what pops out. Not satisfactory, but...

<dynamicField name="*" type="string" stored="true" multiValued="true"/>

Best,
Erick
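
If the documents are going through the extracting handler, one hedged option, once the per-format names have been identified, is to fold them all onto a single "created" field with fmap.* defaults in solrconfig.xml. A sketch — the Tika metadata names below are assumptions that vary by file type and Tika version, so verify them first (e.g. with extractOnly=true):

```xml
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler" startup="lazy">
  <lst name="defaults">
    <!-- map the various per-format creation-date names onto one schema field -->
    <str name="fmap.created">created</str>
    <str name="fmap.Creation-Date">created</str>
    <str name="fmap.meta:creation-date">created</str>
  </lst>
</requestHandler>
```

With this in place the schema only needs the single stored "created" field, regardless of which name Tika emitted.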

On Wed, Apr 22, 2015 at 5:44 AM, Eric Meisler
eric.meis...@veritablelp.com wrote:
 Sorry if my question was too vague.  In my mind it wasn't but you led me in 
 the right direction which gave me a new issue.

 I added the following to my schema.xml to bring back the created date:  
 <field name="created" type="date" indexed="false" stored="true"/> but now I 
 am getting back the created date for PDF files but not for Word documents 
 (specifically .doc and .docx).

 Has anyone run into this issue?  If I look at the properties for all three 
 types of files the Create Date is called created so I am not sure what I am 
 doing wrong.

 Thanks for the help in advance.

 Eric



 Erick Erickson erickerick...@gmail.com 4/21/2015 11:45 AM 
 Not really sure what you're asking here, I must be missing something.

 The mapping is through the  field name supplied, so as long as your input
 XML has something like
 <add>
   <doc>
     <field name="CreatedDate">your date here</field>
   </doc>
 </add>

 it should be fine.

 You can use date math here as well, as:
    <field name="CreatedDate">NOW</field>

 Best,
 Erick

 On Tue, Apr 21, 2015 at 7:57 AM, Eric Meisler
 eric.meis...@veritablelp.com wrote:
 I am a newbie and just started using Solr 4.10.3.  We have successfully 
 indexed a network drive and are running searches.  We now have a request to 
 show the Created Date for all documents (PDF/WORD/TXT/XLS) that come back 
 in our search results.  I have successfully filtered on the last_modified 
 date but I cannot figure out or find out how to add a document's Created 
 Date to the schema.xml.  We do not want to search on the created date since 
 last_modified date handles this but just want to display it.  To my 
 understanding I need to add indexed="false" and stored="true" to the xml 
 field but I don't know how or understand how the xml name will map to the 
 document's created date property.

 This is my guess:
 <field name="CreatedDate" type="date" indexed="false" stored="true"/>

 Can someone please supply the correct syntax for the xml and maybe a brief 
 comment on how solr maps to the actual document's property?  Also, will I 
 need to re-index the drive to make this change apply?

 Thanks,
   Eric


Re: Solr Error Message ShutDown

2015-04-22 Thread Erick Erickson
What version of Solr? And do the Solr logs show anything useful? Or
catalina.out?

Best,
Erick

On Wed, Apr 22, 2015 at 7:23 AM, EXTERNAL Taminidi Ravi (ETI,
AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote:
 Hi, we are having an issue with our PROD environment; it shows the below 
 message when we access Solr using a browser:


 HTTP Status 503 - Server is shutting down or failed to initialize
 
 type Status report
 message Server is shutting down or failed to initialize
 description The requested service is not currently available.
 
 Apache Tomcat/7.0.59

 Any suggestions or similar experiences will help us. Note: this happens after a 
 Microsoft patch; Solr is in a Windows Server 2012 environment.


 Thanks

 Ravi





Re: Odp.: Suggester

2015-04-22 Thread Erick Erickson
Right, this is what the suggester you're using is built for. Which is
actually way cool for certain situations.

Try the FreeTextLookupFactory (warning, I'm not too familiar with the
nuances here)

Or maybe spelling suggestions are more what you're looking for which
look at the terms and
return a term at a time.

Best,
Erick
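
A hedged sketch of what Erick's FreeTextLookupFactory suggestion could look like, adapted from Martin's earlier config (parameter values are illustrative assumptions, not tested against this schema):

```xml
<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestions</str>
    <!-- how many words of preceding context the n-gram model keeps -->
    <str name="ngrams">2</str>
    <str name="suggestFreeTextAnalyzerFieldType">text_suggest</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
```

Unlike the document-based lookups, this returns word-level completions from an n-gram language model rather than echoing the whole stored field.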

On Wed, Apr 22, 2015 at 7:59 AM, LAFK tomasz.bo...@gmail.com wrote:
 For the sake of others who would look for the solution and stumble upon this 
 thread, consider sharing.

 I'd expect Solr to return whole field, if it's a text block then that's it.

 @LAFK_PL
    ----- Original message -----
  From: Martin Keller
  Sent: Wednesday, 22 April 2015 16:36
  To: solr-user@lucene.apache.org
  Reply-To: solr-user@lucene.apache.org
  Subject: Re: Suggester

 OK, I found the problem and as so often it was sitting in front of the 
 display.

 Now the next problem:
 The suggestions returned always consist of the complete text block where the 
 match was found. I would have expected a single word or a short phrase.

 Thanks in advance
 Martin


 On 22.04.2015 at 12:50, Martin Keller 
 martin.kel...@unitedplanet.com wrote:

 Unfortunately, setting suggestAnalyzerFieldType to text_suggest didn’t 
 change anything.
 The suggest dictionary is freshly built.
 As I mentioned before, only words or phrases of the source field „content“ 
 are not matched.
 When querying the index, the response only contains „suggestions“ field data 
 not coming from the „content“ field.
 The complete schema is a slightly modified techproducts schema.
 „Normal“ searching for words which I would expect coming from „content“ 
 works.

 Any more ideas?

 Thanks
 Martin


 On 21.04.2015 at 17:39, Erick Erickson erickerick...@gmail.com wrote:

 Did you build your suggest dictionary after indexing? Kind of a shot in the
 dark but worth a try.

 Note that the suggest field of your suggester isn't using your 
 text_suggest
 field type to make suggestions, it's using text_general. IOW, the text may
 not be analyzed as you expect.

 Best,
 Erick

 On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller
 martin.kel...@unitedplanet.com wrote:
 Hello together,

 I have some problems with the Solr 5.1.0 suggester.
 I followed the instructions in 
 https://cwiki.apache.org/confluence/display/solr/Suggester and also tried 
 the techproducts example delivered with the binary package, which is 
 working well.

 I added a field suggestions-Field to the schema:

 <field name="suggestions" type="text_suggest" indexed="true" stored="true" 
 multiValued="true"/>


 And added some copies to the field:

 <copyField source="content" dest="suggestions"/>
 <copyField source="title" dest="suggestions"/>
 <copyField source="author" dest="suggestions"/>
 <copyField source="description" dest="suggestions"/>
 <copyField source="keywords" dest="suggestions"/>


 The field type definition for „text_suggest“ is pretty simple:

 <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


 I Also changed the solrconfig.xml to use the suggestions field:

 <searchComponent class="solr.SuggestComponent" name="suggest">
   <lst name="suggester">
     <str name="name">mySuggester</str>
     <str name="lookupImpl">FuzzyLookupFactory</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">suggestions</str>
     <str name="suggestAnalyzerFieldType">text_general</str>
     <str name="buildOnStartup">false</str>
   </lst>
 </searchComponent>


 For tokens originally coming from „title“ or „author“, I get suggestions, 
 but not any from the „content“ field.
 So, what do I have to do?

 Any help is appreciated.


 Martin





Re: Checking of Solr Memory and Disk usage

2015-04-22 Thread Zheng Lin Edwin Yeo
Roughly how many collections and how many records do you have in your Solr?

I have 8 collections with a total of roughly 227000 records, most of which
are CSV records. One of my collections has 142000 records.

Regards,
Edwin


On 22 April 2015 at 13:49, Shawn Heisey apa...@elyograg.org wrote:

 On 4/21/2015 11:33 PM, Zheng Lin Edwin Yeo wrote:
  I've got the amount of disk space used, but for the Heap Memory Usage
  reading, it is showing the value -1.
  Do we need to change any settings for it? When I check from the Windows
  Task Manager, it is showing about 300MB for shard1 and 150MB for shard2.
  But I suppose that is the usage for the entire Solr and not for
 individual
  collection.

 That -1 sounds like a bug, but I'd like others to have a chance to chime
 in before you open an issue in Jira.

 My Solr instances are older -- 4.7.2 and 4.9.1.  One of the larger cores
 on a 4.7.2 server shows a heap memory value of 86656138 -- about 82MB.
 I have no way to verify, but this seems very low to me.

 Thanks,
 Shawn




Suggestion in Solr Cloud

2015-04-22 Thread Swaraj Kumar
Hi All,

I want to use the suggest option in Solr, but my Solr is in cloud mode, so to
get suggestions I currently need to provide the shard URLs with every query,
like below:

http://node1/solr/city/suggest?suggest.dictionary=solr-suggester&suggest=true&suggest.build=true&suggest.q=Delhi&shards=node1/solr/city,node2/solr/city&shards.qt=/suggest

My requirement is: is there any way to get the same suggestions without
providing the shards parameter in the query?



Regards,


Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497


Re: Solr 4.10.x regression in map-reduce contrib

2015-04-22 Thread Shenghua(Daniel) Wan
I got the same issue when using 4.10.2. I suspected this issue would cause
trouble when using too many reducers.
Then I tried to use fewer reducers, and made it work.
I do not think the map-reduce contrib in this version is stable... Anyway, it is
free.

On Tue, Apr 21, 2015 at 10:56 PM, ralph tice ralph.t...@gmail.com wrote:

 Hello list,

 I'm using mapreduce from contrib and I get this stack trace:
 https://gist.github.com/ralph-tice/b1e84bdeb64532c7ecab

 Whenever I specify <luceneMatchVersion>4.10</luceneMatchVersion> in my
 solrconfig.xml.  4.9 works fine.  I'm using 4.10.4 artifacts for both map
 reduce runs.  I tried raising maxWarmingSearchers to 20 and set
 openSearcher to false in my configs with no difference.

 I have started studying the code, but why would BatchWriter invoke warming
 (autowarming?) on a close, let alone opening a new searcher?  Should I be
 looking in Lucene or Solr code to investigate this regression?  I also
 notice there are interesting defaults for FaultTolerance in SolrReducer
 that don't appear to be documented:

 https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/map-reduce/src/java/org/apache/solr/hadoop/SolrReducer.java#L70-L73
 but reading https://issues.apache.org/jira/browse/SOLR-5758 sounds like
 they are either unimportant or overlooked?

 Also, we will probably be testing mapreduce contrib with 5.x, has anyone
 been successful with this yet or are there any known issues?  I don't see a
 lot of changes on contrib/map-reduce...

 Regards,

 --Ralph Tice
 ralph.t...@gmail.com




-- 

Regards,
Shenghua (Daniel) Wan


Re: Complete list of field type that Solr supports

2015-04-22 Thread Chris Hostetter

: To be clear, here is an example of a type from Solr's schema.xml:
: 
: <field name="weight" type="float" indexed="true" stored="true"/>
: 
: Here, the type is "float".  I'm looking for the complete list of
: out-of-the-box types supported.

what you are asking about are just symbolic names that come from <type/> 
definitions in the schema.xml -- there is no complete list.  you can add 
any arbitrary <type name="foo" .../> you want to your schema, and now 
you've introduced a new type that solr supports.

As far as the list of all FieldType *classes* that exist in solr out of 
the box (ie: the list of classes that can be specified in <type/> 
declarations), that is a bit more straight forward...

https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr
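
Concretely, the name-to-class indirection looks like this in a stock 4.x/5.x schema.xml (a sketch; the attribute values shown are the common example-schema defaults, not authoritative):

```xml
<!-- the symbolic name "float" is defined here... -->
<fieldType name="float" class="solr.TrieFloatField"
           precisionStep="0" positionIncrementGap="0"/>

<!-- ...and merely referenced here; renaming the type to "f"
     everywhere would work just as well -->
<field name="weight" type="float" indexed="true" stored="true"/>
```

So "float" only behaves like a float because the fieldType it points at uses solr.TrieFloatField.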



-Hoss
http://www.lucidworks.com/


Re: Complete list of field type that Solr supports

2015-04-22 Thread Chris Hostetter

: I'm confused.  If type="float" is just a symbolic name, how does Solr know
: to index the data of field "weight" as float?  What about for "date" per
: this example:
: 
: <field name="last_modified" type="date" indexed="true" stored="true"/>
: 
: How does Solr apply date-range queries such as:

because somewhere else in your schema is a <type/> declaration that 
defines <type name="date" ...> using class="solr.TrieDateField"

you asked for the complete list of all possible values for the "type" 
attribute on a <field/> -- the answer is infinite because the possible 
values for the "type" attribute on a <field/> are dictated by whatever you 
might choose to specify as the "name" attribute on a <type/>

: I was always under the impression that there are primitive field-types but
: looks like that's not the case?

There are FieldType *classes* which can be configured a variety of ways in 
your schema.xml, and then reused by different fields -- but the *names* of 
those types is up to you.

for example: the exact same TrieDateField *class* can be configured in your 
schema.xml to implement 2 different *types* named "date_foo" and 
"date_bar" by using different default options (maybe one uses a 
non-default precisionStep and defaults to stored="true" while the other 
uses the default precisionStep and defaults to stored="false") ... those 
two diff types can then both be used in your schema...

  <field name="last_modified" type="date_foo" indexed="true" />
  <field name="pub_date" type="date_bar" indexed="true" />

...and have different behavior.

: 
: Thanks
: 
: Steve
: 
: On Wed, Apr 22, 2015 at 12:59 PM, Chris Hostetter hossman_luc...@fucit.org
: wrote:
: 
: 
:  : To be clear, here is an example of a type from Solr's schema.xml:
:  :
:  : <field name="weight" type="float" indexed="true" stored="true"/>
:  :
:  : Here, the type is "float".  I'm looking for the complete list of
:  : out-of-the-box types supported.
: 
:  what you are asking about are just symbolic names that come from <type/>
:  definitions in the schema.xml -- there is no complete list.  you can add
:  any arbitrary <type name="foo" .../> you want to your schema, and now
:  you've introduced a new type that solr supports.
: 
:  As far as the list of all FieldType *classes* that exist in solr out of
:  the box (ie: the list of classes that can be specified in <type/>
:  declarations), that is a bit more straight forward...
: 
: 
:  
https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr
: 
: 
: 
:  -Hoss
:  http://www.lucidworks.com/
: 
: 

-Hoss
http://www.lucidworks.com/


Complete list of field type that Solr supports

2015-04-22 Thread Steven White
Hi Everyone,

I Googled for this with no luck.

Where can I find a complete list of field types that Solr supports?  In
the sample schema.xml that comes with Solr 5 and prior versions, I am able
to compile a list such as "boolean", "float", "string", etc. but I cannot
find a complete list documented somewhere.

To be clear, here is an example of a type from Solr's schema.xml:

<field name="weight" type="float" indexed="true" stored="true"/>

Here, the type is "float".  I'm looking for the complete list of
out-of-the-box types supported.

Thanks

Steve


Re: Complete list of field type that Solr supports

2015-04-22 Thread Steven White
Hi Hoss,

I'm confused.  If type="float" is just a symbolic name, how does Solr know
to index the data of field "weight" as float?  What about for "date" per
this example:

<field name="last_modified" type="date" indexed="true" stored="true"/>

How does Solr apply date-range queries such as:

last_modified:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]

I was always under the impression that there are primitive field-types but
looks like that's not the case?

Thanks

Steve

On Wed, Apr 22, 2015 at 12:59 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : To be clear, here is an example of a type from Solr's schema.xml:
 :
 : <field name="weight" type="float" indexed="true" stored="true"/>
 :
 : Here, the type is "float".  I'm looking for the complete list of
 : out-of-the-box types supported.

 what you are asking about are just symbolic names that come from <type/>
 definitions in the schema.xml -- there is no complete list.  you can add
 any arbitrary <type name="foo" .../> you want to your schema, and now
 you've introduced a new type that solr supports.

 As far as the list of all FieldType *classes* that exist in solr out of
 the box (ie: the list of classes that can be specified in <type/>
 declarations), that is a bit more straight forward...


 https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr



 -Hoss
 http://www.lucidworks.com/



Re: Boolean filter query not working as expected

2015-04-22 Thread Chris Hostetter

1) https://lucidworks.com/blog/why-not-and-or-and-not/

2) use debug=query to understand how your (filter) query is being parsed.
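
For context (an illustrative rewrite, not tested against the poster's data): the inner clause (-(field:V1) AND -(field:V2)) is a purely negative sub-query, and Lucene has no document set to subtract those exclusions from, so it matches nothing. The usual fix from the article above is to give each negation an explicit *:* to negate against:

```
fq=((*:* -field:V1) AND (*:* -field:V2)) AND (*:* -field:V3)
```

Solr applies that trick automatically only when the pure negation sits at the top level of the query, which is presumably why the hand-rewritten flat version happens to work.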


: Date: Wed, 22 Apr 2015 14:56:22 +
: From: Dhutia, Devansh ddhu...@gannett.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org solr-user@lucene.apache.org
: Subject: Boolean filter query not working as expected
: 
: I have an automated filter query builder that uses the SolrNet nuget package 
to build out boolean filters. I have a scenario where it is generating a fq in 
the following format:
: 
: ((-(field:V1) AND -(field:V2)) AND -(field:V3))
: The filter looks legal to me (albeit with extra parentheses), but the above 
yields 0 total results, even though I know eligible data exists.
: 
: If I manually re-write the above filter as
: 
: (-(field:V1) AND -(field:V2) AND -(field:V3))
: I get the expected results.
: 
: I realize the auto generated filter could be rewritten in a different way, 
but the question still remains, why is the first version not returning any 
results?
: 
: Solr does not report any errors and returns successfully, just with 0 results.
: 
: Thanks
: 

-Hoss
http://www.lucidworks.com/


Re: Complete list of field type that Solr supports

2015-04-22 Thread Steven White
I got it now.

I have to start from <fieldType/> to create my <field/> list.  If I want a
list of supported field-types (used in my schema.xml), I have to look at
the class attribute of <fieldType/> to get that list.  The out-of-the-box
list of field-types is documented in the link you provided:
https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr

Thanks

Steve

On Wed, Apr 22, 2015 at 1:46 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:


  : I'm confused.  If type="float" is just a symbolic name, how does Solr know
  : to index the data of field "weight" as float?  What about for "date" per
  : this example:
  :
  : <field name="last_modified" type="date" indexed="true" stored="true"/>
  :
  : How does Solr apply date-range queries such as:

  because somewhere else in your schema is a <type/> declaration that
  defines <type name="date" ...> using class="solr.TrieDateField"

  you asked for the complete list of all possible values for the "type"
  attribute on a <field/> -- the answer is infinite because the possible
  values for the "type" attribute on a <field/> are dictated by whatever you
  might choose to specify as the "name" attribute on a <type/>

  : I was always under the impression that there are primitive field-types
  : but looks like that's not the case?

  There are FieldType *classes* which can be configured a variety of ways in
  your schema.xml, and then reused by different fields -- but the *names* of
  those types is up to you.

  for example: the exact same TrieDateField *class* can be configured in your
  schema.xml to implement 2 different *types* named "date_foo" and
  "date_bar" by using different default options (maybe one uses a
  non-default precisionStep and defaults to stored="true" while the other
  uses the default precisionStep and defaults to stored="false") ... those
  two diff types can then both be used in your schema...

    <field name="last_modified" type="date_foo" indexed="true" />
    <field name="pub_date" type="date_bar" indexed="true" />

  ...and have different behavior.

 :
 : Thanks
 :
 : Steve
 :
 : On Wed, Apr 22, 2015 at 12:59 PM, Chris Hostetter 
 hossman_luc...@fucit.org
 : wrote:
 :
 : 
  :  : To be clear, here is an example of a type from Solr's schema.xml:
  :  :
  :  : <field name="weight" type="float" indexed="true" stored="true"/>
  :  :
  :  : Here, the type is "float".  I'm looking for the complete list of
  :  : out-of-the-box types supported.
  : 
  :  what you are asking about are just symbolic names that come from <type/>
  :  definitions in the schema.xml -- there is no complete list.  you can add
  :  any arbitrary <type name="foo" .../> you want to your schema, and now
  :  you've introduced a new type that solr supports.
  : 
  :  As far as the list of all FieldType *classes* that exist in solr out of
  :  the box (ie: the list of classes that can be specified in <type/>
  :  declarations), that is a bit more straight forward...
  : 
  : 
  : 
  https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr
 : 
 : 
 : 
 :  -Hoss
 :  http://www.lucidworks.com/
 : 
 :

 -Hoss
 http://www.lucidworks.com/



Re: rq breaks wildcard search?

2015-04-22 Thread Ryan Josal
Awesome thanks!  I was on 4.10.2

Ryan

 On Apr 22, 2015, at 16:44, Joel Bernstein joels...@gmail.com wrote:
 
 For your own implementation you'll need to implement the following methods:
 
 public Query rewrite(IndexReader reader) throws IOException
  public void extractTerms(Set<Term> terms)
 
 You can review the 4.10.3 version of the ReRankQParserPlugin to see how it
 implements these methods.
 
 Joel Bernstein
 http://joelsolr.blogspot.com/
 
 On Wed, Apr 22, 2015 at 7:33 PM, Joel Bernstein joels...@gmail.com wrote:
 
 Just confirmed that wildcard queries work with Re-Ranking following
 SOLR-6323.
 
 Joel Bernstein
 http://joelsolr.blogspot.com/
 
 On Wed, Apr 22, 2015 at 7:26 PM, Joel Bernstein joels...@gmail.com
 wrote:
 
 This should be resolved in
 https://issues.apache.org/jira/browse/SOLR-6323.
 
 Solr 4.10.3
 
 Joel Bernstein
 http://joelsolr.blogspot.com/
 
 On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal rjo...@gmail.com wrote:
 
 Using edismax, supplying a rq= param, like {!rerank ...} is causing an
 UnsupportedOperationException because the Query doesn't implement
 createWeight.  This is for WildcardQuery in particular.  From some
 preliminary debugging it looks like without rq, somehow the qf Queries
 might turn into ConstantScore instead of WildcardQuery.  I don't think
 this
 is related to the RankQuery implementation as my own subclass has the
 same
 issue.  Anyway the effect is that all q's containing ? or * return http
 500
 because I always have rq on.  Can anyone confirm if this is a bug?  I
 will
 log it in Jira if so.
 
 Also, does anyone know how I can work around it?  Specifically, can I
 disable edismax from making WildcardQueries?
 
 Ryan
 


Re: rq breaks wildcard search?

2015-04-22 Thread Joel Bernstein
Just confirmed that wildcard queries work with Re-Ranking following
SOLR-6323.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 22, 2015 at 7:26 PM, Joel Bernstein joels...@gmail.com wrote:

 This should be resolved in https://issues.apache.org/jira/browse/SOLR-6323
 .

 Solr 4.10.3

 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal rjo...@gmail.com wrote:

 Using edismax, supplying a rq= param, like {!rerank ...} is causing an
 UnsupportedOperationException because the Query doesn't implement
 createWeight.  This is for WildcardQuery in particular.  From some
 preliminary debugging it looks like without rq, somehow the qf Queries
 might turn into ConstantScore instead of WildcardQuery.  I don't think
 this
 is related to the RankQuery implementation as my own subclass has the same
 issue.  Anyway the effect is that all q's containing ? or * return http
 500
 because I always have rq on.  Can anyone confirm if this is a bug?  I will
 log it in Jira if so.

 Also, does anyone know how I can work around it?  Specifically, can I
 disable edismax from making WildcardQueries?

 Ryan





Re: rq breaks wildcard search?

2015-04-22 Thread Joel Bernstein
This should be resolved in https://issues.apache.org/jira/browse/SOLR-6323.

Solr 4.10.3

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal rjo...@gmail.com wrote:

 Using edismax, supplying a rq= param, like {!rerank ...} is causing an
 UnsupportedOperationException because the Query doesn't implement
 createWeight.  This is for WildcardQuery in particular.  From some
 preliminary debugging it looks like without rq, somehow the qf Queries
 might turn into ConstantScore instead of WildcardQuery.  I don't think this
 is related to the RankQuery implementation as my own subclass has the same
 issue.  Anyway the effect is that all q's containing ? or * return http 500
 because I always have rq on.  Can anyone confirm if this is a bug?  I will
 log it in Jira if so.

 Also, does anyone know how I can work around it?  Specifically, can I
 disable edismax from making WildcardQueries?

 Ryan



Re: rq breaks wildcard search?

2015-04-22 Thread Joel Bernstein
For your own implementation you'll need to implement the following methods:

public Query rewrite(IndexReader reader) throws IOException
public void extractTerms(Set<Term> terms)

You can review the 4.10.3 version of the ReRankQParserPlugin to see how it
implements these methods.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Apr 22, 2015 at 7:33 PM, Joel Bernstein joels...@gmail.com wrote:

 Just confirmed that wildcard queries work with Re-Ranking following
 SOLR-6323.

 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Wed, Apr 22, 2015 at 7:26 PM, Joel Bernstein joels...@gmail.com
 wrote:

 This should be resolved in
 https://issues.apache.org/jira/browse/SOLR-6323.

 Solr 4.10.3

 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal rjo...@gmail.com wrote:

 Using edismax, supplying a rq= param, like {!rerank ...} is causing an
 UnsupportedOperationException because the Query doesn't implement
 createWeight.  This is for WildcardQuery in particular.  From some
 preliminary debugging it looks like without rq, somehow the qf Queries
 might turn into ConstantScore instead of WildcardQuery.  I don't think
 this
 is related to the RankQuery implementation as my own subclass has the
 same
 issue.  Anyway the effect is that all q's containing ? or * return http
 500
 because I always have rq on.  Can anyone confirm if this is a bug?  I
 will
 log it in Jira if so.

 Also, does anyone know how I can work around it?  Specifically, can I
 disable edismax from making WildcardQueries?

 Ryan






RE: Solr Index data lost

2015-04-22 Thread Vijay Bhoomireddy
Just to close this thread: it looks like it's working fine now. Not sure what 
mistake I made last time. But now the index data persists on the pen drive even 
after shutting down the server and restarting it on a different machine where 
the pen drive is plugged in. 

 

Thanks for all your help..

 

Regards

Vijay

 

From: Vijaya Narayana Reddy Bhoomi Reddy 
[mailto:vijaya.bhoomire...@whishworks.com] 
Sent: 21 April 2015 09:22
To: solr-user@lucene.apache.org
Subject: Re: Solr Index data lost

 

Shawn,

 

Yes, I had used java -jar start.jar. I haven't tried moving it to a local hard 
disk, as I wanted to work on two machines (work and home), so I was using a pen 
drive as the index storage. Yesterday I did the complete indexing, then unplugged 
the drive from the work machine and connected it to my personal laptop. The data 
folder didn't exist.

 

Erick, 

 

As per your earlier suggestion, I am using Tika and SolrJ to index the data 
(both binary files and database content), and it was committed using the SolrJ 
UpdateRequest. I was able to see the data in the admin UI screen and even 
performed some searches on the index, which worked fine.




Thanks & Regards

Vijay

 

On 21 April 2015 at 00:42, Erick Erickson erickerick...@gmail.com wrote:

Did you commit before you unplugged the drive? Were you able to see
data in the admin UI _before_ you unplugged the drive?

Best,
Erick


On Mon, Apr 20, 2015 at 3:58 PM, Vijay Bhoomireddy
vijaya.bhoomire...@whishworks.com wrote:
 Shawn,

 I haven't changed any DirectoryFactory setting in solrconfig.xml, as I am 
 using a local setup with the default configurations.

 The device was unmounted successfully (confirmed by the Windows message in 
 the lower right corner). I am using Solr 4.10.2. To stop Solr, I simply press 
 Ctrl-C in the Windows command prompt window where it was started.

 Please correct me if something has been done not in the correct fashion.

 Thanks & Regards
 Vijay

 -Original Message-
 From: Shawn Heisey [mailto:apa...@elyograg.org]
 Sent: 20 April 2015 22:34
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Index data lost

 On 4/20/2015 2:55 PM, Vijay Bhoomireddy wrote:
 I have configured Solr example server on a pen drive. I have indexed
 some content. The data directory was under
 example/solr/collection1/data which is the default one. After
 indexing, I stopped the Solr server and unplugged the pen drive and
 reconnected the same. Now, when I navigate to the SolrAdmin UI, I cannot see 
 any data in the index.

 Any pointers please? In this case, though the installation was on a
 pen-drive, I think it shouldn't matter to Solr on where the data
 directory is. So I believe this data folder wiping has happened due to
 server shutdown. Will the data folder be wiped off if the server is
 restarted or stopped? How to save the index data between machine
 failures or planned maintenances?

 If you are using the default Directory implementation in your solrconfig.xml 
 (NRTCachingDirectoryFactory for 4.x and later, MMapDirectoryFactory for newer 
 3.x versions), then everything should be persisted correctly.

 Did you properly unmount/eject the removable volume before you unplugged it?  
 On a non-windows OS, you might also want to run the 'sync'
 command.  If you didn't do the unmount/eject, you can't be sure that the 
 filesystem was properly closed and fully up-to-date on the device.

 What version of Solr did you use and how exactly did you start Solr and the 
 example?  How did you stop Solr?

 Thanks,
 Shawn




 --
 The contents of this e-mail are confidential and for the exclusive use of
 the intended recipient. If you receive this e-mail in error please delete
 it from your system immediately and notify us either by e-mail or
 telephone. You should not copy, forward or otherwise disclose the content
 of the e-mail. The views expressed in this communication may not
 necessarily be the view held by WHISHWORKS.

 




TIKA OCR not working

2015-04-22 Thread trung.ht
Hi,

I want to use Solr to index some scanned documents. After setting up a Solr
document with two fields, content and filename, I tried to upload the
attached file, but it seems the extracted content of the file is only
"\n \n \n".
If I run tesseract from the command line, however, I get the correct result.

The log when solr receive my request:
---
INFO  - 2015-04-23 03:49:25.941;
org.apache.solr.update.processor.LogUpdateProcessor; [collection1]
webapp=/solr path=/update/extract params={literal.groupid=2&json.nl=flat&resource.name=phplNiPrs&literal.id=4&commit=true&extractOnly=false&literal.historyid=4&omitHeader=true&literal.userid=3&literal.createddate=2015-04-22T15:00:00Z&fmap.content=content&wt=json&literal.filename=\\trunght\test\tesseract_3.png}


The document when I check on solr admin page:
-
{ "groupid": 2, "id": 4, "historyid": 4, "userid": 3, "createddate":
"2015-04-22T15:00:00Z", "filename": "trunght\\test\\tesseract_3.png",
"autocomplete_text": [ "trunght\\test\\tesseract_3.png" ], "content":
"\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
\n \n \n \n \n \n \n \n \n \n \n \n ", "_version_": 1499213034586898400 }
---

Since I am a Solr newbie I do not know where to look. Can anyone give me
advice on where to look for the error, or which settings will make it work?
Thanks in advance.

Trung.
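A side note for anyone debugging this: the extract handler's parameters (shown without their `&` separators in the log above) can be assembled safely by a URL library, and `extractOnly=true` makes Solr return what Tika extracted without indexing it. A minimal Python sketch; the endpoint and field names are taken from the log above, not verified against this particular setup:

```python
from urllib.parse import urlencode, parse_qs

# Parameters mirroring the logged request; extractOnly=true asks Solr to
# return the Tika extraction result instead of indexing the document.
params = {
    "literal.id": "4",
    "extractOnly": "true",
    "fmap.content": "content",
    "wt": "json",
    "literal.filename": r"\\trunght\test\tesseract_3.png",
}

# urlencode joins the pairs with '&' and percent-escapes the backslashes.
qs = urlencode(params)
url = "http://localhost:8983/solr/update/extract?" + qs

assert "extractOnly=true" in qs
# Round-tripping shows the filename survives the escaping intact.
assert parse_qs(qs)["literal.filename"] == [r"\\trunght\test\tesseract_3.png"]
```

Comparing the extractOnly output against the command-line tesseract result should narrow down whether the OCR step is even being invoked inside Tika.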


AW: Odp.: solr issue with pdf forms

2015-04-22 Thread Steve.Scholl
Thanks for your answer. Maybe my English is not good enough; what are you 
trying to say? Sorry, I didn't get the point. 
:-(


-Original Message-
From: LAFK [mailto:tomasz.bo...@gmail.com] 
Sent: Wednesday, 22 April 2015 14:01
To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
Subject: Odp.: solr issue with pdf forms

Off the top of my head, I'd look into how writable PDFs are created and encoded.

@LAFK_PL
  Original message  
From: steve.sch...@t-systems.com
Sent: Wednesday, 22 April 2015 12:41
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: solr issue with pdf forms

Hi guys,

hopefully you can help me with my issue. We are using a solr setup and have the 
following issue:
- usual pdf files are indexed just fine
- pdf files with writable form-fields look like this:
Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind

Somehow the blank space character is not indexed correctly.

Is this a known issue? Does anybody have an idea?

Thanks a lot
Best
Steve


Re: Bad contentType for search handler :text/xml; charset=UTF-8

2015-04-22 Thread Yonik Seeley
On Wed, Apr 22, 2015 at 11:00 AM, didier deshommes dfdes...@gmail.com wrote:
 curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation" \
   -H "Content-type: application/json"

You're telling Solr the body encoding is JSON, but then you don't send any body.
We could catch that error earlier perhaps, but it still looks like an error?

-Yonik
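Yonik's point can be checked offline: a GET carries its parameters in the query string and sends no body, so no Content-Type header belongs on it; the header only makes sense once there is a body to describe. A small Python sketch (the endpoint is taken from the curl example above; the JSON body shape follows Solr's JSON Request API and is illustrative only):

```python
import json
import urllib.parse
import urllib.request

base = "http://localhost:8983/solr/gettingstarted/select"

# GET: parameters travel in the query string; there is no body, so no
# Content-Type header should be sent at all.
qs = urllib.parse.urlencode({"wt": "json", "indent": "true", "q": "foundation"})
get_req = urllib.request.Request(base + "?" + qs)
assert get_req.data is None
assert get_req.get_header("Content-type") is None

# POST with an actual JSON body: only now is the Content-Type justified.
body = json.dumps({"query": "foundation"}).encode("utf-8")
post_req = urllib.request.Request(
    base, data=body, headers={"Content-Type": "application/json"})
assert post_req.get_header("Content-type") == "application/json"
assert post_req.get_method() == "POST"
```

The Solarium case later in this thread fits the same pattern: a hardcoded Content-Type on a body-less request declares content that is never sent.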


phraseFreq vs sloppyFreq

2015-04-22 Thread Dmitry Kan
Hi guys. I'm executing the following proximity query: leader the~1000. In
the debugQuery I see phraseFreq=0.032258064. Is phraseFreq same thing as
sloppyFreq from
https://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html
?

Does a higher phraseFreq increase the final similarity score?

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
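A hedged note on the numbers: in DefaultSimilarity, sloppyFreq(distance) returns 1/(distance+1), and the phraseFreq reported in debugQuery is the accumulated sloppyFreq over all sloppy matches in the field, so the two are essentially the same quantity. The value above is consistent with a single match whose terms sit 30 positions apart:

```python
def sloppy_freq(distance):
    # DefaultSimilarity.sloppyFreq: matches farther apart contribute less.
    return 1.0 / (distance + 1)

# phraseFreq is the sum of sloppyFreq over all sloppy matches in the field;
# 0.032258064 corresponds to one match whose terms are 30 positions apart.
phrase_freq = sloppy_freq(30)
assert abs(phrase_freq - 0.032258064) < 1e-6
assert sloppy_freq(0) == 1.0  # an exact phrase match contributes 1.0
```

And since DefaultSimilarity's tf factor is sqrt(freq), a higher phraseFreq does raise the final similarity score, all else being equal.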


solr issue with pdf forms

2015-04-22 Thread Steve.Scholl
Hi guys,

hopefully you can help me with my issue. We are using a solr setup and have the 
following issue:
- usual pdf files are indexed just fine
- pdf files with writable form-fields look like this:
Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind

Somehow the blank space character is not indexed correctly.

Is this a known issue? Does anybody have an idea?

Thanks a lot
Best
Steve
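Not an authoritative diagnosis, but a replacement character (�) between every word of text extracted from PDF form fields is often a non-breaking space (U+00A0) that was decoded with the wrong charset somewhere in the chain; PDF form fields frequently separate words with non-breaking rather than ordinary spaces. A quick Python check of that hypothesis (the sample string is hypothetical extracted text, not from the actual file):

```python
# Hypothetical extracted text: words joined by non-breaking spaces (U+00A0),
# which render as � when the byte stream is later read with the wrong charset.
extracted = "Ich\u00a0bestätige\u00a0mit\u00a0meiner\u00a0Unterschrift"

# Normalizing U+00A0 to a plain space restores ordinary tokenization.
normalized = extracted.replace("\u00a0", " ")
assert normalized == "Ich bestätige mit meiner Unterschrift"
assert "\u00a0" not in normalized
```

If this matches what Tika is producing, a charFilter (for example solr.PatternReplaceCharFilterFactory) in the field's analyzer chain could do the same normalization at index time.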


Re: Bad contentType for search handler :text/xml; charset=UTF-8

2015-04-22 Thread bengates
Looks like Solarium hardcodes a default header Content-Type: text/xml;
charset=utf-8 if none provided.
Removing it solves the problem.

It seems that Solr 5.1 doesn't support this content-type.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314p4201579.html
Sent from the Solr - User mailing list archive at Nabble.com.


MLT causing Problems

2015-04-22 Thread Srinivas Rishindra
Hello,

I am working on a project in which I have to find similar documents.
While implementing it, the following error occurs. Please let me know
what to do.

Exception in thread main
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8980/solr/rishi: Expected mime type
application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/rishi/mlt. Reason:
<pre>    Not Found</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

</body>
</html>

at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:525)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:233)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:225)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)
at MoreLikeThis.main(MoreLikeThis.java:31)
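The 404 itself is the real problem here: nothing is registered at /solr/rishi/mlt, so Jetty returns an HTML error page where SolrJ expects a javabin (application/octet-stream) response. One likely fix (assuming the standard MoreLikeThisHandler is what is wanted; the field name below is only a placeholder) is to register the handler in solrconfig.xml and reload the core:

```xml
<!-- Register the MLT handler at /mlt; mlt.fl is a placeholder field list. -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">content</str>
  </lst>
</requestHandler>
```

Alternatively, MLT can be run through the regular /select handler by enabling the mlt=true parameter with the MoreLikeThis search component.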


Re: Suggester

2015-04-22 Thread Martin Keller
Unfortunately, setting suggestAnalyzerFieldType to text_suggest didn’t change 
anything.
The suggest dictionary is freshly built.
As I mentioned before, only words or phrases of the source field „content“ are 
not matched.
When querying the index, the response only contains „suggestions“ field data 
not coming from the „content“ field.
The complete schema is a slightly modified techproducts schema.
„Normal“ searching for words which I would expect coming from „content“ works.

Any more ideas?

Thanks 
Martin


 Am 21.04.2015 um 17:39 schrieb Erick Erickson erickerick...@gmail.com:
 
 Did you build your suggest dictionary after indexing? Kind of a shot in the
 dark but worth a try.
 
 Note that the suggest field of your suggester isn't using your text_suggest
 field type to make suggestions, it's using text_general. IOW, the text may
 not be analyzed as you expect.
 
 Best,
 Erick
 
 On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller
 martin.kel...@unitedplanet.com wrote:
 Hello together,
 
 I have some problems with the Solr 5.1.0 suggester.
 I followed the instructions in 
 https://cwiki.apache.org/confluence/display/solr/Suggester and also tried 
 the techproducts example delivered with the binary package, which is working 
 well.
 
  I added a suggestions field to the schema:
 
  <field name="suggestions" type="text_suggest" indexed="true" stored="true" multiValued="true"/>
 
 
 And added some copies to the field:
 
  <copyField source="content" dest="suggestions"/>
  <copyField source="title" dest="suggestions"/>
  <copyField source="author" dest="suggestions"/>
  <copyField source="description" dest="suggestions"/>
  <copyField source="keywords" dest="suggestions"/>
 
 
 The field type definition for „text_suggest“ is pretty simple:
 
  <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
 
 
  I also changed solrconfig.xml to use the suggestions field:
 
  <searchComponent class="solr.SuggestComponent" name="suggest">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">suggestions</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>
 
 
  For tokens originally coming from „title“ or „author“, I get suggestions, but 
  not any from the content field.
 So, what do I have to do?
 
 Any help is appreciated.
 
 
 Martin
 



Re: Bad contentType for search handler :text/xml; charset=UTF-8

2015-04-22 Thread bengates
Hello,

I've got the same issue after an upgrade from Solr 5.0 to 5.1, even on GET
requests.
Actually I'm using the PHP Solarium library to perform my requests. This is the
error the library gets now, on a search handler. The request is transported
with cURL.

What's weird is that when I copy/paste the URL Solarium generates into my
browser, I don't get the error.
Maybe Solr 5.1 requires a new header which is automatically sent by the
browser but not by cURL.

I'll investigate on this...
Ben



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314p4201564.html
Sent from the Solr - User mailing list archive at Nabble.com.


Exception while using group with timeAllowed on SolrCloud

2015-04-22 Thread forest_soup
We have the same issue as this JIRA. 
https://issues.apache.org/jira/browse/SOLR-6156

I have posted my query, response and Solr logs to the JIRA. 

Could anyone please take a look? Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exception-while-using-group-with-timeAllowed-on-SolrCloud-tp4201570.html
Sent from the Solr - User mailing list archive at Nabble.com.


Odp.: phraseFreq vs sloppyFreq

2015-04-22 Thread LAFK
Out of curiosity, why proximity 1k?

@LAFK_PL
  Original message  
From: Dmitry Kan
Sent: Wednesday, 22 April 2015 09:26
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: phraseFreq vs sloppyFreq

Hi guys. I'm executing the following proximity query: leader the~1000. In
the debugQuery I see phraseFreq=0.032258064. Is phraseFreq same thing as
sloppyFreq from
https://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html
?

Does a higher phraseFreq increase the final similarity score?

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Odp.: solr issue with pdf forms

2015-04-22 Thread LAFK
Off the top of my head, I'd look into how writable PDFs are created and encoded.

@LAFK_PL
  Original message  
From: steve.sch...@t-systems.com
Sent: Wednesday, 22 April 2015 12:41
To: solr-user@lucene.apache.org
Reply-To: solr-user@lucene.apache.org
Subject: solr issue with pdf forms

Hi guys,

hopefully you can help me with my issue. We are using a solr setup and have the 
following issue:
- usual pdf files are indexed just fine
- pdf files with writable form-fields look like this:
Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind

Somehow the blank space character is not indexed correctly.

Is this a known issue? Does anybody have an idea?

Thanks a lot
Best
Steve