Solr starts without error but not working

2017-06-17 Thread Nawab Zada Asad Iqbal
Hi

So I am deploying Solr 6.5.1 using Puppet to another machine (which I can
ssh to). The logs have no errors, but the Solr home page returns nothing (no
response from the server). Using curl also shows an empty response.

What could be wrong?
The server is writing logs and has also found the core folder, so from the
directory structure and logs, everything seems fine.


Regards
Nawab
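
A few generic checks can narrow this kind of symptom down; a hedged sketch,
assuming the default port 8983 and a standard install:

    # Is the process up and the port actually listening?
    bin/solr status
    ss -ltnp | grep 8983        # or: netstat -ltnp on older systems

    # Does the node answer locally, bypassing any proxy or firewall in between?
    curl "http://localhost:8983/solr/admin/info/system?wt=json"

If the local curl works but remote access does not, the usual suspects are a
firewall rule or the host/bind configuration of the node.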


Re: Issue with highlighter

2017-06-17 Thread Ali Husain
Damien, I tried that too before I sent the email. Nothing :/


http://localhost:8983/solr/testHighlight/select?hl.q=something&hl.fl=*&hl=on&indent=on&q=something&wt=json


This is a bug, right?
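
One way to rule out an analysis-chain surprise is to inspect the field type
and run the term through it; a hedged sketch using the stock Schema API and
field-analysis handler, with the core and type names taken from this thread:

    # Dump the text_en field type definition (tokenizer, filters, stopword files)
    curl "http://localhost:8983/solr/testHighlight/schema/fieldtypes/text_en?wt=json"

    # Show how 'something' is analyzed at index and query time for that type
    curl "http://localhost:8983/solr/testHighlight/analysis/field?analysis.fieldtype=text_en&analysis.fieldvalue=something&wt=json"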


From: Damien Kamerman 
Sent: Friday, June 16, 2017 12:11:57 AM
To: solr-user@lucene.apache.org
Subject: Re: Issue with highlighter

Ali, does adding a 'hl.q' param help?  q=something&hl.q=something&...

On 16 June 2017 at 06:21, Ali Husain  wrote:

> Thanks for the replies. Let me try and explain this a little better.
>
>
> I haven't modified anything in solrconfig. All I did was get a fresh
> instance of solr 6.4.1 and create a core testHighlight. I then created a
> content field of type text_en via the Solr Admin UI. id was already there,
> and that is of type string.
>
>
> I then use the UI, once again, to check the hl checkbox; hl.fl is set to *
> because I want any and every match.
>
>
> I push the following content into this new solr instance:
>
> id:91101
>
> content:'I am adding something to the core field and we will try and find
> it. We want to make sure the highlighter works!
>
> This is short so fragsize and max characters shouldn\'t be an issue.'
>
> As you can see, there are very few characters, so fragsize, maxAnalyzedChars,
> and all that should not be an issue.
>
>
> I then send this query:
>
> http://localhost:8983/solr/testHighlight/select?hl.fl=*&hl=on&indent=on&q=something&wt=json
>
>
> My results:
>
>
> "response":{"numFound":1,"start":0,"docs":[
>
> {"id":"91101",
>
> "content":"I am adding something to the core field and we will try
> and find it. We want to make sure the highlighter works! This is short so
> fragsize and max characters shouldn't be an issue.",
> "_version_":1570302668841156608}]
>
>
> },
>
>
> "highlighting":{
> "91101":{}}
>
>
> I change q to be core instead of something.
>
>
> http://localhost:8983/solr/testHighlight/select?hl.fl=*&hl=on&indent=on&q=core&wt=json
>
>
> {
> "id":"91101",
> "content":"I am adding something to the core field and we will try
> and find it. We want to make sure the highlighter works! This is short so
> fragsize and max characters shouldn't be an issue.",
> "_version_":1570302668841156608},
>
>
>
> "highlighting":{
> "91101":{
>   "content":["I am adding something to the core field and we
> will try and find it. We want to make sure"]}}
>
> I've tried a bunch of queries: 'adding' and 'something' both return no
> highlights; 'core', 'am', and 'field' all work.
>
> Am I doing a better job of explaining this? It's quite puzzling why this
> would be happening. My guess is that there is some file/config somewhere
> that is ignoring some words? It isn't stopwords.txt in my case, though. If
> that isn't the case, then it definitely seems like a bug to me.
>
> Thanks, Ali
>
>
> 
> From: David Smiley 
> Sent: Thursday, June 15, 2017 12:33:39 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Issue with highlighter
>
> > Beware of NOT plus OR in a search. That will certainly produce no
> highlights. (eg test -results when default op is OR)
>
> Seems like a bug to me; the default operator shouldn't matter in that case
> I think since there is only one clause that has no BooleanQuery.Occur
> operator and thus the OR/AND shouldn't matter.  The end effect is "test" is
> effectively required and should definitely be highlighted.
>
> Note to Ali: Phil's comment implies use of hl.method=unified which is not
> the default.
>
> On Wed, Jun 14, 2017 at 10:22 PM Phil Scadden 
> wrote:
>
> > Just had a similar issue - works for some, not others. The first thing to
> > look at is hl.maxAnalyzedChars in the query. The default is quite small.
> > Since many of my documents are large PDF files, I opted to use
> > storeOffsetsWithPositions="true" termVectors="true" on the field I was
> > searching on.
> > This certainly did increase my index size, but not too badly, and it is
> > certainly fast.
> > https://cwiki.apache.org/confluence/display/solr/Highlighting
> >
> > Beware of NOT plus OR in a search. That will certainly produce no
> > highlights. (eg test -results when default op is OR)
> >
> >
> > -----Original Message-----
> > From: Ali Husain [mailto:alihus...@outlook.com]
> > Sent: Thursday, 15 June 2017 11:11 a.m.
> > To: solr-user@lucene.apache.org
> > Subject: Issue with highlighter
> >
> > Hi,
> >
> >
> > I think I've found a bug with the highlighter. I search for the word
> > "something" and I get an empty highlighting response for all the documents
> > that are returned, shown below. The fields that I am searching over are
> > of type text_en; the highlighter works for a lot of queries. I have no
> > stopwords.txt list that could be messing this up either.
> >
> >
> >  "highlighting":{
> > "310":{},
> > "103":{},
> > "406":{},
> > "1189":{},
> > "54":{},
> > "292":{},
> > "309":{}}}
> >
> >
> > Just changing the search term 
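
As a footnote to Phil's suggestion above: storeOffsetsWithPositions and
termVectors are field properties, so enabling them means a schema change plus
a full reindex. A hedged sketch via the Schema API, with the field name and
type assumed from this thread:

    curl -X POST -H 'Content-type:application/json' \
      "http://localhost:8983/solr/testHighlight/schema" --data-binary '{
      "replace-field": {
        "name": "content",
        "type": "text_en",
        "stored": true,
        "termVectors": true,
        "storeOffsetsWithPositions": true
      }
    }'

Note that replace-field swaps in the whole definition, so every property you
want to keep has to be restated, and documents must be reindexed before the
new offsets and term vectors exist.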

Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

2017-06-17 Thread ZiYuan
Hi,

I am new to Solr and I need to implement full-text search of some PDF
files. The indexing part works out of the box by using bin/post. I can see
search results in the admin UI given some queries, though without the
matched text and its context.

Now I am reading this post

for the highlighting part. It is for an older version of Solr, when the
managed schema was not available. Before fully understanding what it does, I
have some questions:

1. He defined two fields:




But why are two fields needed? Can I define a single field

to capture the full text?
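
(The two field definitions did not survive the mailing-list archive. Purely as
an illustration of the usual pattern - these names and types are hypothetical,
not the blog post's actual definitions - such a pair can be created through
the Schema API:)

    # hypothetical: a stored catch-all field for search and highlighting,
    # plus a copyField that funnels every other field into it
    curl -X POST -H 'Content-type:application/json' \
      "http://localhost:8983/solr/mycore/schema" --data-binary '{
      "add-field": {"name":"text", "type":"text_general",
                    "indexed":true, "stored":true, "multiValued":true},
      "add-copy-field": {"source":"*", "dest":"text"}
    }'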

2. How are the fields filled? I don't see relevant information in
TikaEntityProcessor's documentation.
The current text extractor should already be Tika (I can see

"x_parsed_by":
["org.apache.tika.parser.DefaultParser","org.apache.tika.parser.pdf.PDFParser"]

in the returned JSON of some queries). But even if I define the fields as he
said, I cannot see them in the search results as keys in the JSON.
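
For rich documents, bin/post hands the file to the ExtractingRequestHandler
(Solr Cell), which runs Tika and maps its output onto schema fields. A hedged
sketch of steering the extracted body into a field of your choice - the core,
field, and file names here are assumptions:

    # literal.* sets a field value directly; fmap.content remaps Tika's
    # default 'content' output onto another schema field
    curl "http://localhost:8983/solr/mycore/update/extract?literal.id=esl.pdf&fmap.content=text&commit=true" \
      -F "file=@/path/to/The-Elements-of-Statistical-Learning.pdf"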

3. The _text_ field seems to be a concatenation of other fields; does it
contain the full text? It does not seem to be accessible by default, though.

To be brief, using The Elements of Statistical Learning

as an example, how do I highlight the relevant text for the query "SVM"? And
if I change the file name to "The Elements of Statistical Learning -
Trevor Hastie.pdf" and post it, how do I highlight "Trevor Hastie" for the
query "id:Trevor Hastie"?

Thank you.

Best regards,
Ziyuan


Re: org.apache.lucene.index.CheckIndex throws Illegal initial capacity: -16777216

2017-06-17 Thread Moritz Michael






Thanks for the advice, Alan. I'm already aware of this fact (that's
why I used the CheckIndex tool of Solr 5).




_
From: Alan Woodward 
Sent: Samstag, Juni 17, 2017 4:50 PM
Subject: Re: org.apache.lucene.index.CheckIndex throws Illegal initial 
capacity: -16777216
To:  


Solr/Lucene 6 can’t read 4.6 index files, only 5.x ones.  So you’ll need to 
upgrade from 4.6 to 5.x using the upgrade tool from the latest 5.x release, 
then from 5.x to 6 using the current upgrade tool.

Alan Woodward
www.flax.co.uk


> On 17 Jun 2017, at 10:08, Moritz Michael  wrote:
> 
> Hello,
> 
> I'm trying to upgrade a Solr 4.6 index to Solr 6.
> The upgrade fails with an error.
> 
> I tried to check the index with org.apache.lucene.index.CheckIndex using
> this command:
> java -cp lucene-core-5.5.4.jar:lucene-backward-codecs-5.5.4.jar
> -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex
> ./[PATH-TO-INDEX]/data/index
> 
> The check fails with this error:
> 
> Opening index @ ./[PATH-TO-INDEX]/data/index
>> 
>> ERROR: could not read any segments file in directory
>> java.lang.IllegalArgumentException: Illegal initial capacity: -16777216
>>at java.util.HashMap.<init>(HashMap.java:448)
>>at java.util.HashMap.<init>(HashMap.java:467)
>>at
>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:393)
>>at
>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:488)
>>at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2407)
>>at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2309)
>>at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2235)
>> 
> 
> I tried this with Cygwin and the Windows 10 Ubuntu subsystem with the same
> result.
> 
> Any ideas?
> 
> Best
> Moritz







Re: org.apache.lucene.index.CheckIndex throws Illegal initial capacity: -16777216

2017-06-17 Thread Alan Woodward
Solr/Lucene 6 can’t read 4.6 index files, only 5.x ones.  So you’ll need to 
upgrade from 4.6 to 5.x using the upgrade tool from the latest 5.x release, 
then from 5.x to 6 using the current upgrade tool.

Alan Woodward
www.flax.co.uk
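
A hedged sketch of that two-step path using Lucene's IndexUpgrader - the jar
versions and index path here are assumptions, and the tool rewrites the index
in place, so work on a copy:

    # Step 1: 4.x -> 5.x, with the latest 5.x release
    java -cp lucene-core-5.5.4.jar:lucene-backward-codecs-5.5.4.jar \
      org.apache.lucene.index.IndexUpgrader -delete-prior-commits ./[PATH-TO-INDEX]/data/index

    # Step 2: 5.x -> 6.x, with the current 6.x release
    java -cp lucene-core-6.6.0.jar:lucene-backward-codecs-6.6.0.jar \
      org.apache.lucene.index.IndexUpgrader -delete-prior-commits ./[PATH-TO-INDEX]/data/index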


> On 17 Jun 2017, at 10:08, Moritz Michael  wrote:
> 
> Hello,
> 
> I'm trying to upgrade a Solr 4.6 index to Solr 6.
> The upgrade fails with an error.
> 
> I tried to check the index with org.apache.lucene.index.CheckIndex using
> this command:
> java -cp lucene-core-5.5.4.jar:lucene-backward-codecs-5.5.4.jar
> -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex
> ./[PATH-TO-INDEX]/data/index
> 
> The check fails with this error:
> 
> Opening index @ ./[PATH-TO-INDEX]/data/index
>> 
>> ERROR: could not read any segments file in directory
>> java.lang.IllegalArgumentException: Illegal initial capacity: -16777216
>>at java.util.HashMap.<init>(HashMap.java:448)
>>at java.util.HashMap.<init>(HashMap.java:467)
>>at
>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:393)
>>at
>> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:488)
>>at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2407)
>>at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2309)
>>at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2235)
>> 
> 
> I tried this with Cygwin and the Windows 10 Ubuntu subsystem with the same
> result.
> 
> Any ideas?
> 
> Best
> Moritz



NumberFormatException while Solr CSV import

2017-06-17 Thread Kshitij Shukla

Hello everyone!

Hope you all are having a good time.

I am getting an error when importing a CSV file into a Solr core. In my
case I am running a schemaless Solr instance, as I won't know which type
of CSV I will get. The problem occurs when Solr considers a column to be an
integer column but the data contains some doubles.


*Stack trace: attached below.*

I guess it's the default behaviour of Solr, as mentioned at the very bottom
of this page: cwiki.apache.org/confluence/display/solr/Schemaless+Mode
But is there a way I can bypass or override this, for example by treating all
the tLong-typed columns as tDouble?


Any advice?
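
One place to look is the AddSchemaFieldsUpdateProcessorFactory section of
solrconfig.xml in the schemaless (data_driven) configset - it appears in the
stack trace below. A hedged sketch of redirecting the integer guess at the
double type; element and field-type names vary by version, so treat this as
illustrative:

    <!-- inside solr.AddSchemaFieldsUpdateProcessorFactory in solrconfig.xml -->
    <!-- illustrative: map integer-looking values to the double field type so
         a later value like 14.3 in the same column no longer fails -->
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Long</str>
      <str name="valueClass">java.lang.Integer</str>
      <str name="fieldType">tdoubles</str>
    </lst>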

--
Kshitij S
Team Leader
Cyber Infrastructure (P) Limited, [CIS] (CMMI Level 3 Certified)
www.cisin.com

Offices: Indore, India. Silicon Valley, USA. Singapore. South Africa


org.apache.solr.common.SolrException: ERROR: [doc=ce2e3571-af3f-43cb-8c79-0249996e7dd7] Error adding field 'Data_value'='14.3' msg=For input string: "14.3"
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:240)
    at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:101)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:922)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:913)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:302)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:980)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1193)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:749)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:336)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)

Re: Solr Capacity Planning

2017-06-17 Thread Will Martin
MODERATOR REQUESTED: 

> On Jun 17, 2017, at 3:56 AM, Greenhorn Techie  
> wrote:
> 
> Hi,
> 
> We are planning to set up a SolrCloud cluster for building a search
> application on huge volumes of data points (~hundreds of billions of Solr
> documents). I would like to understand if there is any recommendation on
> how to size the infrastructure and hardware requirements for Solr clusters.
> Also, what are the best practices to consider during this setup?
> 
> Thanks

Seriously.
Will Martin



org.apache.lucene.index.CheckIndex throws Illegal initial capacity: -16777216

2017-06-17 Thread Moritz Michael
Hello,

I'm trying to upgrade a Solr 4.6 index to Solr 6.
The upgrade fails with an error.

I tried to check the index with org.apache.lucene.index.CheckIndex using
this command:
java -cp lucene-core-5.5.4.jar:lucene-backward-codecs-5.5.4.jar
-ea:org.apache.lucene... org.apache.lucene.index.CheckIndex
./[PATH-TO-INDEX]/data/index

The check fails with this error:

Opening index @ ./[PATH-TO-INDEX]/data/index
>
> ERROR: could not read any segments file in directory
> java.lang.IllegalArgumentException: Illegal initial capacity: -16777216
> at java.util.HashMap.<init>(HashMap.java:448)
> at java.util.HashMap.<init>(HashMap.java:467)
> at
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:393)
> at
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:488)
> at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2407)
> at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2309)
> at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2235)
>

I tried this with Cygwin and the Windows 10 Ubuntu subsystem with the same
result.

Any ideas?

Best
Moritz


Solr Capacity Planning

2017-06-17 Thread Greenhorn Techie
Hi,

We are planning to set up a SolrCloud cluster for building a search
application on huge volumes of data points (~hundreds of billions of Solr
documents). I would like to understand if there is any recommendation on how
to size the infrastructure and hardware requirements for Solr clusters. Also,
what are the best practices to consider during this setup?

Thanks