Jetty JNDI connection pooling

2021-02-26 Thread Srinivas Kashyap
Hi,

Our datasource is an Oracle DB and we are pulling data into Solr through JDBC (DIH).
I have the below entry in jetty.xml:



[jetty.xml JNDI Resource entry; the XML was stripped by the mail archive. It registers
jdbc/tss with the Oracle thin driver, a connection URL ending in :1521:ORCL (host
masked), and a username/password masked as XXX.]

And we have added below entry in server/solr-webapp/webapp/WEB-INF/web.xml


<resource-ref>
  <res-ref-name>jdbc/tss</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <res-auth>Container</res-auth>
</resource-ref>


What is the default connection pool limit for this datasource? Also, how do we set
the maximum number of connections that can be made from Jetty?

Thanks,
Srinivas
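
For reference: the pool limit here is generally determined by whichever DataSource
implementation is registered in jetty.xml, not by Jetty or Solr themselves. As a
hedged sketch only (it assumes Commons DBCP2 is on Jetty's classpath, which may not
match the actual setup; "dbhost" and maxTotal=8 are placeholders), a pooled JNDI
resource with an explicit connection cap could look like this:

<New id="tss" class="org.eclipse.jetty.plus.jndi.Resource">
  <Arg>jdbc/tss</Arg>
  <Arg>
    <New class="org.apache.commons.dbcp2.BasicDataSource">
      <Set name="driverClassName">oracle.jdbc.OracleDriver</Set>
      <!-- dbhost is a placeholder; the real host was masked in the original mail -->
      <Set name="url">jdbc:oracle:thin:@dbhost:1521:ORCL</Set>
      <Set name="username">XXX</Set>
      <Set name="password">XXX</Set>
      <!-- maxTotal is where the maximum number of pooled connections is capped -->
      <Set name="maxTotal">8</Set>
    </New>
  </Arg>
</New>

The web.xml resource-ref shown above would stay the same; only the object bound in
jetty.xml changes.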



RE: QueryResponse ordering

2021-01-14 Thread Srinivas Kashyap
Hi Alessandro,

I'm trying to retrieve party ids 'abc', 'def', 'ghi' in the same order I pass them to
the filter query. Is this possible?

The field I want to sort the results on is not in the Solr schema for the party core;
the sorting criterion lives outside Solr. I want to be able to fetch the
QueryResponse (SolrJ) SolrDocumentList sorted on this external criterion.

Yes, I understand the boost parameter bq doesn't apply to filter queries. Is there
an alternative?

Thanks,
Srinivas
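
Since the desired order lives outside Solr, one hedged option (purely a sketch; the
core URL, field name, and ids below are assumptions) is to fetch with the filter
query as-is and re-order the SolrDocumentList client-side against the externally
known order:

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class OrderedFetch {
    public static void main(String[] args) throws Exception {
        // The external ordering that Solr knows nothing about.
        List<String> desiredOrder = Arrays.asList("abc", "def", "ghi");

        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/party").build();
        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("PARTY_ID:(abc OR def OR ghi)");
        QueryResponse response = client.query(query);
        SolrDocumentList docs = response.getResults();

        // Sort the in-memory list by each document's position in the external order.
        // Documents whose PARTY_ID is not in the list would sort first (indexOf = -1).
        Comparator<SolrDocument> byExternalOrder =
            Comparator.comparingInt(doc -> desiredOrder.indexOf(doc.getFieldValue("PARTY_ID")));
        docs.sort(byExternalOrder);

        docs.forEach(doc -> System.out.println(doc.getFieldValue("PARTY_ID")));
        client.close();
    }
}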
From: Alessandro Benedetti 
Sent: 14 January 2021 01:55
To: solr-user@lucene.apache.org
Subject: Re: QueryResponse ordering

Hi Srinivas,
Filter queries don't impact scoring but only matching.
So, what is the ordering you are expecting?
A bq (boost query) parameter will add a clause to the query, impacting the
score in an additive way.
The query you posted is a bit confusing; what was your intent there?
To boost search results having "abc" as the PARTY.PARTY_ID?
https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebq_BoostQuery_Parameter



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io



QueryResponse ordering

2021-01-13 Thread Srinivas Kashyap
Hello,

I have a scenario where I'm using filter query to fetch the results.

Example: Filter query(fq) - PARTY_ID:(abc OR def OR ghi)

Now I'm getting the query response through SolrJ in a different order. Is there a way
I can get the results in the same order as specified in the filter query?

I tried the dismax boost (bq) parameter, but it is not returning any value. Please
refer to the URL below:

http://localhost:8983/solr/party/select?bq=PARTY.PARTY_ID:"abc"^2+PARTY.PARTY_ID:"def"^1=dismax=PARTY.PARTY_ID:"abc"
 OR PARTY.PARTY_ID:"def"=*:*

Thanks,
Srinivas



RE: Avoiding duplicate entry for a multivalued field

2020-10-30 Thread Srinivas Kashyap
Thanks Munendra, this will really help me. Is there any performance overhead with
this?

Thanks,
Srinivas


From: Munendra S N 
Sent: 30 October 2020 19:20
To: solr-user@lucene.apache.org
Subject: Re: Avoiding duplicate entry for a multivalued field

Srinivas,

For atomic updates, you could use add-distinct operation to avoid
duplicates -
https://lucene.apache.org/solr/guide/8_6/updating-parts-of-documents.html
This operation is available from Solr 7.3

Regards,
Munendra S N
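
For anyone following along, a minimal SolrJ sketch of the add-distinct atomic update
Munendra mentions (core URL, document id, and field name are made-up placeholders):

import java.util.Collections;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddDistinctExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "DOC-1");
        // add-distinct appends the value only if it is not already present
        // in the multivalued field, so repeated updates stay duplicate-free.
        doc.addField("tags", Collections.singletonMap("add-distinct", "red"));

        client.add(doc);
        client.commit();
        client.close();
    }
}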



On Thu, Oct 29, 2020 at 10:27 PM Walter Underwood <wun...@wunderwood.org> wrote:

> Since you are already taking the performance hit of atomic updates,
> I doubt you’ll see any impact from field types or update request
> processors.
> The extra cost of atomic updates will be much greater than indexing cost.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Oct 29, 2020, at 3:16 AM, Srinivas Kashyap <srini...@bamboorose.com.INVALID> wrote:
> >
> > Thanks Dwane,
> >
> > I have a doubt, according to the java doc, the duplicates still continue
> to exist in the field. May be during query time, the field returns only
> unique values? Am I right with my assumption?
> >
> > And also, what is the performance overhead for this UniqueFiled*Factory?
> >
> > Thanks,
> > Srinivas
> >
> > From: Dwane Hall <dwaneh...@hotmail.com>
> > Sent: 29 October 2020 14:33
> > To: solr-user@lucene.apache.org
> > Subject: Re: Avoiding duplicate entry for a multivalued field
> >
> > Srinivas this is possible by adding an unique field update processor to
> the update processor chain you are using to perform your updates (/update,
> /update/json, /update/json/docs, .../a_custom_one)
> >
> > The Java Documents explain its use nicely
> > (https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html)
> > or there are articles on stack overflow addressing this exact problem
> > (https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655)
> >
> > Thanks,
> >
> > Dwane
> > 
> > From: Srinivas Kashyap <srini...@bamboorose.com.INVALID>
> > Sent: Thursday, 29 October 2020 3:49 PM
> > To: solr-user@lucene.apache.org
> > Subject: Avoiding duplicate entry for a multivalued field
> >
> > Hello,
> >
> > Say, I have a schema field which is multivalued. Is there a way to
> maintain distinct values for that field though I continue to add duplicate
> values through atomic update via solrj?
> >
> > Is there some property setting to have only unique values in a multi
> valued fields?
> >
> > Thanks,
> > Srinivas
> > 

RE: Avoiding duplicate entry for a multivalued field

2020-10-29 Thread Srinivas Kashyap
Thanks Dwane,

I have a doubt: according to the Javadoc, the duplicates still continue to exist in
the field. Maybe at query time the field returns only unique values? Am I right with
my assumption?

And also, what is the performance overhead of this UniqFieldsUpdateProcessorFactory?

Thanks,
Srinivas

From: Dwane Hall 
Sent: 29 October 2020 14:33
To: solr-user@lucene.apache.org
Subject: Re: Avoiding duplicate entry for a multivalued field

Srinivas, this is possible by adding a unique-fields update processor to the update
processor chain you are using to perform your updates (/update, /update/json,
/update/json/docs, .../a_custom_one).

The Javadocs explain its use nicely
(https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html),
or there are articles on Stack Overflow addressing this exact problem
(https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655).

Thanks,

Dwane
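
As a hedged illustration of the processor Dwane mentions (the chain name and field
name here are placeholders, and the chain still has to be attached to the update
handler, e.g. with update.chain on the request or default="true" on the chain):

<updateRequestProcessorChain name="uniq-fields">
  <!-- drops duplicate values from the selected multivalued field(s) -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">my_multivalued_field</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>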
____
From: Srinivas Kashyap <srini...@bamboorose.com.INVALID>
Sent: Thursday, 29 October 2020 3:49 PM
To: solr-user@lucene.apache.org
Subject: Avoiding duplicate entry for a multivalued field

Hello,

Say I have a schema field which is multivalued. Is there a way to maintain distinct
values for that field even though I continue to add duplicate values through atomic
updates via SolrJ?

Is there some property setting to have only unique values in a multivalued field?

Thanks,
Srinivas



Avoiding duplicate entry for a multivalued field

2020-10-28 Thread Srinivas Kashyap
Hello,

Say I have a schema field which is multivalued. Is there a way to maintain distinct
values for that field even though I continue to add duplicate values through atomic
updates via SolrJ?

Is there some property setting to have only unique values in a multivalued field?

Thanks,
Srinivas



RE: Sql entity processor sortedmapbackedcache out of memory issue

2020-10-02 Thread Srinivas Kashyap
Hi Shawn,

Continuing with the older thread: I have implemented a WHERE clause on the inner
child entity. When the import runs, does it bring only the records matching the WHERE
condition into JVM memory, or does it bring the entire joined SQL result set into the
JVM and apply the WHERE filter in memory?

Also, I have written custom Java code for an 'onImportEnd' event listener. Can I call
the destroy() method of the SortedMapBackedCache class in this event listener to
remove the cached entities? This is needed because, for every import, there will be
some entities that are new and weren't present in the previous run's DIH cache. My
assumption is that calling destroy() would free up JVM memory and avoid an OOM.


Also, is there a way I can trigger garbage collection on the DIHCache every time an
import finishes on a core?

P.S.: Ours is a standalone Solr server with 18 cores. Each core is kept in sync by
running a full-import on SortedMapBackedCache entities, with a WHERE clause based on
a timestamp (last index time) on the child entities.
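
For context, a hedged sketch of a cached child entity with such a WHERE clause (table,
column, and key names are placeholders, and the ${dataimporter.last_index_time}
variable is only an assumption about how the timestamp filter is wired). As Shawn
describes below, the cached child query runs once and its entire result set is held
in memory, so the WHERE clause is what bounds how much lands in the JVM:

<entity name="parent" query="SELECT ID, ... FROM PARENT_TABLE">
  <entity name="child"
          query="SELECT PARENT_ID, ... FROM CHILD_TABLE
                 WHERE MODIFY_TS &gt; '${dataimporter.last_index_time}'"
          cacheImpl="SortedMapBackedCache"
          cacheKey="PARENT_ID"
          cacheLookup="parent.ID"/>
</entity>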

-Original Message-
From: Shawn Heisey 
Sent: 09 April 2019 13:27
To: solr-user@lucene.apache.org
Subject: Re: Sql entity processor sortedmapbackedcache out of memory issue

On 4/8/2019 11:47 PM, Srinivas Kashyap wrote:
> I'm using DIH to index the data and the structure of the DIH is like below 
> for solr core:
>
> [DIH entity XML stripped by the mail archive: a root entity containing 16 child entities]
>
> During indexing, since the number of requests being made to database was 
> high(to process one document 17 queries) and was utilizing most of 
> connections of database thereby blocking our web application.

If you have 17 entities, then one document will indeed take 17 queries.
That's the nature of multiple DIH entities.

> To tackle it, we implemented SORTEDMAPBACKEDCACHE with cacheImpl parameter to 
> reduce the number of requests to database.

When you use SortedMapBackedCache on an entity, you are asking Solr to store 
the results of the entire query in memory, even if you don't need all of the 
results.  If the database has a lot of rows, that's going to take a lot of 
memory.

In your excerpt from the config, your inner entity doesn't have a WHERE clause. 
 Which means that it's going to retrieve all of the rows of the ABC table for 
*EVERY* single entry in the DEF table.  That's going to be exceptionally slow.  
Normally the SQL query on inner entities will have some kind of WHERE clause 
that limits the results to rows that match the entry from the outer entity.

You may need to write a custom indexing program that runs separately from Solr, 
possibly on an entirely different server.  That might be a lot more efficient 
than DIH.

Thanks,
Shawn



RE: PDF extraction using Tika

2020-08-25 Thread Srinivas Kashyap
Thanks Phil,

I will modify it according to the need.

Thanks,
Srinivas

-Original Message-
From: Phil Scadden  
Sent: 26 August 2020 02:44
To: solr-user@lucene.apache.org
Subject: RE: PDF extraction using Tika

Code for SolrJ is going to be very dependent on your needs, but the beating heart of
my code is below (note that I do OCR as a separate step before feeding files into the
indexer). The SolrJ and Tika docs should help.

File f = new File(filename);
ContentHandler textHandler = new BodyContentHandler(Integer.MAX_VALUE);
Metadata metadata = new Metadata();
Parser parser = new AutoDetectParser();
ParseContext context = new ParseContext();
if (filename.toLowerCase().contains("pdf")) {
    // For PDFs: skip inline images and disable OCR (OCR is done in a separate pass).
    PDFParserConfig pdfConfig = new PDFParserConfig();
    pdfConfig.setExtractInlineImages(false);
    pdfConfig.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.NO_OCR);
    context.set(PDFParserConfig.class, pdfConfig);
    context.set(Parser.class, parser);
}
InputStream input = new FileInputStream(f);
try {
    parser.parse(input, textHandler, metadata, context);
} catch (Exception e) {
    Logger.getLogger(JsMapAdminService.class.getName()).log(Level.SEVERE,
        null, String.format("File %s failed", f.getCanonicalPath()));
    e.printStackTrace();
    writeLog(String.format("File %s failed", f.getCanonicalPath()));
    return false;
}
// Build the Solr document from the extracted text and metadata.
SolrInputDocument up = new SolrInputDocument();
if (title == null) title = metadata.get("title");
if (author == null) author = metadata.get("author");
up.addField("id", f.getCanonicalPath());
up.addField("location", idString);
up.addField("title", title);
up.addField("author", author);
// ... addField calls for the rest of your fields.
String content = textHandler.toString();
up.addField("_text_", content);
UpdateRequest req = new UpdateRequest();
req.add(up);
req.setBasicAuthCredentials("solrAdmin", password);
UpdateResponse ur = req.process(solr, "prindex");
req.commit(solr, "prindex");

-Original Message-
From: Srinivas Kashyap 
Sent: Tuesday, 25 August 2020 17:04
To: solr-user@lucene.apache.org
Subject: RE: PDF extraction using Tika

Hi Alexandre,

Yes, these are the same PDF files running on Windows and Linux. There are around 30
PDF files, and I tried indexing a single file but faced the same error. Is it related
to how the PDFs are stored on Linux?

And with regard to DIH and Tika going away, can you share any program which extracts
content from PDFs and pushes it into Solr?

Thanks,
Srinivas Kashyap

-Original Message-
From: Alexandre Rafalovitch 
Sent: 24 August 2020 20:54
To: solr-user 
Subject: Re: PDF extraction using Tika

The issue seems to be more with a specific file and at the level way below 
Solr's or possibly even Tika's:
Caused by: java.io.IOException: expected='>' actual='
' at offset 2383
at
org.apache.pdfbox.pdfparser.BaseParser.readExpectedChar(BaseParser.java:1045)

Are you indexing the same files on Windows and Linux? I am guessing not. I 
would try to narrow down which of the files it is. One way could be to get a 
standalone Tika (make sure to match the version Solr
embeds) and run it over the documents by itself. It will probably complain with 
the same error.

Regards,
   Alex.
P.s. Additionally, both DIH and Embedded Tika are not recommended for 
production. And both will be going away in future Solr versions. You may have a 
much less brittle pipeline if you save the structured outputs from those Tika 
standalone runs and then index them into Solr, possibly pre-processed.

On Mon, 24 Aug 2020 at 11:09, Srinivas Kashyap 
 wrote:
>
> Hello,
>
> We are using TikaEntityProcessor to extract the content out of PDF and make 
> the content searchable.
>
> When jetty is run on windows based machine, we are able to successfully load 
> documents using full import DIH(tika entity). Here PDF's is maintained in 
> windows file system.
>
> But when jetty solr is run on linux machine, and try to run DIH, we 
> are getting below exception: (Here PDF's are maintained in linux
> filesystem)
>
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read 
> content Processing Document # 1
> at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
> at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:

RE: PDF extraction using Tika

2020-08-24 Thread Srinivas Kashyap
Hi Alexandre,

Yes, these are the same PDF files running on Windows and Linux. There are around 30
PDF files, and I tried indexing a single file but faced the same error. Is it related
to how the PDFs are stored on Linux?

And with regard to DIH and Tika going away, can you share any program which extracts
content from PDFs and pushes it into Solr?

Thanks,
Srinivas Kashyap

-Original Message-
From: Alexandre Rafalovitch  
Sent: 24 August 2020 20:54
To: solr-user 
Subject: Re: PDF extraction using Tika

The issue seems to be more with a specific file and at the level way below 
Solr's or possibly even Tika's:
Caused by: java.io.IOException: expected='>' actual='
' at offset 2383
at
org.apache.pdfbox.pdfparser.BaseParser.readExpectedChar(BaseParser.java:1045)

Are you indexing the same files on Windows and Linux? I am guessing not. I 
would try to narrow down which of the files it is. One way could be to get a 
standalone Tika (make sure to match the version Solr
embeds) and run it over the documents by itself. It will probably complain with 
the same error.

Regards,
   Alex.
P.s. Additionally, both DIH and Embedded Tika are not recommended for 
production. And both will be going away in future Solr versions. You may have a 
much less brittle pipeline if you save the structured outputs from those Tika 
standalone runs and then index them into Solr, possibly pre-processed.

On Mon, 24 Aug 2020 at 11:09, Srinivas Kashyap 
 wrote:
>
> Hello,
>
> We are using TikaEntityProcessor to extract the content out of PDF and make 
> the content searchable.
>
> When jetty is run on windows based machine, we are able to successfully load 
> documents using full import DIH(tika entity). Here PDF's is maintained in 
> windows file system.
>
> But when jetty solr is run on linux machine, and try to run DIH, we 
> are getting below exception: (Here PDF's are maintained in linux 
> filesystem)
>
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read 
> content Processing Document # 1
> at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
> at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
> at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
> at 
> org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read 
> content Processing Document # 1
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
> ... 4 more
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
> Unable to read content Processing Document # 1
> at 
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
> at 
> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
> at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
> ... 6 more
> Caused by: org.apache.tika.exception.TikaException: Unable to extract PDF 
> content
> at 
> org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139)
> at 
> org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at 
> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
> ... 10 more
> Caused by: java.io.IOException: expected='>' actual='
> ' at offset 2383
> at 
> org.apache.pdfbox.pdfparser.BaseParser.readExpe

PDF extraction using Tika

2020-08-24 Thread Srinivas Kashyap
Hello,

We are using TikaEntityProcessor to extract the content out of PDFs and make the
content searchable.

When Jetty is run on a Windows-based machine, we are able to successfully load
documents using a full-import DIH (Tika entity). Here the PDFs are maintained in the
Windows file system.

But when Jetty/Solr is run on a Linux machine and we try to run DIH, we are getting
the below exception (here the PDFs are maintained in the Linux filesystem):

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read 
content Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at 
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: 
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read 
content Processing Document # 1
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
Unable to read content Processing Document # 1
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
... 6 more
Caused by: org.apache.tika.exception.TikaException: Unable to extract PDF 
content
at 
org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139)
at 
org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at 
org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
... 10 more
Caused by: java.io.IOException: expected='>' actual='
' at offset 2383
at 
org.apache.pdfbox.pdfparser.BaseParser.readExpectedChar(BaseParser.java:1045)
at 
org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:226)
at 
org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:163)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:510)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477)
at 
org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
at 
org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
at 
org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
at 
org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
at 
org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
at 
org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
at 
org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
... 15 more

Can you please suggest how to extract PDF content from a Linux-based file system?

Thanks,
Srinivas Kashyap


HttpSolrClient Connection Evictor

2020-08-09 Thread Srinivas Kashyap
Hello,

We are using HttpSolrClient (solr-solrj-8.4.1.jar) in our app along with the required
httpclient-4.5.6.jar. We recently upgraded these jars from solr-solrj-5.2.1.jar and
httpclient-4.4.1.jar.

After we upgraded, we are seeing a lot of the below connection-evictor statements in
the log file.

DEBUG USER_ID - STEP 2020-08-09 13:59:33,085 [Connection evictor] - Closing 
expired connections
DEBUG USER_ID - STEP 2020-08-09 13:59:33,085 [Connection evictor] - Closing 
connections idle longer than 5 MILLISECONDS
DEBUG USER_ID - STEP 2020-08-09 13:59:33,154 [Connection evictor] - Closing 
expired connections
DEBUG USER_ID - STEP 2020-08-09 13:59:33,154 [Connection evictor] - Closing 
connections idle longer than 5 MILLISECONDS
DEBUG USER_ID - STEP 2020-08-09 13:59:33,168 [Connection evictor] - Closing 
expired connections
DEBUG USER_ID - STEP 2020-08-09 13:59:33,168 [Connection evictor] - Closing 
connections idle longer than 5 MILLISECONDS
DEBUG USER_ID - STEP 2020-08-09 13:59:33,172 [Connection evictor] - Closing 
expired connections
DEBUG USER_ID - STEP 2020-08-09 13:59:33,172 [Connection evictor] - Closing 
connections idle longer than 5 MILLISECONDS
DEBUG USER_ID - STEP 2020-08-09 13:59:33,214 [Connection evictor] - Closing 
expired connections
DEBUG USER_ID - STEP 2020-08-09 13:59:33,214 [Connection evictor] - Closing 
connections idle longer than 5 MILLISECONDS
DEBUG USER_ID - STEP 2020-08-09 13:59:34,061 [Connection evictor] - Closing 
expired connections
DEBUG USER_ID - STEP 2020-08-09 13:59:34,061 [Connection evictor] - Closing 
connections idle longer than 5 MILLISECONDS

These statements appear when we try to access a module which talks to Solr, as shown
below:

HttpSolrClient client = null;
try {
    client = new HttpSolrClient.Builder(solrURL).build();

    QueryResponse response = client.query(query);

} finally {
    if (client != null) {
        try {
            client.close();
        } catch (IOException e) {
            logger.debug("Error in closing HttpSolrClient" + e.getMessage());
        }
    }
}

Is there a way we can turn off this logging, or configure something so that these log
statements don't appear?

Thanks,
Srinivas
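
For what it's worth, those "Connection evictor" lines are Apache HttpClient's own
DEBUG output (the evictor thread exists when idle-connection eviction is enabled on
the client), so they are normally silenced in the application's logging configuration
rather than in Solr or SolrJ. A hedged sketch, assuming the application logs through
Log4j 2 (adjust for whatever framework actually writes that log file):

<!-- raise the Apache HttpClient category above DEBUG -->
<Logger name="org.apache.http" level="warn"/>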



Question on sorting

2020-07-22 Thread Srinivas Kashyap
Hello,

I have schema and field definition as shown below:

[field definitions stripped by the mail archive; TRACK_ID is defined with a string
field type]
TRACK_ID field contains "NUMERIC VALUE".

When I use sort on track_id (TRACK_ID desc) it is not working properly.

I have the below values in TRACK_ID:

Doc1: "84806"
Doc2: "124561"

Ideally, when I use sort command, query result should be

Doc2: "124561"
Doc1: "84806"

But I'm getting:

Doc1: "84806"
Doc2: "124561"

Is this because the field type is string, and doc1 has 5 digits while doc2 has 6
digits?

Please provide solution for this.

Thanks,
Srinivas
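
For what it's worth, a string field sorts lexicographically, which matches the
behaviour described above ("124561" sorts before "84806" because '1' < '8'). A hedged
alternative, assuming the schema can be changed and the data reindexed, and that the
configset defines the usual point types, is to make TRACK_ID numeric:

<!-- assumes <fieldType name="plong" class="solr.LongPointField" docValues="true"/> exists -->
<field name="TRACK_ID" type="plong" indexed="true" stored="true" docValues="true"/>

With a numeric type (and docValues for sorting), TRACK_ID desc would return 124561
before 84806.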





Nested grouping

2020-06-26 Thread Srinivas Kashyap
Hi All,

I have below requirement for my business:

select?fl=*&fq=MODIFY_TS:[2020-06-23T18:30:00Z TO *]&fq=PHY_KEY2: "HQ010699" OR
PHY_KEY2: "HQ010377" OR PHY_KEY2: "HQ010396" OR PHY_KEY2: "HQ010399" OR
PHY_KEY2: "HQ010404" OR PHY_KEY2: "HQ010419" OR PHY_KEY2: "HQ010426" OR
PHY_KEY2: "HQ010452" OR PHY_KEY2: "HQ010463" OR PHY_KEY2: "HQ010466" OR
PHY_KEY2: "HQ010469" OR PHY_KEY2: "HQ010476" OR PHY_KEY2: "HQ010480" OR
PHY_KEY2: "HQ010481" OR PHY_KEY2: "HQ010496" OR PHY_KEY2: "HQ010500" OR
PHY_KEY2: "HQ010501" OR PHY_KEY2: "HQ010502" OR PHY_KEY2: "HQ010503" OR
PHY_KEY2: "HQ010504"

Above query lists all the changes mentioned for 20 documents.

If I add the below to the query:

group=true&group.field=PHY_KEY2&group.ngroups=true

Below is the response:

"grouped":{
"PHY_KEY2":{
  "matches":23,
  "ngroups":4,
  "groups":[{
  "groupValue":"HQ010500",
  "doclist":{"numFound":3,"start":0,"docs":[
  {
"PHY_KEY2":"HQ010500"}]
  }},
{
  "groupValue":"HQ010399",
  "doclist":{"numFound":4,"start":0,"docs":[
  {
"PHY_KEY2":"HQ010399"}]
  }},
{
  "groupValue":"HQ010377",
  "doclist":{"numFound":8,"start":0,"docs":[
  {
    "PHY_KEY2":"HQ010377"}]
  }},
{
  "groupValue":"HQ010699",
  "doclist":{"numFound":8,"start":0,"docs":[
  {
"PHY_KEY2":"HQ010699"}]
  }}]}}}

Take the case of the last entry, HQ010699. It says numFound=8 (8 docs), but all of
these 8 documents have the same value for a field called TRACK_ID. So can I group
again on TRACK_ID to get the count as 1?


Thanks and Regards,
Srinivas Kashyap
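
In case it is useful, a hedged sketch of one way to get a per-PHY_KEY2 count of
distinct TRACK_ID values with the JSON Facet API (not something discussed in the
thread; the facet key names are placeholders):

json.facet={
  keys:{
    type: terms,
    field: PHY_KEY2,
    facet:{
      distinct_tracks: "unique(TRACK_ID)"
    }
  }
}

Each PHY_KEY2 bucket would then report how many distinct TRACK_ID values it contains
(1 in the HQ010699 case described above).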




Re: Solr takes time to warm up core with huge data

2020-06-08 Thread Srinivas Kashyap
Hi Shawn,

It's a vague question and I haven't tried it out yet.

Can I instead mention query as below:

Basically instead of



q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO
*]&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"BAMBOOROSE"&rows=1000&sort=MODIFY_TS
desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1
asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7
asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc



pass



q=PHY_KEY2:"HQ012206"+AND+PHY_KEY1:"BAMBOOROSE"&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO
*]&rows=1000&sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID
desc,TRACK_INTER_ID asc,PHY_KEY1 asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4
asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10
asc,FIELD_NAME asc


Instead of q=*:*, I pass the fields I want to query on directly in q. Will this be
faster?

Related to the earlier question:
We are using version 8.4.1.
All the fields I'm sorting on are of string type (MODIFY_TS is the date), with
indexed=true and stored=true.


Thanks,
Srinivas


On 05-Jun-2020 9:50 pm, Shawn Heisey  wrote:
On 6/5/2020 12:17 AM, Srinivas Kashyap wrote:
> q=*:*=PARENT_DOC_ID:100=MODIFY_TS:[1970-01-01T00:00:00Z TO 
> *]=PHY_KEY2:"HQ012206"=PHY_KEY1:"JACK"=1000=MODIFY_TS 
> desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1 
> asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7 
> asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc
>
> This was the original query. Since there were lot of sorting fields, we 
> decided to not do on the solr side, instead fetch the query response and do 
> the sorting outside solr. This eliminated the need of more JVM memory which 
> was allocated. Every time we ran this query, solr would crash exceeding the 
> JVM memory. Now we are only running filter queries.

What Solr version, and what is the definition of each of the fields
you're sorting on? If the definition doesn't include docValues, then a
large on-heap memory structure will be created for sorting (VERY large
with 500 million docs), and I wouldn't be surprised if it's created even
if it is never used. The definition for any field you use for sorting
should definitely include docValues. In recent Solr versions, docValues
defaults to true for most field types. Some field classes, TextField in
particular, cannot have docValues.

There's something else to discuss about sort params -- each sort field
will only be used if ALL of the previous sort fields are identical for
two documents in the full numFound result set. Having more than two or
three sort fields is usually pointless. My guess (which I know could be
wrong) is that most queries with this HUGE sort parameter will never use
anything beyond TRACK_ID.

> And regarding the filter cache, it is in default setup: (we are using default 
> solrconfig.xml, and we have only added the request handler for DIH)
>
> <filterCache ... size="512" initialSize="512" autowarmCount="0"/>

This is way too big for your index, and a prime candidate for why your
heap requirements are so high. Like I said before, if the filterCache
on your system actually reaches this max size, it will require 30GB of
memory JUST for the filterCache on this core. Can you check the admin
UI to determine what the size is and what hit ratio it's getting? (1.0
is 100% on the hit ratio). I'd probably start with a size of 32 or 64
on this cache. With a size of 64, a little less than 4GB would be the
max heap allocated for the cache. You can experiment... but with 500
million docs, the filterCache size should be pretty small.

You're going to want to carefully digest this part of that wiki page
that I linked earlier. Hopefully email will preserve this link completely:

https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-Reducingheaprequirements

Thanks,
Shawn
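
As a hedged illustration of the sizing suggestion above (keep whatever class
attribute the existing solrconfig.xml already uses): with roughly 500 million
documents, one cached filter is a bitset of about 500,000,000 / 8 ≈ 60 MB, so 64
entries would cap the filterCache at roughly 4 GB of heap instead of ~30 GB at 512.

<filterCache size="64" initialSize="64" autowarmCount="0"/>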



RE: Solr takes time to warm up core with huge data

2020-06-05 Thread Srinivas Kashyap
Hi Jörn,

I think you missed my explanation; we are not using sorting now.

The original query:

q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO
*]&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"&rows=1000&sort=MODIFY_TS
desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1
asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7
asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc

But now, I have removed sorting as shown below. The sorting is being done 
outside solr:

q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO
*]&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"&rows=1000

Also, we are writing custom code to index and are discarding DIH too. When I restart
Solr, this core with huge data takes time to even show up in the query admin GUI
console; it takes around 2 hours to appear.

My question is: even the simple query with the filter queries shown above is
consuming JVM memory. So how much memory, or what configuration in solrconfig.xml, do
I need to make it work?

Thanks,
Srinivas

From: Jörn Franke 
Sent: 05 June 2020 12:30
To: solr-user@lucene.apache.org
Subject: Re: Solr takes time to warm up core with huge data

I think DIH is the wrong solution for this. If you do an external custom load it will
probably be much faster.

You have too much JVM memory from my point of view. Reduce it to eight or 
similar.

It seems you are just exporting data, so you are better off using the export handler.
Add docvalues to the fields for this. It looks like you have no text field to 
be searched but only simple fields (string, date etc).

You should not use the normal handler to return many results at once. If you 
cannot use the Export handler then use cursors :

https://lucene.apache.org/solr/guide/8_4/pagination-of-results.html#using-cursors

Both work to sort large result sets without consuming the whole memory
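
A hedged SolrJ sketch of the cursor approach Jörn mentions (the core URL, filter,
fields, and the uniqueKey name "id" are assumptions; cursorMark requires the sort to
end with the uniqueKey as a tie-breaker):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorExport {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("PARENT_DOC_ID:100");
        q.setRows(1000);
        // The sort must include the uniqueKey field as the final tie-breaker.
        q.setSort(SolrQuery.SortClause.desc("MODIFY_TS"));
        q.addSort(SolrQuery.SortClause.asc("id"));

        String cursor = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = client.query(q);
            rsp.getResults().forEach(doc -> { /* process each document */ });
            String next = rsp.getNextCursorMark();
            done = cursor.equals(next);   // unchanged cursor means everything was read
            cursor = next;
        }
        client.close();
    }
}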

> On 05.06.2020 at 08:18, Srinivas Kashyap <srini...@bamboorose.com.invalid> wrote:
>
> Thanks Shawn,
>
> The filter queries are not complex. Below are the filter queries I’m running 
> for the corresponding schema entry:
>
> q=*:*=PARENT_DOC_ID:100=MODIFY_TS:[1970-01-01T00:00:00Z TO 
> *]=PHY_KEY2:"HQ012206"=PHY_KEY1:"JACK"=1000=MODIFY_TS 
> desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1 
> asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7 
> asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc
>
> This was the original query. Since there were lot of sorting fields, we 
> decided to not do on the solr side, instead fetch the query response and do 
> the sorting outside solr. This eliminated the need of more JVM memory which 
> was allocated. Every time we ran this query, solr would crash exceeding the 
> JVM memory. Now we are only running filter queries.
>
> And regarding the filter cache, it is in default setup: (we are using default 
> solrconfig.xml, and we have only added the request handler for DIH)
>
> <filterCache ... size="512" initialSize="512" autowarmCount="0"/>
>
> Now that you’re aware of the size and numbers, can you please let me know 
> what values/size that I need to increase? Is there an advantage of moving 
> this single core to solr cloud? If yes, can you let us know, how many 
> shards/replica do we require for this core considering we allow it to grow as 
> users transact. The updates to this core is not thru DIH delta import rather, 
> we are using SolrJ to push the changes.
>
> 
> [12 field definitions, each with omitTermFreqAndPositions="true"; the field names
> were stripped by the mail archive]
>
>
> Thanks,
> Srinivas
>
>
>
>> On 6/4/2020 9:51 PM, Srinivas Kashyap wrote:
>> We are on solr 8.4.1 and In standalone server mode. We have a core with 
>> 497,767,038 Records indexed. It took around 32Hours to load data through DIH.
>>
>> The disk occupancy is shown below:
>>
>> 82G /var/solr/data//data/index
>>
>> When I restarted solr instance and went to this core to query on solr admin 
>> GUI, it is hanging and is showing "Connection to Solr lost. Please check the 
>> So

RE: Solr takes time to warm up core with huge data

2020-06-05 Thread Srinivas Kashyap
Thanks Shawn,

The filter queries are not complex. Below are the filter queries I’m running 
for the corresponding schema entry:

q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO
*]&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"&rows=1000&sort=MODIFY_TS
desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1
asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7
asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc

This was the original query. Since there were a lot of sorting fields, we decided not
to sort on the Solr side, but instead to fetch the query response and do the sorting
outside Solr. This eliminated the need for the extra JVM memory that had been
allocated; every time we ran this query, Solr would crash by exceeding the JVM
memory. Now we are only running filter queries.

And regarding the filter cache, it is in the default setup (we are using the default
solrconfig.xml, and we have only added the request handler for DIH):

<filterCache ... size="512" initialSize="512" autowarmCount="0"/>

Now that you're aware of the size and numbers, can you please let me know what
values/sizes I need to increase? Is there an advantage to moving this single core to
SolrCloud? If yes, can you let us know how many shards/replicas we would require for
this core, considering we allow it to grow as users transact? The updates to this
core are not through DIH delta-import; rather, we are using SolrJ to push the
changes.

[schema field definitions stripped by the mail archive: 12 fields, each declared with
omitTermFreqAndPositions="true"]

Thanks,
Srinivas



On 6/4/2020 9:51 PM, Srinivas Kashyap wrote:
> We are on solr 8.4.1 and In standalone server mode. We have a core with 
> 497,767,038 Records indexed. It took around 32Hours to load data through DIH.
>
> The disk occupancy is shown below:
>
> 82G /var/solr/data//data/index
>
> When I restarted solr instance and went to this core to query on solr admin 
> GUI, it is hanging and is showing "Connection to Solr lost. Please check the 
> Solr instance". But when I go back to dashboard, instance is up and I'm able 
> to query other cores.
>
> Also, querying on this core is eating up JVM memory allocated(24GB)/(32GB 
> RAM). A query(*:*) with filterqueries is overshooting the memory with OOM.

You're going to want to have a lot more than 8GB available memory for
disk caching with an 82GB index. That's a performance thing... with so
little caching memory, Solr will be slow, but functional. That aspect
of your setup will NOT lead to out of memory.

If you are experiencing Java "OutOfMemoryError" exceptions, you will
need to figure out what resource is running out. It might be heap
memory, but it also might be that you're hitting the process/thread
limit of your operating system. And there are other possible causes for
that exception too. Do you have the text of the exception available?
It will be absolutely critical for you to determine what resource is
running out, or you might focus your efforts on the wrong thing.

If it's heap memory (something that I can't really assume), then Solr is
requiring more than the 24GB heap you've allocated.

Do you have faceting or grouping on those queries? Are any of your
filters really large or complex? These are the things that I would
imagine as requiring lots of heap memory.

What is the size of your filterCache? With about 500 million documents
in the core, each entry in the filterCache will consume nearly 60
megabytes of memory. If your filterCache has the default example size
of 512, and it actually gets that big, then that single cache will
require nearly 30 gigabytes of heap memory (on top of the other things
in Solr that require heap) ... and you only have 24GB. That could cause
OOME exceptions.

Does the server run things other than Solr?

Look here for some valuable info about performance and memory:

https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems

Thanks,
Shawn


Solr takes time to warm up core with huge data

2020-06-04 Thread Srinivas Kashyap
Hello,

We are on Solr 8.4.1 in standalone server mode. We have a core with 497,767,038
records indexed. It took around 32 hours to load the data through DIH.

The disk occupancy is shown below:

82G /var/solr/data//data/index

When I restarted the Solr instance and went to this core to query in the Solr admin
GUI, it hangs and shows "Connection to Solr lost. Please check the Solr instance".
But when I go back to the dashboard, the instance is up and I'm able to query other
cores.

Also, querying this core is eating up the allocated JVM memory (24GB heap on 32GB
RAM). A query (*:*) with filter queries is overshooting the memory with an OOM.

Please advise on what I need to configure.

Thanks,
Srinivas



RE: Indexing huge data onto solr

2020-05-25 Thread Srinivas Kashyap
Hi Erick,

Thanks for the response below. The link you provided holds good if you have a single
entity where you can join the tables and index them. But in our scenario, we have
nested entities joining different tables, as shown below:

db-data-config.xml:



[entity definitions stripped by the mail archive: nested entities joining table 1
with table 2, table 3 with table 4, table 5 with table 6, and table 7 with table 8]



Do you have any recommendations for running multiple SQL queries and assembling the
results into a single Solr document that can be sent over SolrJ for indexing?

Say the parent entity has 100 documents: should I iterate over each parent tuple and
execute the child-entity SQL (with a WHERE condition on the parent) to create one
Solr document? Won't that put more load on the database by executing more SQL? Is
there an optimal solution?

Thanks,
Srinivas
From: Erick Erickson 
Sent: 22 May 2020 22:52
To: solr-user@lucene.apache.org
Subject: Re: Indexing huge data onto solr

You have a lot more control over the speed and form of importing data if
you just do the initial load in SolrJ. Here’s an example, taking the Tika
parts out is easy:

https://lucidworks.com/post/indexing-with-solrj/<https://lucidworks.com/post/indexing-with-solrj>

It’s especially instructive to comment out just the call to 
CloudSolrClient.add(doclist…); If
that _still_ takes a long time, then your DB query is the root of the problem. 
Even with 100M
records, I’d be really surprised if Solr is the bottleneck, but the above test 
will tell you
where to go to try to speed things up.

Best,
Erick

> On May 22, 2020, at 12:39 PM, Srinivas Kashyap 
> mailto:srini...@bamboorose.com.INVALID>> 
> wrote:
>
> Hi All,
>
> We are runnnig solr 8.4.1. We have a database table which has more than 100 
> million of records. Till now we were using DIH to do full-import on the 
> tables. But for this table, when we do full-import via DIH it is taking more 
> than 3-4 days to complete and also it consumes fair bit of JVM memory while 
> running.
>
> Are there any speedier/alternates ways to load data onto this solr core.
>
> P.S: Only initial data import is problem, further updates/additions to this 
> core is being done through SolrJ.
>
> Thanks,
> Srinivas
> 
> DISCLAIMER:
> E-mails and attachments from Bamboo Rose, LLC are confidential.
> If you are not the intended recipient, please notify the sender immediately 
> by replying to the e-mail, and then delete it without making copies or using 
> it in any way.
> No representation is made that this email or any attachments are free of 
> viruses. Virus scanning is recommended and is the responsibility of the 
> recipient.
>
> Disclaimer
>
> The information contained in this communication from the sender is 
> confidential. It is intended solely for use by the recipient and others 
> authorized to receive it. If you are not the recipient, you are hereby 
> notified that any disclosure, copying, distribution or taking action in 
> relation of the contents of this information is strictly prohibited and may 
> be unlawful.
>
> This email has been scanned for viruses and malware, and may have been 
> automatically archived by Mimecast Ltd, an innovator in Software as a Service 
> (SaaS) for business. Providing a safer and more useful place for your human 
> generated data. Specializing in; Security, archiving and compliance. To find 
> out more visit the Mimecast website.


Indexing huge data onto solr

2020-05-22 Thread Srinivas Kashyap
Hi All,

We are runnnig solr 8.4.1. We have a database table which has more than 100 
million of records. Till now we were using DIH to do full-import on the tables. 
But for this table, when we do full-import via DIH it is taking more than 3-4 
days to complete and also it consumes fair bit of JVM memory while running.

Are there any speedier/alternates ways to load data onto this solr core.

P.S: Only initial data import is problem, further updates/additions to this 
core is being done through SolrJ.

Thanks,
Srinivas

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast Ltd, an innovator in Software as a Service 
(SaaS) for business. Providing a safer and more useful place for your human 
generated data. Specializing in; Security, archiving and compliance. To find 
out more visit the Mimecast website.


RE: How upgrade to Solr 8 impact performance

2020-04-23 Thread Srinivas Kashyap
Can you share with details, what performance was degraded?

Thanks,
srinivas
From: Natarajan, Rajeswari 
Sent: 23 April 2020 12:41
To: solr-user@lucene.apache.org
Subject: Re: How upgrade to Solr 8 impact performance

With the same hardware and configuration we also saw performance degradation 
from 7.6 to 8.4.1 as this is why we are checking here to see if anyone else saw 
this behavior.

-Rajeswari

On 4/22/20, 7:16 AM, "Paras Lehana" 
mailto:paras.leh...@indiamart.com>> wrote:

Hi Rajeswari,

I can only share my experience of moving from Solr 6 to Solr 8. I suggest
you to move and then reevaluate your performance metrics. To recall another
experience, we moved from Java 8 to 11 for Solr 8.

Please note experiences can differ! :)

On Wed, 22 Apr 2020 at 00:50, Natarajan, Rajeswari <
rajeswari.natara...@sap.com> wrote:

> Any other experience from solr 7 to sol8 upgrade performance .Please
> share.
>
> Thanks,
> Rajeswari
>
> On 4/15/20, 4:00 PM, "Paras Lehana" 
> mailto:paras.leh...@indiamart.com>> wrote:
>
> In January, we upgraded Solr from version 6 to 8 skipping all versions
> in
> between.
>
> The hardware and Solr configurations were kept the same but we still
> faced
> degradation in response time by 30-50%. We had exceptional Query times
> around 25 ms with Solr 6 and now we are hovering around 36 ms.
>
> Since response times under 50 ms are very good even for Auto-Suggest,
> we
> have not tried any changes regarding this. Nevertheless, you can try
> using
> Caffeine Cache. Looking forward to read community inputs as well.
>
>
>
> On Thu, 16 Apr 2020 at 01:34, ChienHuaWang 
> mailto:chien-hua.w...@sap.com>>
> wrote:
>
> > Do anyone have experience to upgrade the application with Solr 7.X
> to 8.X?
> > How's the query performance?
> > Found out a little slower response time from application with Solr8
> based
> > on
> > current measurement, still looking into more detail it.
> > But wondering is any one have similar experience? is that something
> we
> > should expect for Solr 8.X?
> >
> > Please kindly share, thanks.
> >
> > Regards,
> > ChienHua
> >
> >
> >
> > --
> > Sent from:
> https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
>
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, *Auto-Suggest*,
> IndiaMART InterMESH Ltd,
>
> 11th Floor, Tower 2, Assotech Business Cresterra,
> Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>
> Mob.: +91-9560911996
> Work: 0120-4056700 | Extn:
> *1196*
>
> --
> *
> *
>
> >
>
>

--
--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*1196*

--
*
*

>

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast Ltd, an innovator in Software as a Service 
(SaaS) for business. Providing a safer and more useful place for your human 
generated data. Specializing in; Security, archiving and compliance. To find 
out more visit the Mimecast website.


TikaEntityProcessor with DIH

2020-04-20 Thread Srinivas Kashyap
Hi,

we were in Solr 5.2.1 and TikaEntityProcessor to index pdf documents through 
DIH and was working fine. The jars were tika-core-1.4.jar and 
tika-parsers-1.4.jar.

Below is my schema.xml: (p,s. All filed types have been defined)


   
   
   
   
   
   
   

And my tika-data-config.xml:





 
 
 
 









Now we have upgraded to solr-8.4.1 and when I try to put the above jars and 
index, I see only below are getting indexed:

{
"fileName":"01 - System-Wide Functions.pdf",
"size":"2524884",
"lastmodified":"Mon Jul 15 06:26:52 UTC 2019",
"path":"D:\\tssindex\\server\\solr\\help\\help\\01 - System-Wide 
Functions.pdf",
"text":"",
"_version_":1664474933885927424},
{

As you can see, the text field is empty & author, title fields are not getting 
indexed and any search on that text field is not returning the documents.

Please help me in this regard.


Thanks,
Srinivas



DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast Ltd, an innovator in Software as a Service 
(SaaS) for business. Providing a safer and more useful place for your human 
generated data. Specializing in; Security, archiving and compliance. To find 
out more visit the Mimecast website.


RE: OutOfMemory error solr 8.4.1

2020-03-09 Thread Srinivas Kashyap
Hi Erick,

Yes you were right, in my custom jar I'm using HttpSolrClient as below:

HttpSolrClient  client = new HttpSolrClient.Builder("http://; + server + ":" + 
port + "/" + webapp + "/").build();  
 try {
 client.request(new QueryRequest(params),coreName);
 } catch (SolrServerException e1) {
 // TODO Auto-generated catch block
 e1.printStackTrace();
 } catch (IOException e1) {
 // TODO Auto-generated catch block
 e1.printStackTrace();
 }

And strangely, I'm not closing the connection(client.close()). The same code 
would work without creating heaps of threads in version 5.2.1. Now after I 
added finally block and closed the connection, the threads have stopped growing 
in size.

finally{
if(client!=null)
{
try {
client.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
    }
    }

Thanks and Regards,
Srinivas Kashyap

From: Erick Erickson  
Sent: 09 March 2020 21:13
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory error solr 8.4.1

I’m 99% certain that something in your custom jar is the culprit, otherwise 
we’d have seen a _lot_ of these. TIMED_WAITING is usually just a listener 
thread, but they shouldn’t be generated when SOlr is just sitting there. 

The first thing I’d do is dummy out my custom code or remove it completely and 
see. If you don’t have this thread explosion, then it’s pretty certainly your 
custom code.

Best,
Erick

> On Mar 9, 2020, at 01:29, Srinivas Kashyap  
> wrote:
> 
> Hi Erick,
> 
> I recompiled my custom code with 8.4.1 jars and placed back my jar in the lib 
> folder. Under Solr admin console/Thread Dump, I'm seeing a lot of below 
> threads which are in TIMED_WAITING stage.
> 
> Connection evictor (999)
>java.lang.Thread.sleep​(Native Method)
>
> org.apache.http.impl.client.IdleConnectionEvictor$1.run​(IdleConnectionEvictor.java:66)
>java.lang.Thread.run​(Thread.java:748)
> 
> It's been 15 minutes since I restarted the solr, and I believe already 999 
> threads have started?? And everytime I refresh the console, I'm seeing jump :
> 
> Connection evictor (1106)
> java.lang.Thread.sleep​(Native Method)
> org.apache.http.impl.client.IdleConnectionEvictor$1.run​(IdleConnectionEvictor.java:66)
> java.lang.Thread.run​(Thread.java:748)
> 
> Thanks and Regards,
> Srinivas Kashyap
> 
> -Original Message-
> From: Erick Erickson  
> Sent: 06 March 2020 21:34
> To: solr-user@lucene.apache.org
> Subject: Re: OutOfMemory error solr 8.4.1
> 
> I assume you recompiled the jar file? re-using the same one compiled against 
> 5x is unsupported, nobody will be able to help until you recompile.
> 
> Once you’ve done that, if you still have the problem you need to take a 
> thread dump to see if your custom code is leaking threads, that’s my number 
> one suspect.
> 
> Best,
> Erick
> 
>> On Mar 6, 2020, at 07:36, Srinivas Kashyap  
>> wrote:
>> 
>> Hi Erick,
>> 
>> We have custom code which are schedulers to run delta imports on our cores 
>> and I have added that custom code as a jar and I have placed it on 
>> server/solr-webapp/WEB-INF/lib. Basically we are fetching the JNDI 
>> datasource configured in the jetty.xml(Oracle) and creating connection 
>> object. And after that in the finally block we are closing it too.
>> 
>> Never faced this issue while we were in solr5.2.1 version though. The same 
>> jar was placed there too.
>> 
>> Thanks,
>> Srinivas
>> 
>> On 06-Mar-2020 8:55 pm, Erick Erickson  wrote:
>> This one can be a bit tricky. You’re not running out of overall memory, but 
>> you are running out of memory to allocate stacks. Which implies that, for 
>> some reason, you are creating a zillion threads. Do you have any custom code?
>> 
>> You can take a thread dump and see what your threads are doing, and you 
>> don’t need to wait until you see the error. If you take a thread dump my 
>> guess is you’ll see the number of threads increase over time. If that’s the 
>> case, and if you have no custom code running, we need to see the thread dump.
>> 
>> Best,
>> Erick
>> 
>>>> On Mar 6, 2020, at 05:54, Srinivas Kashyap 
>>>>  wrote:
>>> 
>>> Hi All,
>>> 
>>> I have recently upgraded solr to 8.4.1 and have installed solr as service 
>>> in linux machine. Once I start my service, it will be up for 15-18hours and 
>>> suddenly stops without

RE: OutOfMemory error solr 8.4.1

2020-03-09 Thread Srinivas Kashyap
Hi Erick,

I recompiled my custom code with 8.4.1 jars and placed back my jar in the lib 
folder. Under Solr admin console/Thread Dump, I'm seeing a lot of below threads 
which are in TIMED_WAITING stage.

Connection evictor (999)
java.lang.Thread.sleep​(Native Method)

org.apache.http.impl.client.IdleConnectionEvictor$1.run​(IdleConnectionEvictor.java:66)
java.lang.Thread.run​(Thread.java:748)

It's been 15 minutes since I restarted the solr, and I believe already 999 
threads have started?? And everytime I refresh the console, I'm seeing jump :

Connection evictor (1106)
java.lang.Thread.sleep​(Native Method)
org.apache.http.impl.client.IdleConnectionEvictor$1.run​(IdleConnectionEvictor.java:66)
java.lang.Thread.run​(Thread.java:748)

Thanks and Regards,
Srinivas Kashyap

-Original Message-
From: Erick Erickson  
Sent: 06 March 2020 21:34
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory error solr 8.4.1

I assume you recompiled the jar file? re-using the same one compiled against 5x 
is unsupported, nobody will be able to help until you recompile.

Once you’ve done that, if you still have the problem you need to take a thread 
dump to see if your custom code is leaking threads, that’s my number one 
suspect.

Best,
Erick

> On Mar 6, 2020, at 07:36, Srinivas Kashyap  
> wrote:
> 
> Hi Erick,
> 
> We have custom code which are schedulers to run delta imports on our cores 
> and I have added that custom code as a jar and I have placed it on 
> server/solr-webapp/WEB-INF/lib. Basically we are fetching the JNDI datasource 
> configured in the jetty.xml(Oracle) and creating connection object. And after 
> that in the finally block we are closing it too.
> 
> Never faced this issue while we were in solr5.2.1 version though. The same 
> jar was placed there too.
> 
> Thanks,
> Srinivas
> 
> On 06-Mar-2020 8:55 pm, Erick Erickson  wrote:
> This one can be a bit tricky. You’re not running out of overall memory, but 
> you are running out of memory to allocate stacks. Which implies that, for 
> some reason, you are creating a zillion threads. Do you have any custom code?
> 
> You can take a thread dump and see what your threads are doing, and you don’t 
> need to wait until you see the error. If you take a thread dump my guess is 
> you’ll see the number of threads increase over time. If that’s the case, and 
> if you have no custom code running, we need to see the thread dump.
> 
> Best,
> Erick
> 
>> On Mar 6, 2020, at 05:54, Srinivas Kashyap  
>> wrote:
>> 
>> Hi All,
>> 
>> I have recently upgraded solr to 8.4.1 and have installed solr as service in 
>> linux machine. Once I start my service, it will be up for 15-18hours and 
>> suddenly stops without us shutting down. In solr.log I found below error. 
>> Can somebody guide me what values should I be increasing in Linux machine?
>> 
>> Earlier, open file limit was not set and now I have increased. Below are my 
>> system configuration for solr:
>> 
>> JVM memory: 8GB
>> RAM: 32GB
>> Open file descriptor count: 50
>> 
>> Ulimit -v - unlimited
>> Ulimit -m - unlimited
>> 
>> 
>> ERROR STACK TRACE:
>> 
>> 2020-03-06 12:08:03.071 ERROR (qtp1691185247-21) [   x:product] 
>> o.a.s.s.HttpSolrCall null:java.lang.RuntimeException: 
>> java.lang.OutOfMemoryError: unable to create new native thread
>>   at 
>> org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:752)
>>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:603)
>>   at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>>   at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>>   at 
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>>   at 
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>>   at 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>>   at 
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>>   at 
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>>   at 
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>>   at 
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
>>   at 
>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>>   at 
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHan

Re: OutOfMemory error solr 8.4.1

2020-03-06 Thread Srinivas Kashyap
Hi Erick,

We have custom code which are schedulers to run delta imports on our cores and 
I have added that custom code as a jar and I have placed it on 
server/solr-webapp/WEB-INF/lib. Basically we are fetching the JNDI datasource 
configured in the jetty.xml(Oracle) and creating connection object. And after 
that in the finally block we are closing it too.

Never faced this issue while we were in solr5.2.1 version though. The same jar 
was placed there too.

Thanks,
Srinivas

On 06-Mar-2020 8:55 pm, Erick Erickson  wrote:
This one can be a bit tricky. You’re not running out of overall memory, but you 
are running out of memory to allocate stacks. Which implies that, for some 
reason, you are creating a zillion threads. Do you have any custom code?

You can take a thread dump and see what your threads are doing, and you don’t 
need to wait until you see the error. If you take a thread dump my guess is 
you’ll see the number of threads increase over time. If that’s the case, and if 
you have no custom code running, we need to see the thread dump.

Best,
Erick

> On Mar 6, 2020, at 05:54, Srinivas Kashyap  
> wrote:
>
> Hi All,
>
> I have recently upgraded solr to 8.4.1 and have installed solr as service in 
> linux machine. Once I start my service, it will be up for 15-18hours and 
> suddenly stops without us shutting down. In solr.log I found below error. Can 
> somebody guide me what values should I be increasing in Linux machine?
>
> Earlier, open file limit was not set and now I have increased. Below are my 
> system configuration for solr:
>
> JVM memory: 8GB
> RAM: 32GB
> Open file descriptor count: 50
>
> Ulimit -v - unlimited
> Ulimit -m - unlimited
>
>
> ERROR STACK TRACE:
>
> 2020-03-06 12:08:03.071 ERROR (qtp1691185247-21) [   x:product] 
> o.a.s.s.HttpSolrCall null:java.lang.RuntimeException: 
> java.lang.OutOfMemoryError: unable to create new native thread
>at 
> org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:752)
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:603)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
>at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
>at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>at 
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>at org.eclipse.jetty.server.Server.handle(Server.java:505)
>at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
>at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
>at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
>at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
>at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
>at 
> org.eclipse.jetty.uti

OutOfMemory error solr 8.4.1

2020-03-06 Thread Srinivas Kashyap
]
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
 ~[jetty-util-9.4.19.v20190610.jar:9.4.19.v20190610]
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
 ~[jetty-util-9.4.19.v20190610.jar:9.4.19.v20190610]
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
 ~[jetty-util-9.4.19.v20190610.jar:9.4.19.v20190610]
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)
 ~[jetty-util-9.4.19.v20190610.jar:9.4.19.v20190610]
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917)
 ~[jetty-util-9.4.19.v20190610.jar:9.4.19.v20190610]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]




Thanks and Regards,
Srinivas Kashyap


DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast Ltd, an innovator in Software as a Service 
(SaaS) for business. Providing a safer and more useful place for your human 
generated data. Specializing in; Security, archiving and compliance. To find 
out more visit the Mimecast website.


RE: Solr datePointField facet

2020-02-25 Thread Srinivas Kashyap
Hi Paras,

PFB details:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://localhost:8983/tssindex/party: SolrCore is loading
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:560)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
at 
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
at 
org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)

SCHEMA FILE:







  

  







































  

  




  









  
  








  



  




  
  



  




  








  




  








  









   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   

   
   
   
   
   

   
   
   
   
   
   
   
   
   
   
   

   
   
   
   
   

   
   
   

   
   
   
   


   
   
   


   
   
   
   
   
   


   
   
   

   


   
   

   

   
   
   
   
   
   

   
   
   

   
   


   
   
   
   
   
   
   
   

   
   
   
   
   
   

   
   


   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   

   

   
   
   
   

   
   
   
   
   
   
   
   
   

   



ID







  
   


































































































































































































Thanks and Regards,
Srinivas Kashyap

From: Paras Lehana 
Sent: 25 February 2020 16:33
To: solr-user@lucene.apache.org
Subject: Re: Solr datePointField facet

Hi Srinivas,

But still facing the same error.


The same error? Can you please

Solr datePointField facet

2020-02-25 Thread Srinivas Kashyap
Hi all,

I have a date field in my schema and I'm trying to facet on that field and 
getting below error:



This field I'm copying to text field(copyfield) as well.



Error:
Can't facet on a PointField without docValues

I tried adding like below:





And after the changes, I did full reindex of the core and restarted as well.

But still facing the same error. Can somebody please help.

Thanks,
Srinivas




DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast Ltd, an innovator in Software as a Service 
(SaaS) for business. Providing a safer and more useful place for your human 
generated data. Specializing in; Security, archiving and compliance. To find 
out more visit the Mimecast website.


RE: Solr 8.4.1 error

2020-02-03 Thread Srinivas Kashyap
Sorry for the interruption, This error was due to wrong context path mentioned 
in solr-jetty-context.xml



And in jetty.xml it was referring /solr. So index was locked.

Thanks,
Srinivas

-Original Message-
From: Srinivas Kashyap  
Sent: 04 February 2020 11:04
To: solr-user@lucene.apache.org
Subject: RE: Solr 8.4.1 error

Hi Shawn,

I did delete the data folder of the core and also did in windows command: solr 
stop -all. I see only one solr server is running in this machine which gets 
started and stopped when I do so. To confirm, I even copied my folders to 
another system and tried there but facing same issue.

In solr-config.xml if I replace ${solr.lock.type:native} 
with ${solr.lock.type:single}. It starts without any error.

Please let me know how to find if other servers are running or if it is an 
issue with solr 8.4.1 version.

Thanks,
Srinivas

-Original Message-
From: Shawn Heisey 
Sent: 04 February 2020 02:24
To: solr-user@lucene.apache.org
Subject: Re: Solr 8.4.1 error

On 2/3/2020 5:16 AM, Srinivas Kashyap wrote:
> I'm trying to upgrade to solr 8.4.1 and facing below error while start up and 
> my cores are not being listed in solr admin screen. I need your help.



> Caused by: java.nio.channels.OverlappingFileLockException
>  at sun.nio.ch.SharedFileLockTable.checkList(Unknown Source) 
> ~[?:1.8.0_221]
>  at sun.nio.ch.SharedFileLockTable.add(Unknown Source) 
> ~[?:1.8.0_221]
>  at sun.nio.ch.FileChannelImpl.tryLock(Unknown Source) 
> ~[?:1.8.0_221]
>  at java.nio.channels.FileChannel.tryLock(Unknown
> Source) ~[?:1.8.0_221]

This appears to be saying that the index in that directory is already locked.  
Lucene can detect when the index is already locked by the same program that 
tries to lock it again, and it will say so when that happens.  The message did 
not indicate that it was the same program, so in this case, it is likely that 
you already have another copy of Solr running and that copy has the index 
directory locked.  You cannot access the same index directory from multiple 
copies of Solr unless you disable locking, and that would be a REALLY bad idea.

Thanks,
Shawn

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast Ltd, an innovator in Software as a Service 
(SaaS) for business. Providing a safer and more useful place for your human 
generated data. Specializing in; Security, archiving and compliance. To find 
out more visit the Mimecast website.


RE: Solr 8.4.1 error

2020-02-03 Thread Srinivas Kashyap
Hi Shawn,

I did delete the data folder of the core and also did in windows command: solr 
stop -all. I see only one solr server is running in this machine which gets 
started and stopped when I do so. To confirm, I even copied my folders to 
another system and tried there but facing same issue.

In solr-config.xml if I replace ${solr.lock.type:native} 
with ${solr.lock.type:single}. It starts without any error.

Please let me know how to find if other servers are running or if it is an 
issue with solr 8.4.1 version.

Thanks,
Srinivas

-Original Message-
From: Shawn Heisey 
Sent: 04 February 2020 02:24
To: solr-user@lucene.apache.org
Subject: Re: Solr 8.4.1 error

On 2/3/2020 5:16 AM, Srinivas Kashyap wrote:
> I'm trying to upgrade to solr 8.4.1 and facing below error while start up and 
> my cores are not being listed in solr admin screen. I need your help.



> Caused by: java.nio.channels.OverlappingFileLockException
>  at sun.nio.ch.SharedFileLockTable.checkList(Unknown Source) 
> ~[?:1.8.0_221]
>  at sun.nio.ch.SharedFileLockTable.add(Unknown Source) 
> ~[?:1.8.0_221]
>  at sun.nio.ch.FileChannelImpl.tryLock(Unknown Source) 
> ~[?:1.8.0_221]
>  at java.nio.channels.FileChannel.tryLock(Unknown
> Source) ~[?:1.8.0_221]

This appears to be saying that the index in that directory is already locked.  
Lucene can detect when the index is already locked by the same program that 
tries to lock it again, and it will say so when that happens.  The message did 
not indicate that it was the same program, so in this case, it is likely that 
you already have another copy of Solr running and that copy has the index 
directory locked.  You cannot access the same index directory from multiple 
copies of Solr unless you disable locking, and that would be a REALLY bad idea.

Thanks,
Shawn

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast Ltd, an innovator in Software as a Service 
(SaaS) for business. Providing a safer and more useful place for your human 
generated data. Specializing in; Security, archiving and compliance. To find 
out more visit the Mimecast website.


Solr 8.4.1 error

2020-02-03 Thread Srinivas Kashyap
Hello,

I'm trying to upgrade to solr 8.4.1 and facing below error while start up and 
my cores are not being listed in solr admin screen. I need your help.

2020-02-03 12:12:35.622 ERROR (coreContainerWorkExecutor-2-thread-1) [   ] 
o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup => 
org.apache.solr.common.SolrException: Unable to create core [businesscase]
at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313)
org.apache.solr.common.SolrException: Unable to create core [businesscase]
at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313)
 ~[?:?]
at 
org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:788) ~[?:?]
at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202)
 ~[metrics-core-4.0.5.jar:4.0.5]
at java.util.concurrent.FutureTask.run(Unknown Source) 
~[?:1.8.0_221]
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
 ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
Source) ~[?:1.8.0_221]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
Source) ~[?:1.8.0_221]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_221]
Caused by: org.apache.solr.common.SolrException
at org.apache.solr.core.SolrCore.(SolrCore.java:1072) 
~[?:?]
at org.apache.solr.core.SolrCore.(SolrCore.java:901) 
~[?:?]
at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292)
 ~[?:?]
... 7 more
Caused by: java.nio.channels.OverlappingFileLockException
at sun.nio.ch.SharedFileLockTable.checkList(Unknown Source) 
~[?:1.8.0_221]
at sun.nio.ch.SharedFileLockTable.add(Unknown Source) 
~[?:1.8.0_221]
at sun.nio.ch.FileChannelImpl.tryLock(Unknown Source) 
~[?:1.8.0_221]
at java.nio.channels.FileChannel.tryLock(Unknown Source) 
~[?:1.8.0_221]
at 
org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:126)
 ~[?:?]
at 
org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[?:?]
at 
org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[?:?]
at 
org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105) 
~[?:?]
at 
org.apache.solr.core.SolrCore.isWriterLocked(SolrCore.java:757) ~[?:?]
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:778) 
~[?:?]
at org.apache.solr.core.SolrCore.(SolrCore.java:989) 
~[?:?]
at org.apache.solr.core.SolrCore.(SolrCore.java:901) 
~[?:?]
at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292)
 ~[?:?]
... 7 more

Any pointers would be helpful.

Thanks and regards,
Srinivas

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast Ltd, an innovator in Software as a Service 
(SaaS) for business. Providing a safer and more useful place for your human 
generated data. Specializing in; Security, archiving and compliance. To find 
out more visit the Mimecast website.


RE: Dataimport problem

2019-07-31 Thread Srinivas Kashyap
Hi,
Hi,

1)Have you tried running _just_ your SQL queries to see how long they take to 
respond and whether it responds with the full result set of batches

The 9th request returns only 2 rows. This behaviour is happening for all the 
cores which have more than 8 SQL requests. But the same is working fine with 
AWS hosting. Really baffled.

Thanks and Regards,
Srinivas Kashyap

-Original Message-
From: Erick Erickson 
Sent: 31 July 2019 08:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Dataimport problem

This code is a little old, but should give you a place to start:

https://lucidworks.com/post/indexing-with-solrj/

As for DIH, my guess is that when you moved to Azure, your connectivity to the 
DB changed, possibly the driver Solr uses etc., and your SQL query in step 9 
went from, maybe, batching rows to returning the entire result set or similar 
weirdness. Have you tried running _just_ your SQL queries to see how long they 
take to respond and whether it responds with the full result set of batches?

Best,
Erick

> On Jul 31, 2019, at 10:18 AM, Srinivas Kashyap  
> wrote:
>
> Hi,
>
> 1) Solr on Tomcat has not been an option for quite a while. So, you must be 
> running an old version of Solr. Which one?
>
> We are using Solr 5.2.1(WAR based deployment so)
>
>
> 5) DIH is not actually recommended for production, more for exploration; you 
> may want to consider moving to a stronger architecture given the complexity 
> of your needs
>
> Can you please give pointers to look into, We are using DIH for production 
> and facing few issues. We need to start phasing out
>
>
> Thanks and Regards,
> Srinivas Kashyap
>
> -Original Message-
> From: Alexandre Rafalovitch 
> Sent: 31 July 2019 07:41 PM
> To: solr-user 
> Subject: Re: Dataimport problem
>
> A couple of things:
> 1) Solr on Tomcat has not been an option for quite a while. So, you must be 
> running an old version of Solr. Which one?
> 2) Compare that you have the same Solr config. In Admin UI, there will be all 
> O/S variables passed to the Java runtime, I would check them side-by-side
> 3) You can enable Dataimport(DIH) debug in Admin UI, so perhaps you can run a 
> subset (1?) of the queries and see the difference
> 4) Worst case, you may want to track this in between Solr and DB by using 
> network analyzer (e.g. Wireshark). That may show you the actual queries, 
> timing, connection issues, etc
> 5) DIH is not actually recommended for production, more for exploration; you 
> may want to consider moving to a stronger architecture given the complexity 
> of your needs
>
> Regards,
>   Alex.
>
> On Wed, 31 Jul 2019 at 10:04, Srinivas Kashyap  
> wrote:
>>
>> Hello,
>>
>> We are trying to run Solr(Tomcat) on Azure instance and postgres being the 
>> DB. When I run full import(my core has 18 SQL queries), for some reason, the 
>> requests will go till 9 and it gets hung for eternity.
>>
>> But the same setup, solr(tomcat) and postgres database works fine with AWS 
>> hosting.
>>
>> Am I missing some configuration? Please let me know.
>>
>> Thanks and Regards,
>> Srinivas Kashyap
>> 

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.


RE: Dataimport problem

2019-07-31 Thread Srinivas Kashyap
Hi,

1) Solr on Tomcat has not been an option for quite a while. So, you must be 
running an old version of Solr. Which one?

We are using Solr 5.2.1(WAR based deployment so)


5) DIH is not actually recommended for production, more for exploration; you 
may want to consider moving to a stronger architecture given the complexity of 
your needs

Can you please give pointers to look into, We are using DIH for production and 
facing few issues. We need to start phasing out


Thanks and Regards,
Srinivas Kashyap
            
-Original Message-
From: Alexandre Rafalovitch  
Sent: 31 July 2019 07:41 PM
To: solr-user 
Subject: Re: Dataimport problem

A couple of things:
1) Solr on Tomcat has not been an option for quite a while. So, you must be 
running an old version of Solr. Which one?
2) Compare that you have the same Solr config. In Admin UI, there will be all 
O/S variables passed to the Java runtime, I would check them side-by-side
3) You can enable Dataimport(DIH) debug in Admin UI, so perhaps you can run a 
subset (1?) of the queries and see the difference
4) Worst case, you may want to track this in between Solr and DB by using 
network analyzer (e.g. Wireshark). That may show you the actual queries, 
timing, connection issues, etc
5) DIH is not actually recommended for production, more for exploration; you 
may want to consider moving to a stronger architecture given the complexity of 
your needs

Regards,
   Alex.

On Wed, 31 Jul 2019 at 10:04, Srinivas Kashyap  wrote:
>
> Hello,
>
> We are trying to run Solr(Tomcat) on Azure instance and postgres being the 
> DB. When I run full import(my core has 18 SQL queries), for some reason, the 
> requests will go till 9 and it gets hung for eternity.
>
> But the same setup, solr(tomcat) and postgres database works fine with AWS 
> hosting.
>
> Am I missing some configuration? Please let me know.
>
> Thanks and Regards,
> Srinivas Kashyap
> 
> DISCLAIMER:
> E-mails and attachments from Bamboo Rose, LLC are confidential.
> If you are not the intended recipient, please notify the sender immediately 
> by replying to the e-mail, and then delete it without making copies or using 
> it in any way.
> No representation is made that this email or any attachments are free of 
> viruses. Virus scanning is recommended and is the responsibility of the 
> recipient.


Dataimport problem

2019-07-31 Thread Srinivas Kashyap
Hello,

We are trying to run Solr(Tomcat) on Azure instance and postgres being the DB. 
When I run full import(my core has 18 SQL queries), for some reason, the 
requests will go till 9 and it gets hung for eternity.

But the same setup, solr(tomcat) and postgres database works fine with AWS 
hosting.

Am I missing some configuration? Please let me know.

Thanks and Regards,
Srinivas Kashyap

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.


Search using filter query on multivalued fields

2019-05-03 Thread Srinivas Kashyap
Hi,

I have indexed data as shown below using DIH:

"INGREDIENT_NAME": [
  "EGG",
  "CANOLA OIL",
  "SALT"
],
"INGREDIENT_NO": [
  "550",
  "297",
  "314"
],
"COMPOSITION PERCENTAGE": [
  20,
  60,
  40
],

Similar to this, many other records are also indexed. These are multi-valued 
fields.

I have a requirement to search all the records which has ingredient name salt 
and it's composition percentage is more than 20.

How do I write a filter query for this?

P.S: I should only fetch records, whose Salt Composition percentage is more 
than 20 and not other percentages.

Thanks and Regards,
Srinivas Kashyap

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.


RE: multi-level Nested entities in dih

2019-04-30 Thread Srinivas Kashyap
Hi Alexandre,

Yes, the whole tree gets mapped to and returned as single flat document. When 
you search, it should return all the matching documents if it matches that 
nested field.

Thanks and Regards,
Srinivas Kashyap

-Original Message-
From: Alexandre Rafalovitch  
Sent: 30 April 2019 05:06 PM
To: solr-user 
Subject: Re: multi-level Nested entities in dih

DIH may not be able to do arbitrary nesting. And it is not recommended for 
complex production cases.

However, in general, you also have to focus on what your _search_ will look 
like. Amd only then think about the mapping.

For example, is that whole tree gets mapped to and returned as a single flat 
document of fields? Or gets mapped to multiple result documents?

Regards,
 Alex

On Tue, Apr 30, 2019, 6:29 AM Srinivas Kashyap, 
wrote:

> Hello,
>
> I'm using DIH to index the data using SQL. I have requirement as shown
> below:
>
> Parent entity
> Child1
> Child2
> Child3
> CHILD4( child41, child42, CHILD43(child
> 431,child432,child433,CHILD434...)
>
> How to recursively iterate the child entities which have some more 
> child entities in them until I'm done with all the children.
>
> Thanks and Regards,
> Srinivas Kashyap
> 
> DISCLAIMER:
> E-mails and attachments from Bamboo Rose, LLC are confidential.
> If you are not the intended recipient, please notify the sender 
> immediately by replying to the e-mail, and then delete it without 
> making copies or using it in any way.
> No representation is made that this email or any attachments are free 
> of viruses. Virus scanning is recommended and is the responsibility of 
> the recipient.
>


multi-level Nested entities in dih

2019-04-30 Thread Srinivas Kashyap
Hello,

I'm using DIH to index the data using SQL. I have requirement as shown below:

Parent entity
Child1
Child2
Child3
CHILD4( child41, child42, CHILD43(child 
431,child432,child433,CHILD434...)

How to recursively iterate the child entities which have some more child 
entities in them until I'm done with all the children.

Thanks and Regards,
Srinivas Kashyap

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.


RE: Sql entity processor sortedmapbackedcache out of memory issue

2019-04-24 Thread Srinivas Kashyap
Hi Shawn, Mikhail

Any suggestions/pointers for using zipper algorithm. I'm facing below error.

Thanks and Regards,
Srinivas Kashyap
**

From: Srinivas Kashyap  
Sent: 12 April 2019 03:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Sql entity processor sortedmapbackedcache out of memory issue

Hi Shawn/Mikhail Khludnev,

I was going through Jira  https://issues.apache.org/jira/browse/SOLR-4799 and 
see, I can do my intended activity by specifying zipper.

I tried doing it, however I'm getting error as below:

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.IllegalArgumentException: expect increasing foreign keys for Relation 
CHILD_KEY=PARENT.PARENT_KEY got: QA-HQ008880,HQ011782 at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:62)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:246)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 5 more
Caused by: java.lang.IllegalArgumentException: expect increasing foreign keys 
for Relation CHILD_KEY=PARENT.PARENT_KEY got: QA-HQ008880,HQ011782 at 
org.apache.solr.handler.dataimport.Zipper.supplyNextChild(Zipper.java:70)
at 
org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:126)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)


Below is my dih config:















Thanks and Regards,
Srinivas Kashyap

-Original Message-
From: Shawn Heisey 
Sent: 09 April 2019 01:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Sql entity processor sortedmapbackedcache out of memory issue

On 4/8/2019 11:47 PM, Srinivas Kashyap wrote:
> I'm using DIH to index the data and the structure of the DIH is like below 
> for solr core:
>
> 
> 16 child entities
> 
>
> During indexing, since the number of requests being made to database was 
> high(to process one document 17 queries) and was utilizing most of 
> connections of database thereby blocking our web application.

If you have 17 entities, then one document will indeed take 17 queries.
That's the nature of multiple DIH entities.

> To tackle it, we implemented SORTEDMAPBACKEDCACHE with cacheImpl parameter to 
> reduce the number of requests to database.

When you use SortedMapBackedCache on an entity, you are asking Solr to store 
the results of the entire query in memory, even if you don't need all of the 
results.  If the database has a lot of rows, that's going to take a lot of 
memory.

In your excerpt from the config, your inner entity doesn't have a WHERE clause. 
 Which means that it's going to retrieve all of the rows of the ABC table for 
*EVERY* single entry in the DEF table.  That's going to be exceptionally slow.  
Normally the SQL query on inner entities will have some kind of WHERE clause 
that limits the results to rows that match the entry from the outer entity.

You may need to write a custom indexing program that runs separately from Solr, 
possibly on an entirely different server.  That might be a lot more efficient 
than DIH.

Thanks,
Shawn

DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by 
replying to the e-mail, and then delete it without making copies or using it in 
any way.
No representation is made that this email or any attachments are free of 
viruses. Virus scanning is recommended and is the responsibility of the 
recipient.


RE: Sql entity processor sortedmapbackedcache out of memory issue

2019-04-12 Thread Srinivas Kashyap
Hi Shawn/Mikhail Khludnev,

I was going through Jira  https://issues.apache.org/jira/browse/SOLR-4799 and 
see, I can do my intended activity by specifying zipper.

I tried doing it, however I'm getting error as below:

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
java.lang.IllegalArgumentException: expect increasing foreign keys for Relation 
CHILD_KEY=PARENT.PARENT_KEY got: QA-HQ008880,HQ011782
at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:62)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:246)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
... 5 more
Caused by: java.lang.IllegalArgumentException: expect increasing foreign keys 
for Relation CHILD_KEY=PARENT.PARENT_KEY got: QA-HQ008880,HQ011782
at org.apache.solr.handler.dataimport.Zipper.supplyNextChild(Zipper.java:70)
at 
org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:126)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)


Below is my dih config:















Thanks and Regards,
Srinivas Kashyap

-Original Message-
From: Shawn Heisey 
Sent: 09 April 2019 01:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Sql entity processor sortedmapbackedcache out of memory issue

On 4/8/2019 11:47 PM, Srinivas Kashyap wrote:
> I'm using DIH to index the data and the structure of the DIH is like below 
> for solr core:
>
> 
> 16 child entities
> 
>
> During indexing, since the number of requests being made to database was 
> high(to process one document 17 queries) and was utilizing most of 
> connections of database thereby blocking our web application.

If you have 17 entities, then one document will indeed take 17 queries.
That's the nature of multiple DIH entities.

> To tackle it, we implemented SORTEDMAPBACKEDCACHE with cacheImpl parameter to 
> reduce the number of requests to database.

When you use SortedMapBackedCache on an entity, you are asking Solr to store 
the results of the entire query in memory, even if you don't need all of the 
results.  If the database has a lot of rows, that's going to take a lot of 
memory.

In your excerpt from the config, your inner entity doesn't have a WHERE clause. 
 Which means that it's going to retrieve all of the rows of the ABC table for 
*EVERY* single entry in the DEF table.  That's going to be exceptionally slow.  
Normally the SQL query on inner entities will have some kind of WHERE clause 
that limits the results to rows that match the entry from the outer entity.

You may need to write a custom indexing program that runs separately from Solr, 
possibly on an entirely different server.  That might be a lot more efficient 
than DIH.

Thanks,
Shawn



Sql entity processor sortedmapbackedcache out of memory issue

2019-04-09 Thread Srinivas Kashyap
Hello,

I'm using DIH to index the data and the structure of the DIH is like below for 
solr core:


16 child entities


During indexing, since the number of requests being made to database was 
high(to process one document 17 queries) and was utilizing most of connections 
of database thereby blocking our web application.

To tackle it, we implemented SORTEDMAPBACKEDCACHE with cacheImpl parameter to 
reduce the number of requests to database.












.
.
.
.
.
.
.


We have an 8GB physical memory (RAM) system with 5GB of it allocated to the JVM, and 
when we do a full-import, only 17 requests are made to the database. However, memory 
consumption shoots up and the JVM runs out of memory. Whether we run out of memory 
depends on the number of records each entity brings into memory. For Dev and QA 
environments, the above memory config is sufficient. When we move to production, we 
have to increase the memory to around 16GB of RAM and 12GB for the JVM.

Is there any logic/configurations to limit the memory usage?

Thanks and Regards,
Srinivas Kashyap




Alternative for DIH

2019-01-31 Thread Srinivas Kashyap
Hello,

As we all know, DIH is single-threaded and has its own issues while indexing.

I got to know that we can write our own APIs to pull data from the DB and push it 
into Solr. One such approach I heard of uses Apache Kafka for the purpose.

Can any of you send me links and guides on using Apache Kafka to pull data 
from the DB and push it into Solr?

If there are any other alternatives please suggest.
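
There is no single official guide for this pipeline that I know of; the usual shape is a small consumer that reads change events from a Kafka topic (fed from the database by whatever producer or CDC tool you already use) and pushes them into Solr with SolrJ. A rough sketch, with the topic name, field mapping and URLs as hypothetical placeholders (poll(Duration) assumes a Kafka client 2.x or later):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class KafkaToSolr {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "solr-indexer");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("db-changes"));  // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", rec.key());
                    doc.addField("payload_s", rec.value());  // parse your real fields here
                    solr.add(doc);
                }
                if (!records.isEmpty()) {
                    solr.commit();  // or rely on commitWithin / autoCommit instead
                }
            }
        }
    }
}

On the database side, something like a Kafka Connect JDBC source connector (or any producer your application already has) is what typically feeds such a topic.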

Thanks and Regards,
Srinivas Kashyap



FW: Sort index by size

2018-11-27 Thread Srinivas Kashyap
Hi Shawn and everyone who replied to the thread,

The Solr version is 5.2.1, and each document returns multi-valued fields for the 
majority of fields defined in schema.xml. I'm in the process of pasting the content 
of my files to a paste website and will update soon.

Thanks,
Srinivas


On 11/19/2018 2:31 AM, Srinivas Kashyap wrote:
> I have a solr core with some 20 fields in it.(all are stored and indexed). 
> For an environment, the number of documents are around 0.29 million. When I 
> run the full import through DIH, indexing is completing successfully. But, it 
> is occupying the disk space of around 5 GB. Is there a possibility where I 
> can go and check, which document is consuming more memory? Put in another 
> way, can I sort the index based on size?

I am not aware of any way to do that.  Might be one that I don't know about, 
but if there were a way, seems like I would have come across it before.

It is not very likely that the large index size is due to a single document or a 
handful of documents.  It is more likely that most documents are relatively 
large.  I could be wrong about that, though.

If you have 290,000 documents (which is how I interpreted 0.29 million) and the 
total index size is about 5 GB, then the average size per document in the index 
is about 18 kilobytes.  This is in my view pretty large.  Typically I think that 
most documents are 1-2 kilobytes.

Can we get your Solr version, a copy of your schema, and exactly what Solr 
returns in search results for a typically sized document?  You'll need to use a 
paste website or a file-sharing website ... if you try to attach these things 
to a message, the mailing list will most likely eat them, and we'll never see 
them. If you need to redact the information in search results ... please do it 
in a way that we can still see the exact size of the text -- don't just remove 
information, replace it with information that's the same length.

Thanks,
Shawn




Sort index by size

2018-11-19 Thread Srinivas Kashyap
Hello,

I have a solr core with some 20 fields in it (all are stored and indexed). For 
one environment, the number of documents is around 0.29 million. When I run the 
full import through DIH, indexing completes successfully. But it is occupying disk 
space of around 5 GB. Is there a way I can go and check which document is 
consuming the most space? Put another way, can I sort the index based on size?

Thanks and Regards,
Srinivas Kashyap

  


Jetty Sqlserver config

2018-09-20 Thread Srinivas Kashyap
Hello,

I'm having a problem setting up the SQL Server data import handler in a Jetty 
container.

In the data-config xml I have set up JNDI as below:





And I have jetty-env.xml inside WEB-INF as below:


 
 java:comp/env/jdbc/tssindex
 

   
   
   
   
   1433

 


In Web.xml I have below:


 MyDB datasource reference
 java:comp/env/jdbc/tssindex
 javax.sql.DataSource
 Container



Once I start importing, I'm getting below exception:

javax.naming.NameNotFoundException; remaining name 'env/jdbc/tssindex'
at 
org.eclipse.jetty.jndi.NamingContext.lookup(NamingContext.java:540)
at 
org.eclipse.jetty.jndi.NamingContext.lookup(NamingContext.java:571)


Please guide me on (or correct me with) the right approach for configuring JNDI for 
the SQL Server database.


Thanks and Regards,
Srinivas Kashyap


Solr upgrade issues

2018-09-07 Thread Srinivas Kashyap
Hi,

We are in the process of upgrading Solr from 5.2.1 to 7.4.0, and I'm facing the 
below issues. Please help me resolve them.

1) HttpSolrClient tempClient = new HttpSolrClient.Builder("http://localhost:8983/solr").build();

   ModifiableSolrParams params = new ModifiableSolrParams();
   params.set("qt", "/dataimport");
   params.set("command", "delta-import");
   params.set("clean", "false");
   params.set("commit", "true");
   params.set("entity", "collection1");

   GenericSolrRequest req = new GenericSolrRequest(SolrRequest.METHOD.POST, "/select", params);
   tempClient.request(req, "collection1");


In the back end I have schedulers which call delta-import on my collection1. 
This used to work in 5.2.1: when this code was executed, I could see the 
delta-import being run. Now in 7.4.0, I'm not able to see the delta-import 
running. Does Solr restrict accepting external requests in 7.4.0? (A sketch 
addressing this follows after question 2 below.)

2) Also in Jetty.xml , I have configured the datasource as below


  
 java:comp/env/jdbc/tssindex
 
 
 thin
 jdbc:oracle:thin:@localhost:1521:xe
 X
 X
 
 
 

How do I fetch the data source configured in this xml 
(java:comp/env/jdbc/tssindex)?

Earlier I used to fetch from tomcat context xml as below:

Context initContext = new InitialContext();
DataSource ds = null;
Context webContext = 
(Context)initContext.lookup("java:/comp/env");
ds = (DataSource) webContext.lookup("jdbc/tssindex");

How do I fetch it in Jetty?
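
For 1), a likely cause is that in Solr 7.x a request sent to /select with qt=/dataimport is no longer routed to the DataImportHandler; sending the request to the /dataimport path itself should work. A minimal sketch, reusing the URL and entity name from the snippet above:

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class TriggerDeltaImport {
    public static void main(String[] args) throws Exception {
        HttpSolrClient tempClient =
            new HttpSolrClient.Builder("http://localhost:8983/solr").build();
        try {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("command", "delta-import");
            params.set("clean", "false");
            params.set("commit", "true");
            params.set("entity", "collection1");

            // Address the DIH handler itself instead of /select?qt=/dataimport.
            GenericSolrRequest req =
                new GenericSolrRequest(SolrRequest.METHOD.POST, "/dataimport", params);
            tempClient.request(req, "collection1");
        } finally {
            tempClient.close();
        }
    }
}

For 2), the lookup from application code is the same JNDI call you used under Tomcat, provided the resource is exposed to the webapp via a resource-ref in web.xml; note that Jetty prefixes java:comp/env itself, so the resource is usually bound with the plain name jdbc/tssindex. A sketch under that assumption:

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class JettyJndiLookup {
    public static DataSource lookupDataSource() throws Exception {
        Context initContext = new InitialContext();
        Context envContext = (Context) initContext.lookup("java:comp/env");
        // "jdbc/tssindex" is the resource name assumed here.
        return (DataSource) envContext.lookup("jdbc/tssindex");
    }
}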

Thanks in advance,
Srinivas kashyap



RE: API to convert solr response to Rowset

2018-07-18 Thread Srinivas Kashyap
I have a collection and, through SolrJ, I query the collection and get a 
QueryResponse object. Is there a way I can convert this query response to a 
RowSet (JDBC)? I see the Parallel SQL interface was introduced in the 7.x versions, 
but is it possible in Solr 5.2.1?
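
I don't believe there is a built-in converter in SolrJ 5.2.1. From 6.x onward, the Parallel SQL interface Alexandre links to below ships a JDBC driver, so you can get a real java.sql.ResultSet and populate a standard CachedRowSet from it. A rough sketch, where SolrCloud with ZooKeeper at localhost:9983 and a collection named collection1 are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

public class SolrJdbcToRowSet {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:solr://localhost:9983?collection=collection1";
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("select id from collection1 limit 100")) {
            CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
            crs.populate(rs);  // detached, scrollable copy of the results
            while (crs.next()) {
                System.out.println(crs.getString("id"));
            }
        }
    }
}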

Thanks and Regards,
Srinivas Kashyap
             
 
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: 18 July 2018 04:25 PM
To: solr-user 
Subject: Re: API to convert solr response to Rowset

Do you mean to use it with JDBC like source?

https://lucene.apache.org/solr/guide/7_4/parallel-sql-interface.html

Regards,
Alex

On Wed, Jul 18, 2018, 2:15 AM Srinivas Kashyap, < 
srini...@tradestonesoftware.com> wrote:

> Hello,
>
> Is there any API to convert Solr query response to JDBC Rowset?
>
> Thanks and Regards,
> Srinivas Kashyap
>
>


API to convert solr response to Rowset

2018-07-18 Thread Srinivas Kashyap
Hello,

Is there any API to convert Solr query response to JDBC Rowset?

Thanks and Regards,
Srinivas Kashyap



Upgrading to higher version

2018-07-17 Thread Srinivas Kashyap
Hello,

We are currently on Solr 5.2.1; we build our project through Ant and bundle it as 
a WAR file. Now we want to upgrade Solr to the latest version (7.x).

Can you please let me know what the stable release is? And will I still be able 
to ship our application as a WAR file? Any guide/steps to do the same are much 
appreciated.

Also, our application was running on Tomcat/WebSphere; will it be any different 
on Jetty?

Thanks and Regards,
Srinivas Kashyap



Solr facet on facet field returns junk values

2018-05-16 Thread Srinivas Kashyap
Hello,

I have a Solr collection which has around 20 fields (indexed and stored). When I 
turn on faceting and mention a facet.field, I'm able to get the facet counts for 
that field. However, I also see some junk facet values like below being 
generated in the facet response (1, 10, 100, 1000, 10 ...). These junk 
values have count 0. However, when I query (*:*) and turn on facet.field, these 
junk values don't show up since the actual count values would be higher. Any 
reason why this is happening?

"PHY_KEY1": [
"85",
11,
"400",
10,
"218",
9,
"965",
9,
"640",
5,
"26",
3,
"465",
3,
"292",
2,
"158",
1,
"267",
1,
"38",
1,
"00176",
0,
"1",
0,
"10",
0,
"100",
0,
"1000",
0,
"10",
0,
"101",
0,
"103",
0,
"104",
0,
"105",
0,
"107",
0,
"108",
0,
"109",
0,
"11",
0,
"110",
0,
"115",
0,
"1000017",
0,
"12",
0,
"120",
0,
"122",
0,
"123",
0,
"124",
0,
"125",
0,
"126",
0,
"127",


Thanks and Regards,
Srinivas Kashyap


count mismatch: number of records indexed

2018-05-02 Thread Srinivas Kashyap
Hi,

I have a standalone Solr 5.2.1 index server and a core with 15 fields (all 
indexed and stored).

Through DIH I'm indexing the data (around 65 million records). The indexing process 
took 6 hours to complete. But after completion, when I checked through the Solr 
admin query console (*:*), numFound is only 41 thousand records. Am I missing 
some configuration to index all records?

Physical memory: 16GB
JVM memory: 4GB
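
With 65 million rows going in and only ~41 thousand documents found, the usual suspect is the uniqueKey: if many rows share the same key, each new row silently replaces the previous document. One way to check, assuming LukeRequest/LukeResponse expose the index counters as sketched here, is to compare maxDoc with numDocs right after the import; a huge gap means massive overwriting:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class IndexCounts {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/corename").build();
        try {
            LukeRequest luke = new LukeRequest();
            luke.setNumTerms(0);                 // just the index summary, no term stats
            LukeResponse rsp = luke.process(client);
            int numDocs = rsp.getNumDocs();      // live documents
            int maxDoc = rsp.getMaxDoc();        // live + deleted (overwritten) documents
            System.out.println("numDocs=" + numDocs + " maxDoc=" + maxDoc
                + " deleted=" + (maxDoc - numDocs));
        } finally {
            client.close();
        }
    }
}

On SolrJ 5.2.1 you would construct HttpSolrClient directly from the URL string rather than through the Builder.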

Thanks,
Srinivas


Solr performance issue

2018-02-15 Thread Srinivas Kashyap
Hi,

I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the 
child entities in data-config.xml, and I'm using it for full-import only. In the 
beginning of my implementation, I had written a delta-import query to index the 
modified changes, but my requirement grew and I now have 17 child entities for a 
single parent entity. When doing a delta-import over a lot of data, the number of 
requests made to the datasource (database) grew, and CPU utilization hit 100% when 
concurrent users started modifying the data. So instead of calling delta-import, 
which imports based on the last index time, I did a full-import 
('SortedMapBackedCache') based on the last index time.

Though the parent entity query returns only records that were modified, the 
child entity queries pull all the data from the database and the indexing 
happens 'in-memory', which causes the JVM to run out of memory.

Is there a way to specify, in the child entity query, that it should pull only the 
records related to the parent entity in full-import mode?

Thanks and Regards,
Srinivas Kashyap


Database logins and active sessions

2018-02-07 Thread Srinivas Kashyap
Hello,

We have configured Solr index server on tomcat and fetch the data from database 
to index the data. We have implemented delta query indexing based on modify_ts.

In our data-config.xml we have a parent entity and 17 child entity. We have 18 
such solr cores. When we call delta-import on a core, it executes 18 SQL query 
to query database.

Each delta-import opens a new session on the database. Though each login and 
logout happens within a split second, we are seeing millions of logins and logouts 
at the database.

As per our DBA, logins and logouts are costly operations in terms of server 
resources.

Is there a way to reduce the number of logins and logouts and have a 
persistent DB connection from Solr?

Thanks and Regards,
Srinivas Kashyap


OnImportEnd EventListener

2018-01-31 Thread Srinivas Kashyap
Hello,

I'm trying to get the documents which got indexed when DIH was called, and I want to 
differentiate such documents from the ones which were added using SolrJ atomic 
updates.

Is it possible to get the primary keys of the documents which got indexed through 
the "onImportEnd" EventListener?

Any alternative way I can find them?
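
The onImportEnd listener itself only receives the DIH Context, not the per-document keys, so one workable pattern is a custom Transformer on the root entity that records each key as it flows through, with the onImportEnd listener reading (and clearing) that set. A sketch, where the class names and the PARENT_KEY column are hypothetical:

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;
import org.apache.solr.handler.dataimport.Transformer;

// Attach with transformer="KeyCollectingTransformer" on the root entity in data-config.xml.
public class KeyCollectingTransformer extends Transformer {

    // Keys of every row DIH processed during the current import.
    public static final Set<Object> KEYS = ConcurrentHashMap.newKeySet();

    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Object key = row.get("PARENT_KEY");   // hypothetical primary-key column
        if (key != null) {
            KEYS.add(key);
        }
        return row;                           // pass the row through unchanged
    }
}

// Registered via the onImportEnd attribute of <document> in data-config.xml.
class ImportEndListener implements EventListener {
    @Override
    public void onEvent(Context ctx) {
        System.out.println("DIH imported keys: " + KeyCollectingTransformer.KEYS);
        KeyCollectingTransformer.KEYS.clear();
    }
}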

Thanks and Regards,
Srinivas Kashyap



RE: OnImportEnd EventListener

2018-01-31 Thread Srinivas Kashyap
Hi Emir,

Thanks for the reply,

As I'm also doing atomic updates on the existing documents (already indexed from DIH), 
with the suggested approach I might end up doing an atomic update on a DIH-imported 
document and committing the same.

So I wanted to get the document values which were indexed when the import 
completed (the "onImportEnd" EventListener).

Thanks and Regards,
Srinivas Kashyap



-Original Message-
From: Emir Arnautović [mailto:emir.arnauto...@sematext.com] 
Sent: 31 January 2018 04:14 PM
To: solr-user@lucene.apache.org
Subject: Re: OnImportEnd EventListener

Hi Srinivas,
I guess you can add some field that will be set in your DIH config - something 
like:


And you can use the ‘dih’ field to filter out docs that were imported using DIH.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch 
Consulting Support Training - http://sematext.com/



> On 31 Jan 2018, at 11:19, Srinivas Kashyap <srini...@tradestonesoftware.com> 
> wrote:
> 
> Hello,
> 
> I'm trying to get the documents which got indexed on calling DIH and I want 
> to differentiate such documents with the ones which are added using SolrJ 
> atomic update.
> 
> Is it possible to get the document primary keys which got indexed thru 
> "onImportEnd" Eventlistener?
> 
> Any alternative way I can find them?
> 
> Thanks and Regards,
> Srinivas Kashyap
> 
> 




OnImportEnd EventListener

2018-01-31 Thread Srinivas Kashyap
Hello,

I'm trying to get the documents which got indexed on calling DIH and I want to 
differentiate such documents with the ones which are added using SolrJ atomic 
update.

Is it possible to get the document primary keys which got indexed thru 
"onImportEnd" Eventlistener?

Any alternative way I can find them?

Thanks and Regards,
Srinivas Kashyap



Connection reset by peer error

2017-08-16 Thread Srinivas Kashyap
Hello,

In one of our Tomcat-based Solr 5.2.1 deployment environments, we are 
experiencing the below error, which keeps recurring.

When the app is restarted, the error doesn't show up for a while. The max allowed 
connections in the Tomcat context xml and the JVM memory for Solr are sufficient.

14-Aug-2017 09:59:34.515 SEVERE [http-nio--exec-7] 
org.apache.solr.common.SolrException.log 
null:org.apache.catalina.connector.ClientAbortException: java.io.IOException: 
Connection reset by peer
at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:393)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:426)
at 
org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:342)
at 
org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:317)
at 
org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:110)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:297)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:137)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:54)
at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:727)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at 
org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:522)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1095)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:672)
at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1502)
at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1458)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at 
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:124)
at 
org.apache.tomcat.util.net.NioBlockingSelector.write(NioBlockingSelector.java:101)
at 
org.apache.tomcat.util.net.NioSelectorPool.write(NioSelectorPool.java:172)
at 
org.apache.coyote.http11.InternalNioOutputBuffer.writeToSocket(InternalNioOutputBuffer.java:139)
at 
org.apache.coyote.http11.InternalNioOutputBuffer.addToBB(InternalNioOutputBuffer.java:197)
at 
org.apache.coyote.http11.InternalNioOutputBuffer.access$000(InternalNioOutputBuffer.java:41)
at 
org.apache.coyote.http11.InternalNioOutputBuffer$SocketOutputBuffer.doWrite(InternalNioOutputBuffer.java:320)
at 
org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:118)
at 
org.apache.coyote.http11.AbstractOutputBuffer.doWrite(AbstractOutputBuffer.java:256)
at org.apache.coyote.Response.doWrite(Response.java:501)
at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:388)
... 31 more

Please guide me on what I should be looking at.



RE: How to synchronize the imports (DIH) delta imports

2017-06-21 Thread Srinivas Kashyap
Thanks Mikhail,

Can you please explain? How can it be done in SolrJ?

Thanks and Regards,
Srinivas Kashyap

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: 21 June 2017 11:57 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: How to synchronize the imports (DIH) delta imports

Hello, Srinivas.

You can literally poll import status.
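
A minimal SolrJ sketch of that polling: ask the /dataimport handler for its status and only fire the next import when it reports idle. The URL and core name are placeholders, and on SolrJ 5.x you would construct HttpSolrClient from the URL string instead of the Builder:

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class DihStatusPoller {
    public static boolean isIdle(HttpSolrClient client, String core) throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("command", "status");
        GenericSolrRequest req =
            new GenericSolrRequest(SolrRequest.METHOD.GET, "/dataimport", params);
        NamedList<Object> rsp = client.request(req, core);
        // The handler reports "idle" or "busy" in the top-level "status" field.
        return "idle".equals(rsp.get("status"));
    }

    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr").build();
        try {
            if (isIdle(client, "collection1")) {
                // safe to send the next adhoc/delta import here
            }
        } finally {
            client.close();
        }
    }
}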

On Wed, Jun 21, 2017 at 7:41 AM, Srinivas Kashyap <srini...@bamboorose.com>
wrote:

> Hello,
>
> We have our architecture of index server, where delta-imports run
> periodically based on modify_ts of the records.
>
> We have another adhoc import handler on each core where import is
> called based on the key of solr core. This adhoc import is also called
> periodically.
>
> We have scenario where multiple records are picked up for the adhoc
> import and the index server starts indexing them sequentially. At the
> subsequent time, if another adhoc import command is called, the
> records are not being
> indexed(skipped) as the solr core is busy re-indexing the earlier records.
>
> Is there a way we can poll the import status of index server in SolrJ,
> so that we can refrain sending another adhoc import command while the
> index is still runnning?
>
> Thanks and Regards,
> Srinivas Kashyap
> Senior Software Engineer
> "GURUDAS HERITAGE"
> 100 Feet Ring Road,
> Kadirenahalli,
> Banashankari 2nd Stage,
> Bangalore-560070
> P:  973-986-6105
> Bamboo Rose
> The only B2B marketplace powered by proven trade engines.
> www.BambooRose.com<http://www.bamboorose.com/>
>
> Make Retail. Fun. Connected. Easier. Smarter. Together. Better.
>
> 
>
>



--
Sincerely yours
Mikhail Khludnev




How to synchronize the imports (DIH) delta imports

2017-06-21 Thread Srinivas Kashyap
Hello,

We have our architecture of index server, where delta-imports run periodically 
based on modify_ts of the records.

We have another adhoc import handler on each core where import is called based 
on the key of solr core. This adhoc import is also called periodically.

We have a scenario where multiple records are picked up for the adhoc import and 
the index server starts indexing them sequentially. If another adhoc import command 
is called in the meantime, the records are not indexed (they are skipped) because 
the Solr core is busy re-indexing the earlier records.

Is there a way we can poll the import status of the index server in SolrJ, so that 
we can refrain from sending another adhoc import command while the import is still 
running?

Thanks and Regards,
Srinivas Kashyap





RE: Performance warning: Overlapping onDeskSearchers=2 solr

2017-05-18 Thread Srinivas Kashyap
Hi,

We have not set autoSoftCommit in solrconfig.xml. The only commit we are 
doing is through DIH (assuming it commits after the import).

Also, we have written periodic schedulers to check whether any records/documents 
are updated in the database and to trigger re-indexing of those updated 
documents in Solr.

Below are some more config details in solrconfig.xml





20
200
false
2

Thanks and Regards,
Srinivas Kashyap

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 17 May 2017 08:51 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Performance warning: Overlapping onDeskSearchers=2 solr

Also, what is your autoSoftCommit setting? That also opens up a new searcher.

On Wed, May 17, 2017 at 8:15 AM, Jason Gerlowski 
<gerlowsk...@gmail.com<mailto:gerlowsk...@gmail.com>> wrote:
> Hey Shawn, others.
>
> This is a pitfall that Solr users seem to run into with some
> frequency.  (Anecdotally, I've bookmarked the Lucidworks article you
> referenced because I end up referring people to it often enough.)
>
> The immediate first advice when someone encounters these
> onDeckSearcher error messages is to examine their commit settings.  Is
> there any other possible cause for those messages?  If not, can we
> consider changing the log/exception error message to be more explicit
> about the cause?
>
> A strawman new message could be: "Performance warning: Overlapping
> onDeskSearchers=2; consider reducing commit frequency if performance
> problems encountered"
>
> Happy to create a JIRA/patch for this; just wanted to get some
> feedback first in case there's an obvious reason the messages don't
> get explicit about the cause.
>
> Jason
>
> On Wed, May 17, 2017 at 8:49 AM, Shawn Heisey 
> <apa...@elyograg.org<mailto:apa...@elyograg.org>> wrote:
>> On 5/17/2017 5:57 AM, Srinivas Kashyap wrote:
>>> We are using Solr 5.2.1 version and are currently experiencing below 
>>> Warning in Solr Logging Console:
>>>
>>> Performance warning: Overlapping onDeskSearchers=2
>>>
>>> Also we encounter,
>>>
>>> org.apache.solr.common.SolrException: Error opening new searcher. exceeded 
>>> limit of maxWarmingSearchers=2, try again later.
>>>
>>>
>>> The reason being, we are doing mass update on our application and solr 
>>> experiencing the higher loads at times. Data is being indexed using DIH(sql 
>>> queries).
>>>
>>> In solrconfig.xml below is the code.
>>>
>>> 
>>>
>>> Should we be uncommenting the above lines and try to avoid this error? 
>>> Please help me.
>>
>> This warning means that you are committing so frequently that there
>> are already two searchers warming when you start another commit.
>>
>> DIH does a commit exactly once -- at the end of the import.  One import will 
>> not cause the warning message you're seeing, so if there is one import 
>> happening at a time, either you are sending explicit commit requests during 
>> the import, or you have autoSoftCommit enabled with values that are far too 
>> small.
>>
>> You should definitely have autoCommit configured, but I would remove
>> maxDocs and set maxTime to something like 60000 -- one minute.  The
>> autoCommit should also set openSearcher to false.  This kind of
>> commit will not make new changes visible, but it will start a new
>> transaction log frequently.
>>
>> <autoCommit>
>>   <maxTime>60000</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> An automatic commit (soft or hard) with a one second interval is going to 
>> cause that warning you're seeing.
>>
>> https://lucidworks.com/understanding-transaction-logs-softcommit-and-
>> commit-in-sorlcloud/
>>
>> Thanks,
>> Shawn
>>




Performance warning: Overlapping onDeskSearchers=2 solr

2017-05-17 Thread Srinivas Kashyap
Hi All,

We are using Solr 5.2.1 version and are currently experiencing below Warning in 
Solr Logging Console:

Performance warning: Overlapping onDeskSearchers=2

Also we encounter,

org.apache.solr.common.SolrException: Error opening new searcher. exceeded 
limit of maxWarmingSearchers=2, try again later.


The reason is that we are doing mass updates in our application and Solr 
experiences higher load at times. Data is being indexed using DIH (SQL queries).

In solrconfig.xml below is the code.



Should we uncomment the above lines to try to avoid this error? Please 
help me.

Thanks and Regards,
Srinivas Kashyap





Broken pipe error

2016-11-29 Thread Srinivas Kashyap
Hello,

After starting the Solr application and running full imports, we run into the 
below error after a while:

null:org.apache.catalina.connector.ClientAbortException: java.io.IOException: 
Broken pipe
at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:393)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:426)
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:342)
at org.apache.catalina.connector.OutputBuffer.flush(OutputBuffer.java:317)
at 
org.apache.catalina.connector.CoyoteOutputStream.flush(CoyoteOutputStream.java:110)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:297)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:137)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:710)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:430)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
at org.apache.catalina


Can somebody guide me on how to resolve this issue?

Some of the parameters for Tomcat set are :

maxWait="15000" maxActive="1000" maxIdle="50".

Thanks and Regards,
Srinivas


Trim trailing whitespaces

2016-04-12 Thread Srinivas Kashyap
Hi,

When I index the data, it comes in with trailing whitespace.

How should I remove it? In schema.xml, the field type for the fields below is 
"string". Please suggest.

"response": {
"numFound": 40327,
"start": 0,
"docs": [
  {
"TECHSPEC.REQUEST_NO": "HQ22   ",
"TECH_SPEC_ID": "HQ22   ",
"DOCUMENT_TYPE": "TECHSPEC",
"TECHSPEC.OWNER": "SHOP ",
"timestamp": "2016-04-13T05:01:58.408Z"
  },
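
Since the field type is "string" (no analysis chain), the trailing spaces have to be stripped before the values reach the index, either in the SQL itself (RTRIM) or in DIH. A sketch of a custom DIH Transformer that trims every String value; the class name is hypothetical, and it would be attached with transformer="TrimTransformer" on the entity:

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class TrimTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // Trim every String column before DIH hands the row to Solr.
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getValue() instanceof String) {
                e.setValue(((String) e.getValue()).trim());
            }
        }
        return row;
    }
}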


Thanks and Regards,
Srinivas Kashyap



Curious case of DataSource.getConnection()

2016-04-12 Thread Srinivas Kashyap
Hi,

In a Solr scheduler class which runs every 'n' seconds, I'm polling 
a database table to do a custom job.

I'm getting the connection to database, through context file as below:

try {
 Context initContext = new InitialContext();
 DataSource ds = null;
 if ("tomcat".equals(p.getProperty("server.type")))
 {
   Context webContext = 
(Context)initContext.lookup("java:/comp/env");
   ds = (DataSource) 
webContext.lookup("");
 }
 else if ("ws".equals(p.getProperty("server.type"))) 
//websphere
 {
   ds = (DataSource) 
initContext.lookup("");
 }
}

 ds.getConnection();


But the connection is not being established. No exception/error is being 
thrown in the console.

The context xml has been double-checked to make sure all the datasource properties 
and attributes are set properly.

Any reason I'm not able to establish a database connection?

P.S.: The normal import process runs unaffected, i.e. data is being indexed into 
Solr with the same datasource configuration in the context xml.
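
One thing worth ruling out first: if the lookup or getConnection() fails inside a broad try block whose catch is empty, the failure is invisible; and if the scheduler thread runs outside the webapp's context, java:/comp/env may simply not be bound for it. A sketch of the same lookup with the errors made visible (the JNDI name here is a placeholder):

import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class SchedulerDbAccess {
    public void poll() {
        try {
            Context initContext = new InitialContext();
            Context webContext = (Context) initContext.lookup("java:/comp/env");
            DataSource ds = (DataSource) webContext.lookup("jdbc/tssindex"); // placeholder name
            try (Connection con = ds.getConnection()) {
                // ... run the polling query here ...
            }
        } catch (NamingException | SQLException e) {
            // Log instead of swallowing -- an empty catch makes this failure silent.
            e.printStackTrace();
        }
    }
}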


Thanks and Regards,
Srinivas Kashyap




Out of memory error during full import

2016-02-03 Thread Srinivas Kashyap
Hello,

I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the 
child entities in data-config.xml. When I try to do a full import, I'm getting an 
OutOfMemory error (Java heap space). I increased the heap allocation to the 
maximum extent possible. Is there a workaround to do the initial data load without 
running into this error?

I found that the 'batchSize=-1' parameter needs to be specified in the datasource 
for MySQL; is there a way to specify it for other databases as well?

Thanks and Regards,
Srinivas Kashyap
