Re: SOLR EofException help

2019-06-13 Thread ennio
Thanks for the information. I will check my server timeout to see what is
happening. That was very helpful.

Also, thanks for pointing out the swap space memory allocation; I will
double-check that as well.

 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Increased disk space usage 8.1.1 vs 7.7.1

2019-06-13 Thread Markus Jelsma
Hello,

It has something to do with the skewed facet counts seen in another thread. To 
make a full comparison, I indexed the same set to a fresh 7.7 build. Without my 
DocValues error, there is still a reasonable difference:

7.7 shard 1: 7.8 GB
7.7 shard 2: 7.3 GB

8.1 shard 1: 8.3 GB
8.1 shard 2: 5.9 GB

Strangely enough, one shard is larger and the other a lot smaller, and overall 
8.1 takes about 1 GB less.

So it was my DocValues error that caused 8.1 to be larger locally than the old 
7.7 production index.

My bad, again!

Many thanks,
Markus 
 
-Original message-
> From:Shawn Heisey 
> Sent: Thursday 13th June 2019 13:42
> To: solr-user@lucene.apache.org
> Subject: Re: Increased disk space usage 8.1.1 vs 7.7.1
> 
> On 6/13/2019 4:19 AM, Markus Jelsma wrote:
> > We are upgrading to Solr 8. One of our reindexed collections takes a GB 
> > more disk space than the production collection, which is on 7.7.1. 
> > Production also has deleted documents, so Solr 8 somehow uses more disk 
> > space. I have checked both Solr's and Lucene's CHANGES, but no ticket was 
> > immediately obvious.
> 
> Did you index to a core with nothing in it, or reindex on an existing 
> index without deleting everything first and letting Lucene erase all the 
> segments?
> 
> If you reindexed into an existing index, you could simply have deleted 
> documents taking up the extra space.  Full comparison would need to be 
> done after optimizing both indexes to clear out deleted documents.
> 
> You're probably already aware that optimizing in production is 
> discouraged, unless you're willing to do it frequently ... which gets 
> expensive with large indexes.
> 
> If the size is 1GB larger after both indexes are optimized to clear 
> deleted documents, then the other replies you've gotten will be important.
> 
> Thanks,
> Shawn
> 


RE: Different facet count between 7.7.1 and 8.1.1

2019-06-13 Thread Markus Jelsma
Hello Jan,

We traced it back to not reindexing 'everything' when we enabled docValues for 
the field I faceted on. Most records from before the change do not show up when 
I query old data; the set was only partially reindexed.

My bad!

Thanks,
Markus
 
-Original message-
> From:Jan Høydahl 
> Sent: Thursday 13th June 2019 0:17
> To: solr-user 
> Subject: Re: Different facet count between 7.7.1 and 8.1.1
> 
> Can you reproduce it from a clean 7.7.1 install? I mean, index N docs and 
> then run the facet query? Is it a distributed query or a single shard? Does 
> an "optimize" change anything? Is this DocValues strings?
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> > 12. jun. 2019 kl. 23:49 skrev Markus Jelsma :
> > 
> > Hello again,
> > 
> > We found another oddity when upgrading to Solr 8. For a *:* query, the 
> > facet counts for a simple string field do not match at all between these 
> > versions. Solr 7.7.1 gives lower or zero counts, whereas on 8 we see the 
> > correct counts. So a bug I was not aware of seems to have been fixed, 
> > although our unit tests rely heavily on correct facet counts.
> > 
> > When I do a field query : the numFound matches the 
> > correct facet counts I see on Solr 8.
> > 
> > I checked CHANGES.txt for anything on this subject, but the issues do not 
> > seem to match this description. Does anyone have an idea what difference in 
> > behaviour I am seeing, and which ticket dealt with this subject?
> > 
> > We do not use JSON-facets here.
> > 
> > Many thanks,
> > Markus
> 
> 


Re: SOLR EofException help

2019-06-13 Thread Shawn Heisey

On 6/13/2019 7:30 AM, ennio wrote:

The server for the most part runs fine, but when I look at the logs I see the
following error from time to time.

org.eclipse.jetty.io.EofException: Closed


Jetty's EofException is nearly always caused by a specific event:

The client talking to Solr closed the TCP/HTTP connection before Solr 
was done processing the request.  When Solr finally finished the request 
and tried to respond, Jetty found that it could not send the response, 
because the TCP connection was gone.


You'll need to adjust the timeouts on your client software so that it 
allows Solr more time to respond and doesn't close the connection too 
quickly.
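The failure mode is easy to reproduce with plain sockets; below is a minimal
sketch (no Solr or Jetty APIs involved, and the timings are arbitrary). The
client's read timeout fires while the "server" is still busy, so the client
gives up and closes the connection, exactly the sequence that later makes the
server's write fail:

```python
# Demo: a client read timeout shorter than the server's processing time.
import socket
import threading
import time

srv = socket.socket()
srv.bind(("127.0.0.1", 0))      # pick any free port
srv.listen(1)
port = srv.getsockname()[1]

def slow_handler():
    conn, _ = srv.accept()
    time.sleep(2.0)             # stand-in for a long-running Solr request
    conn.close()

threading.Thread(target=slow_handler, daemon=True).start()

client = socket.create_connection(("127.0.0.1", port), timeout=0.2)
timed_out = False
try:
    client.recv(1024)           # waits for a response that won't arrive in time
except socket.timeout:
    timed_out = True            # the client hangs up; the server's later write
                                # would then fail -- Jetty's EofException
client.close()
print("client timed out:", timed_out)
```

The fix belongs on the client: raise its socket/read timeout so it outlasts
the slowest legitimate request (in SolrJ, for example, via the timeout
settings on the client builder) rather than letting it hang up early.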


Side note:  Java says your server is using 5GB of swap.  If that's an 
accurate value, it's usually an indication that the software on the 
system is allocating a lot more memory than the server has.  It also 
says that the machine is only using 3GB out of the 8GB available, so the 
over-allocation must be non-persistent... and is probably 
periodic/scheduled.


With an index as small as you have, 2GB of heap is probably more than 
you need.  You could likely reduce that to 1GB, maybe even less. 
Knowing for sure will require experimentation.
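Since this server is on Windows, the heap experiment would be a change along
these lines in bin\solr.in.cmd (the variable name is from the stock 7.x
include file; 1 GB is just a starting point to test):

```bat
REM solr.in.cmd: cap the JVM heap at 1 GB for the experiment
set SOLR_JAVA_MEM=-Xms1g -Xmx1g
```

Restart Solr after changing it and watch heap usage in the admin UI before
settling on a value.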


Thanks,
Shawn


SOLR EofException help

2019-06-13 Thread ennio
I have SOLR 7.7.1 running on a Windows Server 2016 with 8GB and 2 Cores
(Virtual). The machine is dedicated to the SOLR server, so no other process
is running on it.

My collection is small: only 110.86 MB and 15,500 documents. 

 

The server for the most part runs fine, but when I look at the logs I see the
following error from time to time. 



org.eclipse.jetty.io.EofException: Closed
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:665)
~[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:126)
~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan
- 2019-02-23 02:39:07]
at
org.apache.solr.response.QueryResponseWriterUtil$1.write(QueryResponseWriterUtil.java:54)
~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan
- 2019-02-23 02:39:07]
at java.io.OutputStream.write(Unknown Source) ~[?:1.8.0_211]
at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source) ~[?:1.8.0_211]
at sun.nio.cs.StreamEncoder.implWrite(Unknown Source) ~[?:1.8.0_211]
at sun.nio.cs.StreamEncoder.write(Unknown Source) ~[?:1.8.0_211]
at java.io.OutputStreamWriter.write(Unknown Source) ~[?:1.8.0_211]
at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:140)
~[solr-solrj-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 -
ishan - 2019-02-23 02:39:09]
at 
org.apache.solr.common.util.FastWriter.flushBuffer(FastWriter.java:154)
~[solr-solrj-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 -
ishan - 2019-02-23 02:39:09]
at
org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:82)
~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan
- 2019-02-23 02:39:07]
at
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:68)
~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan
- 2019-02-23 02:39:07]
at
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan
- 2019-02-23 02:39:07]
at
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:788)
[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan
- 2019-02-23 02:39:07]
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:525)
[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan
- 2019-02-23 02:39:07]
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan
- 2019-02-23 02:39:07]
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan
- 2019-02-23 02:39:07]
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
[jetty-servlet-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
[jetty-servlet-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
[jetty-security-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
[jetty-servlet-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]

Re: SOLR JOIN

2019-06-13 Thread Paresh
Hi Erick,

I am able to query collection3 with an INNER JOIN between two document
types and a JOIN across collection1 using the mechanism below. I am also
getting faceting information from collection3 along with the data.

http://localhost:8983/solr/collection3/tcfts?wt=json=on=0=50=AND=OC2:(9350)
AND _query_:"{!join from=col3_Oc1 to=Oc1}ID1:xtWNf_fTAaLUgD" AND
_query_:{!join to=col3_Field1 from=Field1
fromIndex=collection1}Field2:12010340

Now I want to collect the faceting information for collection1 with the
same type of query, but without the data.
So what is needed is: query collection1, match a few columns with some
expression, and then JOIN across collection3 with an INNER JOIN as above.

Could you help on this?
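One hedged sketch (untested; the field names are the same placeholders as
above, and cross-collection {!join fromIndex=...} requires the "from"
collection to be reachable as a single shard on the node serving the query):
query collection1 directly, keep rows=0 so no documents are returned, and
enable faceting:

```
http://localhost:8983/solr/collection1/select?q=Field2:12010340
  AND _query_:"{!join to=Field1 from=col3_Field1 fromIndex=collection3}OC2:(9350)"
  &rows=0&facet=true&facet.field=Field2&wt=json
```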






Re: Increased disk space usage 8.1.1 vs 7.7.1

2019-06-13 Thread Shawn Heisey

On 6/13/2019 4:19 AM, Markus Jelsma wrote:

We are upgrading to Solr 8. One of our reindexed collections takes a GB more 
disk space than the production collection, which is on 7.7.1. Production also 
has deleted documents, so Solr 8 somehow uses more disk space. I have checked 
both Solr's and Lucene's CHANGES, but no ticket was immediately obvious.


Did you index to a core with nothing in it, or reindex on an existing 
index without deleting everything first and letting Lucene erase all the 
segments?


If you reindexed into an existing index, you could simply have deleted 
documents taking up the extra space.  Full comparison would need to be 
done after optimizing both indexes to clear out deleted documents.


You're probably already aware that optimizing in production is 
discouraged, unless you're willing to do it frequently ... which gets 
expensive with large indexes.


If the size is 1GB larger after both indexes are optimized to clear 
deleted documents, then the other replies you've gotten will be important.


Thanks,
Shawn


Re: Increased disk space usage 8.1.1 vs 7.7.1

2019-06-13 Thread Colvin Cowie
Hello,

For context, it would probably be helpful to know some more about the
collection. It's 1 GB bigger, but what percentage increase does that
represent? Is it 0.5% or 50%?

On Thu, 13 Jun 2019 at 11:19, Markus Jelsma 
wrote:

> Hello,
>
> We are upgrading to Solr 8. One of our reindexed collections takes a GB
> more disk space than the production collection, which is on 7.7.1.
> Production also has deleted documents, so Solr 8 somehow uses more disk
> space. I have checked both Solr's and Lucene's CHANGES, but no ticket was
> immediately obvious.
>
> Does anyone know what is going on?
>
> Many thanks,
> Markus
>


Re: SolrCloud indexing triggers merges and timeouts

2019-06-13 Thread Shawn Heisey

On 6/6/2019 9:00 AM, Rahul Goswami wrote:

*OP Reply* : Total 48 GB per node... I couldn't see any other software using
a lot of memory.
I am honestly not sure about the reason for the change of directory factory
to SimpleFSDirectoryFactory. But I was told that with mmap, at one point we
started to see shared memory usage on Windows go up significantly,
intermittently freezing the system.
Could the choice of DirectoryFactory here be a factor in the long
updates/frequent merges?


With about 24GB of RAM to cache 1.4TB of index data, you're never going 
to have good performance.  Any query you do is probably going to read 
more than 24GB of data from the index, which means that it cannot come 
from memory, some of it must come from disk, which is incredibly slow 
compared to memory.


MMap is more efficient than "simple" filesystem access.  I do not know 
if you would see markedly better performance, but getting rid of the 
DirectoryFactory config and letting Solr choose its default might help.
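Concretely, that would mean restoring the stock directoryFactory line in
solrconfig.xml so Solr falls back to its default; a sketch (the
property-based form below is what ships with Solr):

```xml
<!-- solrconfig.xml: let Solr choose (defaults to NRTCachingDirectoryFactory,
     which wraps MMapDirectory on 64-bit JVMs) -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
```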



How many total documents (maxDoc, not numDoc) are in that 1.4 TB of
space?
*OP Reply:* Also, there are nearly 12.8 million total docs (maxDoc, NOT
numDoc) in that 1.4 TB space


Unless you're doing faceting or grouping on fields with extremely high 
cardinality, which I find to be rarely useful except for data mining, 
24GB of heap for 12.8 million docs seems very excessive.  I was 
expecting this number to be something like 500 million or more ... that 
small document count must mean each document is HUGE.  Can you take 
steps to reduce the index size, perhaps by setting stored, indexed, 
and/or docValues to "false" on some of your fields, and having your 
application go to the system of record for full details on each 
document?  You will have to reindex after making changes like that.



Can you share the GC log that Solr writes?

*OP Reply:*  Please find the GC logs and thread dumps at this location
https://drive.google.com/open?id=1slsYkAcsH7OH-7Pma91k6t5T72-tIPlw


The larger GC log was unrecognized by both GCViewer and gceasy.io ... the 
smaller log shows heap usage about 10GB, but it only covers 10 minutes, 
so it's not really conclusive for diagnosis.  The first thing I can 
suggest to try is to reduce the heap size to 12GB ... but I do not know 
if that's actually going to work.  Indexing might require more memory. 
The idea here is to make more memory available to the OS disk cache ... 
with your index size, you're probably going to need to add memory to the 
system (not the heap).



Another observation is that the CPU usage reaches around 70% (through
manual monitoring) when the indexing starts and the merges are observed. It
is well below 50% otherwise.


Indexing will increase load, and that increase is often very 
significant.  Adding memory to the system is your best bet for better 
performance.  I'd want 1TB of memory for a 1.4TB index ... but I know 
that memory sizes that high are extremely expensive, and for most 
servers, not even possible.  512GB or 256GB is more attainable, and 
would have better performance than 48GB.



Also, should something be altered with the mergeScheduler setting ?
"mergeScheduler":{
 "class":"org.apache.lucene.index.ConcurrentMergeScheduler",
 "maxMergeCount":2,
 "maxThreadCount":2},


Do not configure maxThreadCount beyond 1 unless your data is on SSD.  It 
will slow things down a lot due to the fact that standard disks must 
move the disk head to read/write from different locations, and head 
moves take time.  SSD can do I/O from any location without pauses, so 
more threads would probably help performance rather than hurt it.


Increase maxMergeCount to 6 -- at 2, large merges will probably stop 
indexing entirely.  With a larger number, Solr can keep indexing even 
when there's a huge segment merge happening.
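In solrconfig.xml terms, the two suggestions above would look roughly like
this (placement under the indexConfig element, per the ref guide):

```xml
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- let indexing continue while large merges run -->
    <int name="maxMergeCount">6</int>
    <!-- keep at 1 on spinning disks; raise only on SSD -->
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>
```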


Thanks,
Shawn


Re: Increased disk space usage 8.1.1 vs 7.7.1

2019-06-13 Thread Alexandre Rafalovitch
If you look at the data files, is any extension suddenly taking way more
space? That may give a clue.
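A quick way to compare, as a sketch (POSIX shell; the index path in the
example is a placeholder for the core's data/index directory):

```shell
# Sum Lucene index file sizes per extension, largest first.
index_sizes_by_ext() {
  ls -l "$1" | awk 'NF >= 9 { n = split($NF, a, "."); sz[a[n]] += $5 }
                    END { for (e in sz) printf "%d .%s\n", sz[e], e }' | sort -rn
}

# Example (placeholder path); run on both 7.7 and 8.1 and diff the output:
# index_sizes_by_ext /var/solr/data/mycollection_shard1_replica_n1/data/index
```

A jump in one extension points at the feature responsible: .fdt is stored
fields, .dvd/.dvm are docValues, and .tim/.tip are the terms dictionary and
index.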

Also, is the schema the same? For example, you did not enable docValues on
strings by default, or similar.

Regards,
Alex

On Thu, Jun 13, 2019, 6:19 AM Markus Jelsma, 
wrote:

> Hello,
>
> We are upgrading to Solr 8. One of our reindexed collections takes a GB
> more disk space than the production collection, which is on 7.7.1.
> Production also has deleted documents, so Solr 8 somehow uses more disk
> space. I have checked both Solr's and Lucene's CHANGES, but no ticket was
> immediately obvious.
>
> Does anyone know what is going on?
>
> Many thanks,
> Markus
>


Re: Is it possible configure a single data-config.xml file for all the environments?

2019-06-13 Thread Shawn Heisey

On 6/12/2019 7:46 PM, Hugo Angel Rodriguez wrote:

Thanks Shawn for your answers.

Regarding your question: "Are these environments on separate Solr instances, 
separate servers, or are they on the same Solr instance?"
My answer is: these environments are on separate Solr instances, on separate 
servers.

Are we dealing with SolrCloud (which is Solr + ZooKeeper), or standalone Solr 
instances?
We are dealing with standalone Solr instances.


I think JNDI is probably your best bet to have the same DIH config 
everywhere.  The only documentation I can find about the JNDI method is 
on the Solr wiki.  If it's still valid, we should get it in the ref guide.


https://wiki.apache.org/solr/DataImportHandlerFaq#How_do_I_use_a_JNDI_DataSource.3F

You define the JNDI datasource in Jetty, probably in jetty.xml, 
according to Jetty documentation ... and then reference the datasource 
name in the DIH config.


Here's an eclipse wiki page about setting up a JNDI datasource.  I hope 
it's enough -- Jetty is not something I'm super familiar with:


https://wiki.eclipse.org/Jetty/Howto/Configure_JNDI_Datasource

I checked Solr's master codebase, and jndiName is still in the source 
code, so I think it should still work.
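Per the two wiki pages above, the two halves might look roughly like this.
Treat it as an unverified sketch: the resource name, driver class, URL, and
credentials are all placeholders, and the exact Jetty syntax should be checked
against the Jetty docs for your version:

```xml
<!-- jetty.xml: register the datasource under JNDI (driver chosen arbitrarily) -->
<New id="solrdb" class="org.eclipse.jetty.plus.jndi.Resource">
  <Arg></Arg>                      <!-- scope: the whole server -->
  <Arg>jdbc/solrdb</Arg>           <!-- JNDI name -->
  <Arg>
    <New class="com.mysql.cj.jdbc.MysqlDataSource">
      <Set name="Url">jdbc:mysql://dbhost:3306/mydb</Set>
      <Set name="User">solr</Set>
      <Set name="Password">secret</Set>
    </New>
  </Arg>
</New>

<!-- data-config.xml: reference it by name; per-environment credentials
     now live in Jetty, not in the DIH config -->
<dataSource type="JdbcDataSource" jndiName="java:comp/env/jdbc/solrdb"/>
```

That keeps data-config.xml identical across environments; only each server's
jetty.xml differs.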


Thanks,
Shawn


Increased disk space usage 8.1.1 vs 7.7.1

2019-06-13 Thread Markus Jelsma
Hello,

We are upgrading to Solr 8. One of our reindexed collections takes a GB more 
disk space than the production collection, which is on 7.7.1. Production also 
has deleted documents, so Solr 8 somehow uses more disk space. I have checked 
both Solr's and Lucene's CHANGES, but no ticket was immediately obvious.

Does anyone know what is going on?

Many thanks,
Markus