Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

2018-10-03 Thread Walter Underwood
We run a big cluster with 8 GB heap on the JVMs. When we used CMS, I gave 2 GB to
the new generation. Solr queries make a ton of short-lived allocations. You want
all of that to come from the new gen. I don’t fool around with ratios. I just set
the numbers.

We used these:

-d64
-server
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+ExplicitGCInvokesConcurrent
-Xms8g
-Xmx8g
-XX:NewSize=2g
-XX:MaxPermSize=256m

Now we run G1.
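
The G1 settings aren't listed here, but a reasonable starting point for an 8 GB heap
looks something like the following (the pause target is an assumption to be tuned
against your own GC logs):

-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:MaxGCPauseMillis=250
-Xms8g
-Xmx8g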

This is a cluster with 25 million documents, 8 shards, 48 nodes, each node has 
36 CPUs.
Queries average 25 terms, which uses a lot of CPU.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 3, 2018, at 6:56 PM, Jeff Courtade  wrote:
> 
> We use 4.3.0. I found that we went into GC hell as you describe with a small
> newgen. We use CMS GC as well.
> 
> Using NewRatio=2 got us out of that; 3 wasn't enough. Heap of 32 gig only.
> I have not gone over 32 gig, as testing showed diminishing returns over 32
> gig. I was only brave enough to go to 40 though.
> 
> On Wed, Oct 3, 2018, 5:34 PM Shawn Heisey  wrote:
> 
>> On 10/3/2018 8:01 AM, yasoobhaider wrote:
>>> Master and slave config:
>>> ram: 120GB
>>> cores: 16
>>> 
>>> At any point there are between 10-20 slaves in the cluster, each serving
>> ~2k
>>> requests per minute. Each slave houses two collections of approx 10G
>>> (~2.5mil docs) and 2G(10mil docs) when optimized.
>>> 
>>> I am working with Solr 6.2.1
>>> 
>>> Solr configuration:
>> 
>>> -Xmn10G
>>> -Xms80G
>>> -Xmx80G
>> 
>> I cannot imagine that an 80GB heap is needed when there are only 12.5
>> million documents and 12GB of index data.  I've handled MUCH larger
>> indexes with only 8GB of heap.  Even with your very high query rate, if
>> you really do need 80GB of heap, there's something unusual going on.
>> 
>>> I would really be grateful for any advice on the following:
>>> 
>>> 1. What could be the reason behind CMS not being able to free up the
>> memory?
>>> What are some experiments I can run to solve this problem?
>> 
>> Maybe there's no garbage in the heap to free up?  If the GC never
>> finishes, that sounds like a possible problem with either Java or the
>> operating system, maybe even some kind of hardware issue.
>> 
>>> 2. Can stopping/starting indexing be a reason for such drastic changes
>> to GC
>>> pattern?
>> 
>> Indexing generally requires more heap than just handling queries.
>> 
>>> 3. I have read at multiple places on this mailing list that the heap size
>>> should be much lower (2x-3x the size of collection), but the last time I
>>> tried CMS was not able to run smoothly and GC STW would occur which was
>> only
>>> solved by a restart. My reasoning for this is that the type of queries
>> and
>>> the throughput are also a factor in deciding the heap size, so it may be
>>> that our queries are creating too many objects maybe. Is my reasoning
>>> correct or should I try with a lower heap size (if it helps achieve a
>> stable
>>> gc pattern)?
>> 
>> Do you have a GC log covering a good long runtime, where the problems
>> happened during the time the log covers?  Can you share it?  Attachments
>> rarely make it to the list, you'll need to find a file sharing site.
>> The small excerpt from the GC log that you included in your message
>> isn't enough to make any kind of determination.  Full disclosure:  I'm
>> going to send your log to http://gceasy.io for analysis.  You can do
>> this yourself, their analysis is really good.
>> 
>> There is no generic advice possible regarding how large a heap you
>> need.  It will depend on many factors.
>> 
>>> (4. Silly question, but what is the right way to ask question on the
>> mailing
>>> list? via mail or via the nabble website? I sent this question earlier
>> as a
>>> mail, but it was not showing up on the nabble website so I am posting it
>>> from the website now)
>> 
>> Nabble mirrors the mailing list in forum format.  It's generally better
>> to use the mailing list directly.  The project has absolutely no
>> influence over the Nabble website, and things do not always work
>> correctly when Nabble is involved.  The IRC channel is another good way
>> to get support.  If there is somebody paying attention when you ask your
>> question, a far more interactive chat can be obtained.
>> 
>> Thanks,
>> Shawn
>> 
>> 



Re: Metrics API via Solrj

2018-10-03 Thread deniz
Thanks a lot Jason and Shawn, it is quite smooth, although there is no built-in
stuff like collection or schema request objects for metrics :)



-
Zeki ama calismiyor... Calissa yapar...
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

2018-10-03 Thread Jeff Courtade
We use 4.3.0. I found that we went into GC hell as you describe with a small
newgen. We use CMS GC as well.

Using NewRatio=2 got us out of that; 3 wasn't enough. Heap of 32 gig only.
I have not gone over 32 gig, as testing showed diminishing returns over 32
gig. I was only brave enough to go to 40 though.

On Wed, Oct 3, 2018, 5:34 PM Shawn Heisey  wrote:

> On 10/3/2018 8:01 AM, yasoobhaider wrote:
> > Master and slave config:
> > ram: 120GB
> > cores: 16
> >
> > At any point there are between 10-20 slaves in the cluster, each serving
> ~2k
> > requests per minute. Each slave houses two collections of approx 10G
> > (~2.5mil docs) and 2G(10mil docs) when optimized.
> >
> > I am working with Solr 6.2.1
> >
> > Solr configuration:
> 
> > -Xmn10G
> > -Xms80G
> > -Xmx80G
>
> I cannot imagine that an 80GB heap is needed when there are only 12.5
> million documents and 12GB of index data.  I've handled MUCH larger
> indexes with only 8GB of heap.  Even with your very high query rate, if
> you really do need 80GB of heap, there's something unusual going on.
>
> > I would really be grateful for any advice on the following:
> >
> > 1. What could be the reason behind CMS not being able to free up the
> memory?
> > What are some experiments I can run to solve this problem?
>
> Maybe there's no garbage in the heap to free up?  If the GC never
> finishes, that sounds like a possible problem with either Java or the
> operating system, maybe even some kind of hardware issue.
>
> > 2. Can stopping/starting indexing be a reason for such drastic changes
> to GC
> > pattern?
>
> Indexing generally requires more heap than just handling queries.
>
> > 3. I have read at multiple places on this mailing list that the heap size
> > should be much lower (2x-3x the size of collection), but the last time I
> > tried CMS was not able to run smoothly and GC STW would occur which was
> only
> > solved by a restart. My reasoning for this is that the type of queries
> and
> > the throughput are also a factor in deciding the heap size, so it may be
> > that our queries are creating too many objects maybe. Is my reasoning
> > correct or should I try with a lower heap size (if it helps achieve a
> stable
> > gc pattern)?
>
> Do you have a GC log covering a good long runtime, where the problems
> happened during the time the log covers?  Can you share it?  Attachments
> rarely make it to the list, you'll need to find a file sharing site.
> The small excerpt from the GC log that you included in your message
> isn't enough to make any kind of determination.  Full disclosure:  I'm
> going to send your log to http://gceasy.io for analysis.  You can do
> this yourself, their analysis is really good.
>
> There is no generic advice possible regarding how large a heap you
> need.  It will depend on many factors.
>
> > (4. Silly question, but what is the right way to ask question on the
> mailing
> > list? via mail or via the nabble website? I sent this question earlier
> as a
> > mail, but it was not showing up on the nabble website so I am posting it
> > from the website now)
>
> Nabble mirrors the mailing list in forum format.  It's generally better
> to use the mailing list directly.  The project has absolutely no
> influence over the Nabble website, and things do not always work
> correctly when Nabble is involved.  The IRC channel is another good way
> to get support.  If there is somebody paying attention when you ask your
> question, a far more interactive chat can be obtained.
>
> Thanks,
> Shawn
>
>


Re: Migrate cores from 4.10.2 to 7.5.0

2018-10-03 Thread Jan Høydahl
See SOLR-12281 (https://issues.apache.org/jira/browse/SOLR-12281) for details 
about the new migration limitation that will be introduced in v8.0. You may 
be able to get yourself from 4.x to 7.x through clever hoops, but then if you 
want to upgrade to 8.x later, there is no other way than a full re-index. So 
better to plan for a full re-index from the beginning.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 4. okt. 2018 kl. 00:23 skrev Shawn Heisey :
> 
> On 10/3/2018 3:17 PM, Pure Host - Wolfgang Freudenberger wrote:
>> Is there any way to migrate cores from 4.10.2 to 7.5.0? I guess not, but 
>> perhaps someone has an idea. ^^
> 
> In a word, no.  A specific major version of Solr is only guaranteed to read 
> indexes built and managed *completely* by the previous major version or older 
> releases in the same major version.  But even if you have a supported upgrade 
> path, my advice is to always build indexes from scratch when upgrading.
> 
> It was recently brought to my attention that upgrades through two major 
> Lucene versions are NOT guaranteed to work, even if the index is upgraded 
> through all the major versions one by one with Lucene's IndexUpgrader utility 
> (or by running an optimize operation within Solr).  I was surprised by this, 
> but the person who said it is someone whose word I have no reason to doubt.
> 
> Thanks,
> Shawn
> 



Re: Migrate cores from 4.10.2 to 7.5.0

2018-10-03 Thread Shawn Heisey

On 10/3/2018 3:17 PM, Pure Host - Wolfgang Freudenberger wrote:
Is there any way to migrate cores from 4.10.2 to 7.5.0? I guess not, 
but perhaps someone has an idea. ^^


In a word, no.  A specific major version of Solr is only guaranteed to 
read indexes built and managed *completely* by the previous major 
version or older releases in the same major version.  But even if you 
have a supported upgrade path, my advice is to always build indexes from 
scratch when upgrading.


It was recently brought to my attention that upgrades through two major 
Lucene versions are NOT guaranteed to work, even if the index is 
upgraded through all the major versions one by one with Lucene's 
IndexUpgrader utility (or by running an optimize operation within 
Solr).  I was surprised by this, but the person who said it is someone 
whose word I have no reason to doubt.
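
For reference, the upgrade step people usually attempt looks roughly like this, run
once per major version with that version's jars (the paths below are placeholders):

java -cp lucene-core-7.5.0.jar:lucene-backward-codecs-7.5.0.jar \
  org.apache.lucene.index.IndexUpgrader -delete-prior-commits /path/to/core/data/index

Even when each of those runs succeeds, the caveat above still applies: an index that
was ever touched by a version two majors back may not open in the new release.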


Thanks,
Shawn



Re: Nutch+Solr

2018-10-03 Thread Terry Steichen
Bineesh,

I don't use Nutch, so don't know if this is relevant, but I've had
similar-sounding failures in doing and restoring backups.  The solution
for me was to deactivate authentication while the backup was being done,
and then activate it again afterwards.  Then everything was restored
correctly.  Otherwise, I got a whole bunch of errors (if I left
authentication active when doing the backup). 
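
If the failure really is authentication, it can also help to first confirm that the
credentials work against Solr on their own, outside of Nutch. A minimal SolrJ sketch
for that check (the collection name here is just a placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;

public class AuthCheck {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      // Attach the same credentials the basic auth plugin was configured with
      QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
      req.setBasicAuthCredentials("solr", "SolrRocks");
      // "mycollection" is a placeholder for whatever collection Nutch writes to
      long found = req.process(client, "mycollection").getResults().getNumFound();
      System.out.println("Documents visible with these credentials: " + found);
    }
  }
}

If that query succeeds but Nutch still fails, the problem is on the Nutch side rather
than with the credentials themselves.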

Terry


On 10/03/2018 10:21 AM, Bineesh wrote:
> Hello,
>
> We use Solr 7.3.1 and Nutch 1.15
>
> We've placed the authentication for our solr cloud setup using the basic
> auth plugin ( login details -> solr/SolrRocks)
>
> For Nutch to index data to Solr, the properties below were added to the
> nutch-site.xml file:
>
> <property>
>   <name>solr.auth</name>
>   <value>true</value>
>   <description>
>   Whether to enable HTTP basic authentication for communicating with Solr.
>   Use the solr.auth.username and solr.auth.password properties to configure
>   your credentials.
>   </description>
> </property>
>
> <property>
>   <name>solr.auth.username</name>
>   <value>solr</value>
>   <description>
>   Username
>   </description>
> </property>
>
> <property>
>   <name>solr.auth.password</name>
>   <value>SolrRocks</value>
>   <description>
>   Password
>   </description>
> </property>
>
> While Nutch indexes data to Solr, it's failing due to authentication. Am I
> doing something wrong? Please help.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



Re: Migrate cores from 4.10.2 to 7.5.0

2018-10-03 Thread Emir Arnautović
Hi Wolfgang,
Hi Wolfgang,
I would say that your safest bet is to start from the 7.5 schema, adjust it to 
suit your needs and reindex (better than trying to adjust your existing schema to 
7.5). If all your fields are stored in the current collection, you might be able to 
use DIH to reindex: http://www.od-bits.com/2018/07/reindexing-solr-core.html 


I’ve recently used this approach for 4.x to 6.x.
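
The DIH config for that approach is short; a minimal sketch (the source core URL,
query and rows value are just examples) looks like:

<dataConfig>
  <document>
    <entity name="reindex"
            processor="SolrEntityProcessor"
            url="http://oldhost:8983/solr/oldcore"
            query="*:*"
            rows="500"/>
  </document>
</dataConfig>

As noted above, this only works if every field you need is stored in the source core.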

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 3 Oct 2018, at 23:17, Pure Host - Wolfgang Freudenberger 
>  wrote:
> 
> Hi guys,
> 
> Is there any way to migrate cores from 4.10.2 to 7.5.0? I guess not, but 
> perhaps someone has an idea. ^^
> 
> -- 
> Mit freundlichem Gruß / kind regards
> 
> Wolfgang Freudenberger
> Pure Host IT-Services
> Münsterstr. 14
> 48341 Altenberge
> GERMANY
> Tel.: (+49) 25 71 - 99 20 170
> Fax: (+49) 25 71 - 99 20 171
> 
> Umsatzsteuer ID DE259181123
> 
> Informieren Sie sich über unser gesamtes Leistungsspektrum unter 
> www.pure-host.de
> Get our whole services at www.pure-host.de
> 
> 



Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

2018-10-03 Thread Shawn Heisey

On 10/3/2018 8:01 AM, yasoobhaider wrote:

Master and slave config:
ram: 120GB
cores: 16

At any point there are between 10-20 slaves in the cluster, each serving ~2k
requests per minute. Each slave houses two collections of approx 10G
(~2.5mil docs) and 2G(10mil docs) when optimized.

I am working with Solr 6.2.1

Solr configuration:



-Xmn10G
-Xms80G
-Xmx80G


I cannot imagine that an 80GB heap is needed when there are only 12.5 
million documents and 12GB of index data.  I've handled MUCH larger 
indexes with only 8GB of heap.  Even with your very high query rate, if 
you really do need 80GB of heap, there's something unusual going on.



I would really be grateful for any advice on the following:

1. What could be the reason behind CMS not being able to free up the memory?
What are some experiments I can run to solve this problem?


Maybe there's no garbage in the heap to free up?  If the GC never 
finishes, that sounds like a possible problem with either Java or the 
operating system, maybe even some kind of hardware issue.



2. Can stopping/starting indexing be a reason for such drastic changes to GC
pattern?


Indexing generally requires more heap than just handling queries.


3. I have read at multiple places on this mailing list that the heap size
should be much lower (2x-3x the size of collection), but the last time I
tried CMS was not able to run smoothly and GC STW would occur which was only
solved by a restart. My reasoning for this is that the type of queries and
the throughput are also a factor in deciding the heap size, so it may be
that our queries are creating too many objects maybe. Is my reasoning
correct or should I try with a lower heap size (if it helps achieve a stable
gc pattern)?


Do you have a GC log covering a good long runtime, where the problems 
happened during the time the log covers?  Can you share it?  Attachments 
rarely make it to the list, you'll need to find a file sharing site.  
The small excerpt from the GC log that you included in your message 
isn't enough to make any kind of determination.  Full disclosure:  I'm 
going to send your log to http://gceasy.io for analysis.  You can do 
this yourself, their analysis is really good.


There is no generic advice possible regarding how large a heap you 
need.  It will depend on many factors.



(4. Silly question, but what is the right way to ask question on the mailing
list? via mail or via the nabble website? I sent this question earlier as a
mail, but it was not showing up on the nabble website so I am posting it
from the website now)


Nabble mirrors the mailing list in forum format.  It's generally better 
to use the mailing list directly.  The project has absolutely no 
influence over the Nabble website, and things do not always work 
correctly when Nabble is involved.  The IRC channel is another good way 
to get support.  If there is somebody paying attention when you ask your 
question, a far more interactive chat can be obtained.


Thanks,
Shawn



Re: Solr Cloud in recovering state & down state for long

2018-10-03 Thread Ganesh Sethuraman
On Tue, Oct 2, 2018 at 11:46 PM Shawn Heisey  wrote:

> On 10/2/2018 8:55 PM, Ganesh Sethuraman wrote:
> > We are using 2 node SolrCloud 7.2.1 cluster with external 3 node ZK
> > ensemble in AWS. There are about 60 collections at any point in time. We
> > have per JVM max heap of 8GB.
>
> Let's focus for right now on a single Solr machine, rather than the
> whole cluster.  How many shard replicas (cores) are on one server?  How
> much disk space does all the index data take? How many documents
> (maxDoc, which includes deleted docs) are in all those cores?  What is
> the total amount of RAM on the server? Is there any other software
> besides Solr running on each server?
>
We have 471 replicas available on each server. We have about 60 collections,
each with 8 shards and 2 replicas; a couple of them have just 2 shards and are
small. Note that only about 30 of them are actively used. Old collections are
periodically deleted.
470 GB of index data per node.
Max docs per collection is about 300M, but the average per collection is about
50M docs.
256 GB RAM (24 vCPUs) on each of the two AWS instances.
No other software running on the box.

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

>
> > But as stated above problem, we will have few collection replicas in the
> > recovering and down state. In the past we have seen it come back to
> normal
> > by restarting the solr server, but we want to understand is there any way
> > to get this back to normal (all synched up with Zookeeper) through
> command
> > line/admin? Another question is, being in this state can it cause data
> > issue? How do we check that (distrib=false on collection count?)?
>
> As long as you have at least one replica operational on every shard, you
> should be OK.  But if you only have one replica operational, then you're
> in a precarious state, where one additional problem could result in
> something being unavailable.
>
Thanks for the info.

> If all is well, SolrCloud should not have replicas stay in down or
> recovering state for very long, unless they're really large, in which
> case it can take a while to copy the data from the leader.  If that
> state persists for a long time, there's probably something going wrong
> with your Solr install.  Usually restarting Solr is the only way to
> recover persistently down replicas.  If it happens again after restart,
> then the root problem has not been dealt with, and you're going to need
> to figure it out.
>
OK. Based on the point above, it looks like restarting is the only option; there is
no other way to sync with ZK. Thanks for that.

> The log snippet you shared only covers a timespan of less than one
> second, so it's not very helpful in making any kind of determination.
> The "session expired" message sounds like what happens when the
> zkClientTimeout value is exceeded.  Internally, this value defaults to
> 15 seconds, and typical example configs set it to 30 seconds ... so when
> the session expires, it means there's a SERIOUS problem.  For computer
> software, 15 or 30 seconds is a relative eternity.  A properly running
> system should NEVER exceed that timeout.
>
I don't think we have a memory issue (the GC log for a busy day is posted here);
we had Solr going out of sync with ZK because of the manual ZK transaction
log parsing/checking on the server (we did that on Sept 17 16:00 UTC, as
you can see in the log), which resulted in a ZK timeout. Since then Solr
has not returned to normal.  Is there a possibility of the Solr query (real
time GET) response time increasing due to the Solr servers being in a
recovering/down state?

Here is the full Solr Log file (Note that it is in INFO mode):
https://raw.githubusercontent.com/ganeshmailbox/har/master/SolrLogFile
Here is the GC Log:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMTAvMy8tLTAxX3NvbHJfZ2MubG9nLjUtLTIxLTE5LTU3


Can you share your solr log when the problem happens, covering a
> timespan of at least a few minutes (and ideally much longer), as well as
> a gc log from a time when Solr was up for a long time?  Hopefully the
> solr.log and gc log will cover the same timeframe.  You'll need to use a
> file sharing site for the GC log, since it's likely to be a large file.
> I would suggest compressing it.  If the solr.log is small enough, you
> could use a paste website for that, but if it's large, you'll need to
> use a file sharing site.  Attachments to list email are almost never
> preserved.
>
> Thanks,
> Shawn
>
>


Migrate cores from 4.10.2 to 7.5.0

2018-10-03 Thread Pure Host - Wolfgang Freudenberger

Hi guys,

Is there any way to migrate cores from 4.10.2 to 7.5.0? I guess not, but 
perhaps someone has an idea. ^^


--
Mit freundlichem Gruß / kind regards

Wolfgang Freudenberger
Pure Host IT-Services
Münsterstr. 14
48341 Altenberge
GERMANY
Tel.: (+49) 25 71 - 99 20 170
Fax: (+49) 25 71 - 99 20 171

Umsatzsteuer ID DE259181123

Informieren Sie sich über unser gesamtes Leistungsspektrum unter 
www.pure-host.de
Get our whole services at www.pure-host.de




Nutch+Solr

2018-10-03 Thread Bineesh
Hello,

We use Solr 7.3.1 and Nutch 1.15

We've set up authentication for our Solr Cloud cluster using the basic
auth plugin (login details -> solr/SolrRocks).

For Nutch to index data to Solr, the properties below were added to the
nutch-site.xml file:

<property>
  <name>solr.auth</name>
  <value>true</value>
  <description>
  Whether to enable HTTP basic authentication for communicating with Solr.
  Use the solr.auth.username and solr.auth.password properties to configure
  your credentials.
  </description>
</property>

<property>
  <name>solr.auth.username</name>
  <value>solr</value>
  <description>
  Username
  </description>
</property>

<property>
  <name>solr.auth.password</name>
  <value>SolrRocks</value>
  <description>
  Password
  </description>
</property>

While Nutch indexes data to Solr, it's failing due to authentication. Am I
doing something wrong? Please help.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

2018-10-03 Thread yasoobhaider
Hi

I'm working with a Solr cluster with master-slave architecture.

Master and slave config:
ram: 120GB
cores: 16

At any point there are between 10-20 slaves in the cluster, each serving ~2k
requests per minute. Each slave houses two collections of approx 10G
(~2.5mil docs) and 2G(10mil docs) when optimized.

I am working with Solr 6.2.1

Solr configuration:

-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:-OmitStackTraceInFastThrow
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:ConcGCThreads=4
-XX:MaxTenuringThreshold=8
-XX:ParallelGCThreads=4
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=15
-XX:TargetSurvivorRatio=90
-Xmn10G
-Xms80G
-Xmx80G

Some of these configurations have been reached by multiple trial and errors
over time, including the huge heap size.

This cluster usually runs without any error.

In the usual scenario, old gen gc is triggered according to the
configuration at 50% old gen occupancy, and the collector clears out the
memory over the next minute or so. This happens every 10-15 minutes.

However, I have noticed that sometimes the GC pattern of the slaves
completely changes and old gen gc is not able to clear the memory.

After observing the gc logs closely for multiple old gen gc collections, I
noticed that the old gen gc is triggered at 50% occupancy, but if there is a
GC Allocation Failure before the collection completes (after CMS Initial
Remark but before CMS reset), the old gen collection is not able to clear
much memory. And as soon as this collection completes, another old gen gc is
triggered.

And in worst case scenarios, this cycle of old gen gc triggering, GC
allocation failure keeps happening, and the old gen memory keeps increasing,
leading to a single threaded STW GC, which is not able to do much, and I
have to restart the solr server.

The last time this happened after the following sequence of events:

1. We optimized the bigger collection bringing it to its optimized size of
~10G.
2. For an unrelated reason, we had stopped indexing to the master. We
usually index at a low-ish throughput of ~1mil docs/day. This is relevant as
when we are indexing, the size of the collection increases, and this affects
the heap size used by the collection.
3. The slaves started behaving erratically, with old gc collection not being
able to free up the required memory and finally being stuck in a STW GC.

As unlikely as this sounds, this is the only thing that changed on the
cluster. There was no change in query throughput or type of queries.

I restarted the slaves multiple times but the gc behaved in the same way for
over three days. Then when we fixed the indexing and made it live, the
slaves resumed their original gc pattern and are running without any issues
for over 24 hours now.

I would really be grateful for any advice on the following:

1. What could be the reason behind CMS not being able to free up the memory?
What are some experiments I can run to solve this problem?
2. Can stopping/starting indexing be a reason for such drastic changes to GC
pattern?
3. I have read at multiple places on this mailing list that the heap size
should be much lower (2x-3x the size of collection), but the last time I
tried CMS was not able to run smoothly and GC STW would occur which was only
solved by a restart. My reasoning for this is that the type of queries and
the throughput are also a factor in deciding the heap size, so it may be
that our queries are creating too many objects maybe. Is my reasoning
correct or should I try with a lower heap size (if it helps achieve a stable
gc pattern)?

(4. Silly question, but what is the right way to ask question on the mailing
list? via mail or via the nabble website? I sent this question earlier as a
mail, but it was not showing up on the nabble website so I am posting it
from the website now)

-
-

Logs which show this:


Desired survivor size 568413384 bytes, new threshold 2 (max 8)
- age   1:  437184344 bytes,  437184344 total
- age   2:  194385736 bytes,  631570080 total
: 9868992K->616768K(9868992K), 1.7115191 secs]
48349347K->40160469K(83269312K), 1.7116410 secs] [Times: user=6.25 sys=0.00,
real=1.71 secs]
Heap after GC invocations=921 (full 170):
 par new generation   total 9868992K, used 616768K [0x7f4f8400,
0x7f520400, 0x7f520400)
  eden space 9252224K,   0% used [0x7f4f8400, 0x7f4f8400,
0x7f51b8b6)
  from space 616768K, 100% used [0x7f51de5b, 0x7f520400,
0x7f520400)
  to   space 616768K,   0% used [0x7f51b8b6, 0x7f51b8b6,

Re: Restoring and upgrading a standalone index to SolrCloud

2018-10-03 Thread Shawn Heisey

On 10/3/2018 10:45 AM, Shawn Heisey wrote:
Here's one way to do this: 



Oh, and when you delete the data directory, delete the tlog directory 
too.  Don't copy tlog from the non-cloud install.  Solr will re-create 
it as long as the directory gives it permission to do so.


Thanks,
Shawn



Re: Restoring and upgrading a standalone index to SolrCloud

2018-10-03 Thread Shawn Heisey

On 10/3/2018 9:42 AM, Jack Schlederer wrote:

I've successfully upgraded the Lucene 5 index to Lucene 6, and then to Lucene 7,


Upgrading through two major versions is not guaranteed to work.  
Upgrading from an index fully built by major version X-1 is supported, 
but if X-2 or earlier has EVER touched the index, it's probably not 
going to work.  If you find that it does work, great ... but I wouldn't 
recommend trying it.


I recommend always building indexes from scratch when upgrading, even if 
the new version is capable of reading the index created by the old version.



so I think I have an index that can be restored to Solr 7. Do you know if
it's possible to restore an index like this to a SolrCloud environment if I
can get it into a directory that is shared by all the nodes?


Each node needs its own copy of the data, they cannot share an index 
directory.  Lucene works really hard to prevent sharing indexes, and 
this behavior should not be overridden.


In general, yes, you can migrate an index (assuming it's an index that 
will work, note what I said above) from a non-cloud install to a cloud 
install.  That would be greatly complicated if the index were sharded 
already in the non-cloud install -- hopefully your 20GB index is one 
core, not multiple shards.  If it's sharded ... build it from scratch, 
because it's not likely that the SolrCloud collection will route data to 
shards in precisely the same way as a non-cloud install.


Here's one way to do this:

* Set up your cloud, create an empty collection with one shard and as 
many replicas as you want.

* Shut down all of the Solr nodes related to that collection.
* Delete the "data" directory under all of the cores related to that 
collection.
* Copy the data directory from the non-cloud install to one of those 
replica cores.

* Start the Solr node where you copied the data.
* Let the system fully stabilize so the replica you have just built 
shows up as green in the Cloud graph.
* Start the other Solr nodes with the other replicas.  They will copy 
the index from the one that got started first.


Thanks,
Shawn



[ANNOUNCE] Luke 7.5.0 released

2018-10-03 Thread Tomoko Uchida
Hi,

Luke 7.5.0 was just released.
For this release, we have two editions of Luke. (Sorry for the confusion.)

* Luke 7.5.0 - Swing edition
Swing edition can be downloaded from here:
https://github.com/DmitryKey/luke/releases/tag/luke-swing-7.5.0

This version of Luke works with JDK 8/9/10/11.
Use of JDK 9+ is highly recommended for HiDPI displays on Windows/Linux
platforms (e.g. Surface Pro.)
See: http://openjdk.java.net/jeps/263

Here is the issue why we're switching to Swing:
https://github.com/DmitryKey/luke/issues/109

This branch provides the very same features the previous release does.
On the other hand, it is currently under active development. So your
feedbacks are most welcome.


* Luke 7.5.0 - JavaFX edition
JavaFX edition (still) can be downloaded from here:
https://github.com/DmitryKey/luke/releases/tag/luke-javafx-7.5.0

This version of Luke works with JDK 8/9/10/11 and JavaFX.
Please look through this guide before installing:

Installation Guide for Luke 7.5.0 JavaFX edition
https://github.com/DmitryKey/luke/wiki/Installation-Guide-for-Luke-7.5.0-JavaFX-edition



Regards,
Tomoko


Restoring and upgrading a standalone index to SolrCloud

2018-10-03 Thread Jack Schlederer
Hello,

We currently run Solr 5.4 as our production search backend. We run it in a
master/slave replication architecture, and we're starting an upgrade to
Solr 7.5 using a SolrCloud architecture.

One of our collections is around 20GB and hosts about 200M documents, and
it would take around 6 hours to do a full dataimport from the database, so
we'd like to upgrade the index and restore it to SolrCloud. I've
successfully upgraded the Lucene 5 index to Lucene 6, and then to Lucene 7,
so I think I have an index that can be restored to Solr 7. Do you know if
it's possible to restore an index like this to a SolrCloud environment if I
can get it into a directory that is shared by all the nodes?

Thanks,
Jack


Re: How to do rollback from solrclient using python

2018-10-03 Thread Emir Arnautović
Hi Chetra,
In addition to what Jason explained, rollbacks do not work in Solr Cloud.

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 3 Oct 2018, at 14:45, Jason Gerlowski  wrote:
> 
> Hi Chetra,
> 
> The syntax that you're looking for is 
> "/solr/someCoreName/update?rollback=true".
> 
> But I'm afraid Rollback might not be quite what you think it is.  You
> mentioned: "but it doesn't work, whenever there is a commit the
> request still updates on the server".  Yes, that is the expected
> behavior with rollbacks.  Rollbacks reset your index to the last
> commit point.  If there was a commit right before a rollback, the
> rollback will have no effect.
> 
> One last point is that you should be very careful using rollbacks.
> Rollbacks are going to undo all changes to your index since the last
> commit.  If you have more than one client thread changing documents,
> this can be very dangerous as you will reset a lot of things you
> didn't intend.  Even if you can guarantee that there's only one client
> making changes to your index, and that client is itself
> single-threaded, the result of a rollback is still indeterminate if
> you're using server auto-commit settings.  The client-triggered
> rollback will occasionally race against the server-triggered commit.
> Will your doc changes get rolled back?  They will if the rollback
> happens first, but if the commit happens right before the rollback,
> your rollback won't do anything!  Anyways rollbacks have their place,
> but be very careful when using them!
> 
> Hope that helps,
> 
> Jason
> On Wed, Oct 3, 2018 at 4:41 AM Chetra Tep  wrote:
>> 
>> Hi Solr team,
>> Current I am creating a python application that accesses to solr server.
>> I have to handle updating document and need a rollback function.
>> I want to send a rollback request whenever exception occurs.
>> first I try sth like this from curl command :
>> curl http://localhost:8983/solr/mysolr/update?command=rollback
>> and I also try
>> curl http://localhost:8983/solr/mysolr/update?rollback true
>> 
>> but it doesn't work. whenever there is a commit the request still updates
>> on the server.
>> 
>> I also try to submit xml document  , but it doesn't work, too.
>> 
>> Could you guide me how to do this?  I haven't found much documentation
>> about this on the internet.
>> 
>> Thanks you in advance.
>> Best regards,
>> Chetra



Re: Metrics API via Solrj

2018-10-03 Thread Shawn Heisey

On 10/3/2018 6:17 AM, Jason Gerlowski wrote:

 NamedList<Object> respNL = response.getResponse();
 NamedList<Object> metrics = (NamedList<Object>) respNL.get("metrics");
 NamedList<Object> jvmMetrics = (NamedList<Object>) metrics.get("solr.jvm");
 Long numClassesLoaded = (Long) jvmMetrics.get("classes.loaded");


If you're running a new enough SolrJ version, which you probably are 
because findRecursive was added more than five years ago, all of these 
code lines can be replaced with one code line:


  Long numClassesLoaded = (Long) response.getResponse().findRecursive(
"metrics", "solr.jvm", "classes.loaded");

I don't think it'll run any faster, or even use any less memory, but I 
think it is much easier to read and understand.


Thanks,
Shawn



Re: Modify the log directory for dih

2018-10-03 Thread Shawn Heisey

On 10/2/2018 10:49 PM, lala wrote:

Shawn Heisey-2 wrote

With a change to the log4j configuration file, you can direct all logs
created by the DIH classes to a separate file, no code changes needed.

Since I'm a newbee regarding log4j, Can you please give me an example about
how to change this configuration file for DIH?


What version of Solr do you have, what OS is it running on, and how did 
you install/start it?


Thanks,
Shawn



Re: How to do rollback from solrclient using python

2018-10-03 Thread Jason Gerlowski
Hi Chetra,

The syntax that you're looking for is "/solr/someCoreName/update?rollback=true".

But I'm afraid Rollback might not be quite what you think it is.  You
mentioned: "but it doesn't work, whenever there is a commit the
request still updates on the server".  Yes, that is the expected
behavior with rollbacks.  Rollbacks reset your index to the last
commit point.  If there was a commit right before a rollback, the
rollback will have no effect.

One last point is that you should be very careful using rollbacks.
Rollbacks are going to undo all changes to your index since the last
commit.  If you have more than one client thread changing documents,
this can be very dangerous as you will reset a lot of things you
didn't intend.  Even if you can guarantee that there's only one client
making changes to your index, and that client is itself
single-threaded, the result of a rollback is still indeterminate if
you're using server auto-commit settings.  The client-triggered
rollback will occasionally race against the server-triggered commit.
Will your doc changes get rolled back?  They will if the rollback
happens first, but if the commit happens right before the rollback,
your rollback won't do anything!  Anyways rollbacks have their place,
but be very careful when using them!
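
If you ever drive this from SolrJ rather than raw HTTP, the same operation is exposed
directly on the client; a small sketch (the base URL pointing at the core is an example):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class RollbackExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mysolr").build()) {
      client.rollback();  // same semantics: undoes everything since the last commit
    }
  }
}

All of the caveats above apply to that call just as much as to the HTTP form.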

Hope that helps,

Jason
On Wed, Oct 3, 2018 at 4:41 AM Chetra Tep  wrote:
>
> Hi Solr team,
> Current I am creating a python application that accesses to solr server.
> I have to handle updating document and need a rollback function.
> I want to send a rollback request whenever exception occurs.
> first I try sth like this from curl command :
> curl http://localhost:8983/solr/mysolr/update?command=rollback
> and I also try
> curl http://localhost:8983/solr/mysolr/update?rollback true
>
> but it doesn't work. whenever there is a commit the request still updates
> on the server.
>
> I also try to submit xml document  , but it doesn't work, too.
>
> Could you guide me how to do this?  I haven't found much documentation
> about this on the internet.
>
> Thanks you in advance.
> Best regards,
> Chetra


Re: Metrics API via Solrj

2018-10-03 Thread Jason Gerlowski
Hi Deniz,

I don't think there are any classes that simplify accessing the
metrics API like there are for other APIs (e.g.
CollectionAdminRequest, CoreAdminRequest, ..).  But sending metrics
requests in SolrJ is still possible; it's just a little bit more
complicated.

Anytime you want to make an API call that doesn't have specific
objects for it, you can use one of the general-purpose SolrRequest
objects.  I've included an example below that reads the
"classes.loaded" JVM metric:

final SolrClient client = new
HttpSolrClient.Builder("http://localhost:8983/solr").build();

final ModifiableSolrParams params = new ModifiableSolrParams();
params.set("group", "jvm");
final GenericSolrRequest req = new
GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/metrics", params);

SimpleSolrResponse response = req.process(client);
NamedList<Object> respNL = response.getResponse();
NamedList<Object> metrics = (NamedList<Object>) respNL.get("metrics");
NamedList<Object> jvmMetrics = (NamedList<Object>) metrics.get("solr.jvm");
Long numClassesLoaded = (Long) jvmMetrics.get("classes.loaded");
System.out.println("Num classes loaded was: " + numClassesLoaded);

It's a little more painful to have to dig through the NamedList
yourself, but it's still very do-able.  Hope that helps.

Best,

Jason
On Wed, Oct 3, 2018 at 3:03 AM deniz  wrote:
>
> Are there anyway to get the metrics via solrj ? all of the examples seem like
> using plain curl or http reqs with json response. I have found
> org.apache.solr.client.solrj.io.stream.metrics package, but couldnt figure
> out how to send the requests via solrj...
>
> could anyone help me to figure out how to deal with metrics api on solrj?
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Clarification about Solr Cloud and Shard

2018-10-03 Thread Emir Arnautović
Hi Rekha,
In addition to what Shawn explained, the answer to your last question is yes
and no: you can split shards, but you cannot change the number of shards without
reindexing. And you can add nodes, but you should make sure that adding nodes will
result in a well-balanced cluster.
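
Splitting, for reference, is a Collections API call; a SolrJ sketch (the collection and
shard names are example values):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class SplitShardExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://node1:8983/solr").build()) {
      // Splits shard1 of "mycollection" into two sub-shards on the same node
      CollectionAdminRequest.splitShard("mycollection").setShardName("shard1").process(client);
    }
  }
}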
You can address scalability issues differently. Depending on your case, you might 
not need a single index with 200 billion documents. E.g. if you have a multi-tenant 
system and each tenant searches only its own data, each tenant or group of tenants 
can have a separate index or even a separate cluster. Also, if you append data and 
often filter by time, you may use time-based indices.

Here is blog explaining how to run tests to estimate shard/cluster size: 
http://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html 


HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 2 Oct 2018, at 22:41, Shawn Heisey  wrote:
> 
> On 10/2/2018 9:33 AM, Rekha wrote:
>> Dear Solr Team, I need following clarification from you, please check and 
>> give suggestion to me, 1. I want to store and search 200 Billions of 
>> documents(Each document contains 16 fields). For my case can I able to 
>> achieve by using Solr cloud? 2. For my case how many shard and nodes will be 
>> needed? 3. In future can I able to increase the nodes and shards? Thanks, 
>> Rekha Karthick
> 
> In a nutshell:  It's not possible to give generic advice. The contents of the 
> fields will affect exactly what you need.  The nature of the queries that you 
> send will affect exactly what you need.  The query rate will affect exactly 
> what you need. The overall size of the index (disk space, as well as document 
> count) will affect what you need.
> 
> In the "not very helpful" department, but I promise this is absolute truth, 
> there's this blog post:
> 
> https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> 
> To handle 200 billion documents *in a single collection*, you're probably 
> going to want at least 200 shards, and there are good reasons to go with even 
> more shards than that.  But you need to be warned that there can be serious 
> scalability problems when SolrCloud must keep track of that many different 
> indexes.  Here's an issue I filed for scalability problems with thousands of 
> collections ... there can be similar problems with lots of shards as well.  
> This issue says it is fixed, but no code changes that I am aware of were ever 
> made related to the issue, and as far as I can tell, it's still a problem 
> even in the latest version:
> 
> https://issues.apache.org/jira/browse/SOLR-7191
> 
> That many shard/replicas on one collection is likely to need zookeeper's 
> maximum znode size (jute.maxbuffer) boosted, because it will probably require 
> more than one megabyte to hold the JSON structure describing the collection.
> 
> As for how many machines you'll need ... absolutely no idea.  If query rate 
> will be insanely high, you'll want a dedicated machine for each shard 
> replica, and you may need many replicas, which is going to mean hundreds, 
> possibly thousands, of servers.  If the query rate is really low and/or each 
> document is very small, you might be able to house more than one shard per 
> server.  But you should know that handling 200 billion documents is going to 
> require a lot of hardware even if it turns out that you're not going to be 
> handling tons of data (per document) or queries.
> 
> Thanks,
> Shawn
> 



Re: checksum failed (hardware problem?)

2018-10-03 Thread Stephen Bianamara
Hello All --

As it would happen, we've seen this error on version 6.6.2 very recently.
This is also on an AWS instance, like Simon's report. The drive doesn't
show any sign of being unhealthy, either from cursory investigation. FWIW,
this occurred during a collection backup.

Erick, is there some diagnostic data we can find to help pin this down?

Thanks!
Stephen

On Sun, Sep 30, 2018 at 12:48 PM Susheel Kumar 
wrote:

> Thank you, Simon. That basically points to something related to the environment
> causing the checksum failures rather than any Lucene/Solr issue.
>
> Erick - I did check with the hardware folks and they are aware of a VMware
> issue where a VM hosted in an HCI environment comes into a halt state
> for a minute or so and may be losing connections to disk/network.  So that
> may well be the reason for the index corruption, though they have not been
> able to find anything specific in the logs from the time Solr ran into the
> issue.
>
> Also, I again had an issue where Solr is losing the connection with ZooKeeper
> (Client session timed out, have not heard from server in 8367ms for
> sessionid 0x0).  Does that point to a similar hardware issue?  Any
> suggestions?
>
> Thanks,
> Susheel
>
> 2018-09-29 17:30:44.070 INFO
> (searcherExecutor-7-thread-1-processing-n:server54:8080_solr
> x:COLL_shard4_replica2 s:shard4 c:COLL r:core_node8) [c:COLL s:shard4
> r:core_node8 x:COLL_shard4_replica2] o.a.s.c.SolrCore
> [COLL_shard4_replica2] Registered new searcher
> Searcher@7a4465b1[COLL_shard4_replica2]
>
> main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_7x3f(6.6.2):C826923/317917:delGen=2523)
> Uninverting(_83pb(6.6.2):C805451/172968:delGen=2957)
> Uninverting(_3ywj(6.6.2):C727978/334529:delGen=2962)
> Uninverting(_7vsw(6.6.2):C872110/385178:delGen=2020)
> Uninverting(_8n89(6.6.2):C741293/109260:delGen=3863)
> Uninverting(_7zkq(6.6.2):C720666/101205:delGen=3151)
> Uninverting(_825d(6.6.2):C707731/112410:delGen=3168)
> Uninverting(_dgwu(6.6.2):C760421/295964:delGen=4624)
> Uninverting(_gs5x(6.6.2):C540942/138952:delGen=1623)
> Uninverting(_gu6a(6.6.2):c75213/35640:delGen=1110)
> Uninverting(_h33i(6.6.2):c131276/40356:delGen=706)
> Uninverting(_h5tc(6.6.2):c44320/11080:delGen=380)
> Uninverting(_h9d9(6.6.2):c35088/3188:delGen=104)
> Uninverting(_h80h(6.6.2):c11927/3412:delGen=153)
> Uninverting(_h7ll(6.6.2):c11284/1368:delGen=205)
> Uninverting(_h8bs(6.6.2):c11518/2103:delGen=149)
> Uninverting(_h9r3(6.6.2):c16439/1018:delGen=52)
> Uninverting(_h9z1(6.6.2):c9428/823:delGen=27)
> Uninverting(_h9v2(6.6.2):c933/33:delGen=12)
> Uninverting(_ha1c(6.6.2):c1056/1:delGen=1)
> Uninverting(_ha6i(6.6.2):c1883/124:delGen=8)
> Uninverting(_ha3x(6.6.2):c807/14:delGen=3)
> Uninverting(_ha47(6.6.2):c1229/133:delGen=6)
> Uninverting(_hapk(6.6.2):c523) Uninverting(_haoq(6.6.2):c279)
> Uninverting(_hamr(6.6.2):c311) Uninverting(_hap0(6.6.2):c338)
> Uninverting(_hapu(6.6.2):c275) Uninverting(_hapv(6.6.2):C4/2:delGen=1)
> Uninverting(_hapw(6.6.2):C5/2:delGen=1)
> Uninverting(_hapx(6.6.2):C2/1:delGen=1)
> Uninverting(_hapy(6.6.2):C2/1:delGen=1)
> Uninverting(_hapz(6.6.2):C3/1:delGen=1)
> Uninverting(_haq0(6.6.2):C6/3:delGen=1)
> Uninverting(_haq1(6.6.2):C1)))}
> 2018-09-29 17:30:52.390 WARN
>
> (zkCallback-5-thread-91-processing-n:server54:8080_solr-SendThread(server117:2182))
> [   ] o.a.z.ClientCnxn Client session timed out, have not heard from
> server in 8367ms for sessionid 0x0
> 2018-09-29 17:31:01.302 WARN
>
> (zkCallback-5-thread-91-processing-n:server54:8080_solr-SendThread(server120:2182))
> [   ] o.a.z.ClientCnxn Client session timed out, have not heard from
> server in 8812ms for sessionid 0x0
> 2018-09-29 17:31:14.049 INFO
> (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [
>   ] o.a.s.c.c.ConnectionManager Connection with ZooKeeper
> reestablished.
> 2018-09-29 17:31:14.049 INFO
> (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [
>   ] o.a.s.c.ZkController ZooKeeper session re-connected ... refreshing
> core states after session expiration.
> 2018-09-29 17:31:14.051 INFO
> (zkCallback-5-thread-91-processing-n:server54:8080_solr-EventThread) [
>   ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (16)
> -> (15)
> 2018-09-29 17:31:14.144 INFO  (qtp834133664-520378) [c:COLL s:shard4
> r:core_node8 x:COLL_shard4_replica2] o.a.s.c.S.Request
> [COLL_shard4_replica2]  webapp=/solr path=/admin/ping
>
> params={distrib=false=wordTokens&_stateVer_=COLL:1246=false=/admin/ping=id=score=4=0=true=
> http://server54:8080/solr/COLL_shard4_replica2/|http://server53:8080/solr/COLL_shard4_replica1/=10=2={!lucene}*:*=1538242274139=true=javabin
> }
> webapp=/solr path=/admin/ping
>
> params={distrib=false=wordTokens&_stateVer_=COLL:1246=false=/admin/ping=id=score=4=0=true=
> http://server54:8080/solr/COLL_shard4_replica2/|http://server53:8080/solr/COLL_shard4_replica1/=10=2={!lucene}*:*=1538242274139=true=javabin
> }
> hits=4989979 status=0 QTime=0
>
>
>
>
> On Wed, Sep 

Re: Creating CJK bigram tokens with ClassicTokenizer

2018-10-03 Thread Yasufumi Mizoguchi
Hi, Shawn

Thank you for replying me.

> CJKBigramFilter shouldn't care what tokenizer you're using.  It should
> work with any tokenizer.  What problem are you seeing that you're trying
> to solve?  What version of Solr, what configuration, and what does it do
> that you're not expecting, and what do you want it to do?

I am sorry for the lack of information. I tried this with Solr 5.5.5 and 7.5.0.
And here is the analyzer configuration from my managed-schema.


  




  



And what I want to do is:
1. to create CJ bigram tokens
2. to extract each word that contains a hyphen and stopwords as a single
   token (e.g. as-is, to-be, etc.) from CJK and English sentences.

CJKBigramFilter seems to check TOKEN_TYPES attribute added by
StandardTokenizer when creating CJK bigram token.
(See
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKBigramFilter.java#L64
)

ClassicTokenizer also adds the obsolete TOKEN_TYPES value "CJ" to CJ tokens and
"ALPHANUM" to the Korean alphabet, but neither is a target for
CJKBigramFilter...

Thanks,
Yasufumi

2018年10月2日(火) 0:05 Shawn Heisey :

> On 9/30/2018 10:14 PM, Yasufumi Mizoguchi wrote:
> > I am looking for the way to create CJK bigram tokens with
> ClassicTokenizer.
> > I tried this by using CJKBigramFilter, but it only supports for
> > StandardTokenizer...
>
> CJKBigramFilter shouldn't care what tokenizer you're using.  It should
> work with any tokenizer.  What problem are you seeing that you're trying
> to solve?  What version of Solr, what configuration, and what does it do
> that you're not expecting, and what do you want it to do?
>
> I don't have access to the systems where I was using that filter, but if
> I recall correctly, I was using the whitespace tokenizer.
>
> Thanks,
> Shawn
>
>


How to do rollback from solrclient using python

2018-10-03 Thread Chetra Tep
Hi Solr team,
Currently I am creating a Python application that accesses a Solr server.
I have to handle updating documents and need a rollback function.
I want to send a rollback request whenever an exception occurs.
First I tried something like this from the curl command:
curl http://localhost:8983/solr/mysolr/update?command=rollback
and I also tried
curl http://localhost:8983/solr/mysolr/update?rollback true

but it doesn't work; whenever there is a commit, the request still updates
on the server.

I also tried to submit an XML document, but it doesn't work either.

Could you guide me how to do this?  I haven't found much documentation
about this on the internet.

Thank you in advance.
Best regards,
Chetra


RE: Opinions on index optimization...

2018-10-03 Thread Markus Jelsma
There are a few bugs that require you to merge the index; see SOLR-8807 
and related bugs.

https://issues.apache.org/jira/browse/SOLR-8807

-Original message-
> From:Erick Erickson 
> Sent: Wednesday 3rd October 2018 4:50
> To: solr-user 
> Subject: Re: Opinions on index optimization...
> 
> The problem you're at now is that, having run optimize, that single
> massive segment will accumulate deletes until it has < 2.5G "live"
> documents. So once you do optimize (and until you get to Solr 7.5),
> unless you can live with this one segment accumulating deletes for a
> very long time, you must continue to optimize.
> 
> Or you could re-index from scratch if possible and never optimize.
> 
> Best,
> Erick
> On Tue, Oct 2, 2018 at 7:28 AM Walter Underwood  wrote:
> >
> > Don’t optimize. The first article isn’t as clear as it should be. The 
> > important sentence is "Unless you are running into resource problems, it’s 
> > best to leave merging alone.”
> >
> > I’ve been running Solr in production since version 1.3, with several 
> > different kinds and sizes of collections. I’ve never run a daily optimize, 
> > even on collections that only change once per day.
> >
> > The section titled "What? I can’t afford 50% “wasted” space” should have 
> > just been “Then don’t run Solr”. Really, you should have 100% free space, 
> > so a 22 GB index would be on a volume with 22 GB of free space.
> >
> > It was a mistake to name it “optimize”. It should have been “force merge”.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Oct 2, 2018, at 6:04 AM, Jeff Courtade  wrote:
> > >
> > > We run an old master/slave solr 4.3.0 solr cluster
> > >
> > > 14 nodes 7/7
> > > indexes average 47/5 gig per shard around 2 mill docs per shard.
> > >
> > > We have constant daily additions and a small amount of deletes.
> > >
> > > We optimize nightly currently and it is a system hog.
> > >
> > > Is it feasible to never run optimize?
> > >
> > > I ask because it seems like it would be very bad not to but this
> > > information is out there apparently recommending exactly that... never
> > > optimizing.
> > >
> > > https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> > >
> > > https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> > >
> > > https://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
> >
> 


Re: Solr 6.6 LanguageDetector

2018-10-03 Thread Furkan KAMACI
Here is my schema configuration:
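
(The field declarations did not survive the archive; they would have been along these
lines, with the field types here being assumptions:)

<field name="language_code" type="string" indexed="true" stored="true"/>
<field name="content_en" type="text_en" indexed="true" stored="true"/>
<field name="content_tr" type="text_general" indexed="true" stored="true"/>
<field name="content_other" type="text_general" indexed="true" stored="true"/>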

   
   
   
   


On Wed, Oct 3, 2018 at 10:50 AM Furkan KAMACI 
wrote:

> Hi,
>
> I use Solr 6.6 and try to test automatic language detection. I've added
> these configuration into my solrconfig.xml.
>
> 
> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>   
> content
> en,tr
> language_code
> other
> true
> true
>   
> 
>
>
>  
> ...
>  startup="lazy"
>   class="solr.extraction.ExtractingRequestHandler" >
> 
>   true
>   true
>   ignored_
>   content
>   ignored_
>   ignored_
> 
> 
>   dedupe
>   langid
>   ignore-commit-from-client
>
>   
>
> content field is populated but content_en, content_tr, content_other and
> language_code fields are empty.
>
> What I miss?
>
> Kind Regards,
> Furkan KAMACI
>


Solr 6.6 LanguageDetector

2018-10-03 Thread Furkan KAMACI
Hi,

I use Solr 6.6 and am trying to test automatic language detection. I've added
this configuration to my solrconfig.xml.


   
  
content
en,tr
language_code
other
true
true
  

   
   
 
...
  

  true
  true
  ignored_
  content
  ignored_
  ignored_


  dedupe
  langid
  ignore-commit-from-client
   
  

The content field is populated, but the content_en, content_tr, content_other and
language_code fields are empty.

What am I missing?
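
For comparison, a LangDetect chain with field mapping enabled is usually declared along
these lines (a sketch using the standard langid.* parameter names; which parameters the
two boolean values above belonged to is a guess):

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">content</str>
      <str name="langid.langField">language_code</str>
      <str name="langid.whitelist">en,tr</str>
      <str name="langid.fallback">other</str>
      <bool name="langid.map">true</bool>
      <bool name="langid.map.keepOrig">true</bool>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

With langid.map enabled, detected documents get their text copied into content_<lang>
fields, which is what the empty content_en / content_tr fields suggest is not happening.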

Kind Regards,
Furkan KAMACI


Metrics API via Solrj

2018-10-03 Thread deniz
Is there any way to get the metrics via SolrJ? All of the examples seem to
use plain curl or HTTP requests with a JSON response. I have found the
org.apache.solr.client.solrj.io.stream.metrics package, but couldn't figure
out how to send the requests via SolrJ...

Could anyone help me figure out how to deal with the metrics API in SolrJ?



-
Zeki ama calismiyor... Calissa yapar...
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html