[
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759247#action_12759247
]
Artem Russakovskii edited comment on SOLR-1458 at 9/24/09 2:01 PM:
-------------------------------------------------------------------
Re: hit the master with the filelist command.
First of all, this may have been a really late night for the person who wrote
the wiki:
"Get list of lucene files present in the index:
http://host:port/solr/replication?command=filelist&indexversion=<index-version-number>
. The version number can be obtained using the indexversion calmmand"
The last word there ;-]
Now, I hit the master with the following:
MASTER/replication/?command=indexversion, get back
{code}
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int></lst><long name="indexversion">1253136035158</long><long
name="generation">4447</long>
</response>
{code}
Then I use this in the following query:
MASTER/replication/?command=filelist&indexversion=1253136035158
but I get back
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int></lst><str name="status">invalid indexversion</str>
</response>
{code}
I tried the same against an another instance that doesn't have the
NullPointerException replication problem and still get the same error, then I
tried another one, which had no documents indexed into yet, and it returned
{code}
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int></lst><arr name="filelist"><lst><str
name="name">segments_1</str><long name="lastmodified">1253136032000</long><long
name="size">32</long></lst></arr>
</response>
{code}
The weird part is that even though the master responds with invalid
indexversion for almost every instance I've tried, only 2 of them are failing
as far as the slave is concerned:
On the master:
{code}
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_08 path=/replication
params={command=indexversion&wt=javabin} status=0 QTime=0
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_04 path=/replication
params={command=indexversion&wt=javabin} status=0 QTime=0
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_10 path=/replication
params={command=indexversion&wt=javabin} status=0 QTime=0
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_02 path=/replication
params={command=indexversion&wt=javabin} status=0 QTime=0
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_12 path=/replication
params={command=indexversion&wt=javabin} status=0 QTime=0
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_06 path=/replication
params={command=indexversion&wt=javabin} status=0 QTime=0
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_08 path=/replication
params={indexversion=1253136074767&command=filelist&wt=javabin} status=0
QTime=0
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_06 path=/replication
params={indexversion=1253136077032&command=filelist&wt=javabin} status=0
QTime=0
{code}
On the slave:
{code}
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1253136074767, generation: 40983
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1253136077032, generation: 42291
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1253136076722, generation: 41981
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1253136074452, generation: 40668
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.ReplicationHandler doFetch
SEVERE: SnapPull failed
java.lang.NullPointerException
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.ReplicationHandler doFetch
SEVERE: SnapPull failed
java.lang.NullPointerException
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
{code}
Suggestions?
was (Author: archon810):
Re: hit the master with the filelist command.
First of all, this may have been a really late night for the person who wrote
the wiki:
"Get list of lucene files present in the index:
http://host:port/solr/replication?command=filelist&indexversion=<index-version-number>
. The version number can be obtained using the indexversion calmmand"
The last word there ;-]
Now, I hit the master with the following:
MASTER/replication/?command=indexversion, get back
{code}
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int></lst><long name="indexversion">1253136035158</long><long
name="generation">4447</long>
</response>
{code}
Then I use this in the following query:
MASTER/replication/?command=filelist&indexversion=1253136035158
but I get back
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int
name="QTime">0</int></lst><str name="status">invalid indexversion</str>
</response>
{code}
I tried the same against an instance that doesn't have the NullPointerException
replication problem and still get the same error.
Suggestions?
> Java Replication error: NullPointerException SEVERE: SnapPull failed on
> 2009-09-22 nightly
> ------------------------------------------------------------------------------------------
>
> Key: SOLR-1458
> URL: https://issues.apache.org/jira/browse/SOLR-1458
> Project: Solr
> Issue Type: Bug
> Components: replication (java)
> Affects Versions: 1.4
> Environment: CentOS x64
> 8GB RAM
> Tomcat, running with 7G max memory; memory usage is <2GB, so it's not the
> problem
> Host a: master
> Host b: slave
> Multiple single core Solr instances, using JNDI.
> Java replication
> Reporter: Artem Russakovskii
> Assignee: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-1458.patch, SOLR-1458.patch
>
>
> After finally figuring out the new Java based replication, we have started
> both the slave and the master and issued optimize to all master Solr
> instances. This triggered some replication to go through just fine, but it
> looks like some of it is failing.
> Here's what I'm getting in the slave logs, repeatedly for each shard:
> {code}
> SEVERE: SnapPull failed
> java.lang.NullPointerException
> at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
> at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> {code}
> If I issue an optimize again on the master to one of the shards, it then
> triggers a replication and replicates OK. I have a feeling that these
> SnapPull failures appear later on but right now I don't have enough to form a
> pattern.
> Here's replication.properties on one of the failed slave instances.
> {code}
> cat data/replication.properties
> #Replication details
> #Wed Sep 23 19:35:30 PDT 2009
> replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
> previousCycleTimeInSeconds=0
> timesFailed=113
> indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
> indexReplicatedAt=1253759730020
> replicationFailedAt=1253759730020
> lastCycleBytesDownloaded=0
> timesIndexReplicated=113
> {code}
> and another
> {code}
> cat data/replication.properties
> #Replication details
> #Wed Sep 23 18:42:01 PDT 2009
> replicationFailedAtList=1253756490034,1253756460169
> previousCycleTimeInSeconds=1
> timesFailed=2
> indexReplicatedAtList=1253756521284,1253756490034,1253756460169
> indexReplicatedAt=1253756521284
> replicationFailedAt=1253756490034
> lastCycleBytesDownloaded=22932293
> timesIndexReplicated=3
> {code}
> Some relevant configs:
> In solrconfig.xml:
> {code}
> <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
> <requestHandler name="/replication" class="solr.ReplicationHandler" >
> <lst name="master">
> <str name="enable">${enable.master:false}</str>
> <str name="replicateAfter">optimize</str>
> <str name="backupAfter">optimize</str>
> <str name="commitReserveDuration">00:00:20</str>
> </lst>
> <lst name="slave">
> <str name="enable">${enable.slave:false}</str>
> <!-- url of master, from properties file -->
> <str name="masterUrl">${master.url}</str>
> <!-- how often to check master -->
> <str name="pollInterval">00:00:30</str>
> </lst>
> </requestHandler>
> {code}
> The slave then has this in solrcore.properties:
> {code}
> enable.slave=true
> master.url=URLOFMASTER/replication
> {code}
> and the master has
> {code}
> enable.master=true
> {code}
> I'd be glad to provide more details but I'm not sure what else I can do.
> SOLR-926 may be relevant.
> Thanks.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.