zero-day exploit security issue

2017-10-13 Thread Xie, Sean
Is there a tracking issue to address this for Solr 6.6.x and 7.x?

https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list

Sean



HTTP HEAD method is gone?

2017-10-09 Thread Xie, Sean
After upgrading from 6.5.1 to 6.6.1, an HTTP HEAD request for /favicon.ico
returns 404. A GET request still works and returns 200 OK.

Before the upgrade, both HEAD and GET returned 200 OK.
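
A quick way to reproduce the difference (a minimal sketch, untested; host and
port are assumptions for a default local install):

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HeadCheck {
        public static void main(String[] args) throws Exception {
            for (String method : new String[] {"HEAD", "GET"}) {
                HttpURLConnection conn = (HttpURLConnection)
                        new URL("http://localhost:8983/solr/favicon.ico").openConnection();
                conn.setRequestMethod(method);
                // Expect 200 for GET; on 6.6.1 HEAD reportedly returns 404.
                System.out.println(method + " -> " + conn.getResponseCode());
            }
        }
    }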

Any settings to adjust?

Thanks
Sean



Re: DocValues error when upgrading to 6.6.1 from 6.5

2017-10-03 Thread Xie, Sean
I have figured out the problem. The schema was changed, and the index had been
deleted and rebuilt since then, but the index files might still have contained
old stale segments.

I replayed the situation by restoring the old data under 6.5, running an
optimize, and then upgrading to 6.6.1, and found that the error is gone.
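
For reference, a minimal SolrJ sketch of the optimize step (the zkHost string
and collection name are placeholder assumptions, not from this thread):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class OptimizeBeforeUpgrade {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181/solr").build()) {
                // Force-merging down to one segment rewrites every old segment,
                // so stale per-field DocValues types are dropped before the upgrade.
                client.optimize("MY_COLLECTION", true, true, 1);
            }
        }
    }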

Thanks
Sean

On 9/22/17, 12:16 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:

This error is not about DocValuesAsStored, but about
multiValued=true|false. It indicates that multiValued is set to
"false" for the current index but "true" in the new schema. At least
that's my guess

Best,
Erick

    On Thu, Sep 21, 2017 at 11:56 AM, Xie, Sean <sean@finra.org> wrote:
> Hi,
>
> When I upgrade the existing SOLR from 6.5.1 to 6.6.1, I’m getting:
> cannot change DocValues type from SORTED to SORTED_SET for field “…..”
>
> During the upgrades, there is no change to the schema or schema version (we
are using schema version 1.5, so the useDocValuesAsStored default is not taking
effect).
>
> Not sure why this is happening.
>
> Planning to upgrade the Solr version on other clusters, but don’t really
want to re-index all the data.
>
> Any suggestion?
>
> Thanks
> Sean
>




DocValues error when upgrading to 6.6.1 from 6.5

2017-09-21 Thread Xie, Sean
Hi,

When I upgrade the existing SOLR from 6.5.1 to 6.6.1, I’m getting:
cannot change DocValues type from SORTED to SORTED_SET for field “…..”

During the upgrades, there is no change to the schema or schema version (we are
using schema version 1.5, so the useDocValuesAsStored default is not taking
effect).

Not sure why this is happening.

Planning to upgrade the Solr version on other clusters, but don’t really want
to re-index all the data.

Any suggestion?

Thanks
Sean



Re: solr in memory testing

2017-08-08 Thread Xie, Sean
There is MiniSolrCloudCluster that you can use for testing. This is from 
solr-test-framework: 
https://github.com/apache/lucene-solr/tree/master/solr/test-framework.
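
A minimal sketch of standing one up directly (untested; the configset path,
collection name, and exact package locations are assumptions and vary a bit
across 6.x versions):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    import org.apache.solr.client.solrj.embedded.JettyConfig;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.cloud.MiniSolrCloudCluster;
    import org.apache.solr.common.SolrInputDocument;

    public class MiniClusterExample {
        public static void main(String[] args) throws Exception {
            Path baseDir = Files.createTempDirectory("minisolr");
            // One embedded SolrCloud node with its own in-process ZooKeeper.
            MiniSolrCloudCluster cluster =
                    new MiniSolrCloudCluster(1, baseDir, JettyConfig.builder().build());
            try {
                // The directory must contain solrconfig.xml and a schema.
                cluster.uploadConfigSet(Paths.get("src/test/resources/conf"), "conf");
                CollectionAdminRequest.createCollection("test", "conf", 1, 1)
                        .process(cluster.getSolrClient());

                CloudSolrClient client = cluster.getSolrClient();
                client.setDefaultCollection("test");
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "1");
                client.add(doc);
                client.commit();
            } finally {
                cluster.shutdown();
            }
        }
    }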

On 8/8/17, 7:54 AM, "Thaer Sammar"  wrote:

Hi,

We are using Solr 6.6, and we are looking for guidance documentation or a
Java example on how to create a Solr core in memory for the purpose of testing
using SolrJ. We found
https://wiki.searchtechnologies.com/index.php/Unit_Testing_with_Embedded_Solr
but this works only for Solr v4 and earlier versions.

regards,




RE: CDCR - how to deal with the transaction log files

2017-07-28 Thread Xie, Sean
You don't need to start CDCR on the target cluster. The other steps are exactly
what I did. After disabling the buffer on both target and source, the tlog
files are purged as documented.


-- Thank you
Sean

From: Patrick Hoeffel <patrick.hoef...@polarisalpha.com>
Date: Friday, Jul 28, 2017, 4:01 PM
To: solr-user@lucene.apache.org
Cc: jmy...@wayfair.com
Subject: [EXTERNAL] RE: CDCR - how to deal with the transaction log files

Amrit,

Problem solved! My biggest mistake was in my SOURCE-side configuration. The 
zkHost field needed the entire zkHost string, including the CHROOT indicator. I 
suppose that should have been obvious to me, but the examples only showed the 
IP Address of the target ZK, and I made a poor assumption.

  
  
  
<lst name="replica">
  <str name="zkHost">10.161.0.7:2181,10.161.0.6:2181,10.161.0.5:2181/chroot/solr</str>
  <str name="source">ks_v1</str>
  <str name="target">ks_v1</str>
</lst>

<lst name="replica">
  <str name="zkHost">10.161.0.7:2181</str>   <=== Problem was here.
  <str name="source">ks_v1</str>
  <str name="target">ks_v1</str>
</lst>


After that, I just made sure I did this:
1. Stop all Solr nodes at both SOURCE and TARGET.
2. $ rm -rf $SOLR_HOME/server/solr/collection_name/data/tlog/*.*
3. On the TARGET:
a. $ collection/cdcr?action=DISABLEBUFFER
b. $ collection/cdcr?action=START

4. On the Source:
a. $ collection/cdcr?action=DISABLEBUFFER
b. $ collection/cdcr?action=START

At this point any existing data in the SOURCE collection started flowing into 
the TARGET collection, and it has remained congruent ever since.
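
For anyone scripting the DISABLEBUFFER/START calls above from Java instead of
curl, a hedged SolrJ sketch (the base URL and collection name are
placeholders):

    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class CdcrActions {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/MY_COLLECTION").build()) {
                for (String action : new String[] {"DISABLEBUFFER", "START"}) {
                    ModifiableSolrParams params = new ModifiableSolrParams();
                    params.set("action", action);
                    // Hits /solr/MY_COLLECTION/cdcr?action=...
                    System.out.println(client.request(
                            new GenericSolrRequest(SolrRequest.METHOD.GET, "/cdcr", params)));
                }
            }
        }
    }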

Thanks,



Patrick Hoeffel

Senior Software Engineer
(Direct)  719-452-7371
(Mobile) 719-210-3706
patrick.hoef...@polarisalpha.com
PolarisAlpha.com


-Original Message-
From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
Sent: Friday, July 21, 2017 7:21 AM
To: solr-user@lucene.apache.org
Cc: jmy...@wayfair.com
Subject: Re: CDCR - how to deal with the transaction log files

Patrick,

Yes! You created the default UpdateLog, which got written to disk, and then you
changed it to CdcrUpdateLog in the configs. I see no reason it would create a
proper COLLECTIONCHECKPOINT on the target tlog.

One thing you can try before creating / starting from scratch is restarting the
source cluster nodes; the shard leaders will try to create the same
COLLECTIONCHECKPOINT, which may or may not be successful.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Jul 21, 2017 at 11:09 AM, Patrick Hoeffel < 
patrick.hoef...@polarisalpha.com> wrote:

> I'm working on my first setup of CDCR, and I'm seeing the same "The
> log reader for target collection {collection name} is not initialised"
> as you saw.
>
> It looks like you're creating collections on a regular basis, but for
> me, I create it one time and never again. I've been creating the
> collection first from defaults and then applying the CDCR-aware
> solrconfig changes afterward. It sounds like maybe I need to create
> the configset in ZK first, then create the collections, first on the
> Target and then on the Source, and I should be good?
>
> Thanks,
>
> Patrick Hoeffel
> Senior Software Engineer
> (Direct)  719-452-7371
> (Mobile) 719-210-3706
> patrick.hoef...@polarisalpha.com
> PolarisAlpha.com
>
>
> -Original Message-
> From: jmyatt [mailto:jmy...@wayfair.com]
> Sent: Wednesday, July 12, 2017 4:49 PM
> To: solr-user@lucene.apache.org
> Subject: Re: CDCR - how to deal with the transaction log files
>
> glad to hear you found your solution!  I have been combing over this
> post and others on this discussion board many times and have tried so
> many tweaks to configuration, order of steps, etc, all with absolutely
> no success in getting the Source cluster tlogs to delete.  So
> incredibly frustrating.  If anyone has other pearls of wisdom I'd love some 
> advice.
> Quick hits on what I've tried:
>
> - solrconfig exactly like Sean's (target and source respectively)
> except no autoSoftCommit
> - I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
> target) explicitly before starting since the config setting of
> defaultState=disabled doesn't seem to work
> - when I create the collection on source first, I get the warning "The
> log reader for target collection {collection name} is not
> initialised".  When I reverse the order (create the collection on
> target first), no such warning
> - tlogs replicate as expected, hard commits on both target and source
> cause tlogs to rollover, etc - all of that works as expected
> - action=QUEUES on source reflects the queueSize accurately.  Also
> *always* shows updateLogSynchronizer state as "stopped"
> - action=LASTPROCESSEDVERSION on both source and target always seems
> correct (I don't see the -1 that Sean mentioned).
> - I'm creating new collections every time and running full data
> imports 

Re: CDCR - how to deal with the transaction log files

2017-07-12 Thread Xie, Sean
Try running a second data import, or any other indexing job, after replication
of the first data import has completed.

My observation is that during the replication period (when there are docs in
the queue), tlog cleanup is not triggered. So when the queue is 0, submit a
second batch and monitor the queue and tlogs again.

-- Thank you
Sean

From: jmyatt <jmy...@wayfair.com>
Date: Wednesday, Jul 12, 2017, 6:58 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log files

glad to hear you found your solution!  I have been combing over this post and
others on this discussion board many times and have tried so many tweaks to
configuration, order of steps, etc, all with absolutely no success in
getting the Source cluster tlogs to delete.  So incredibly frustrating.  If
anyone has other pearls of wisdom I'd love some advice.  Quick hits on what
I've tried:

- solrconfig exactly like Sean's (target and source respectively) except no
autoSoftCommit
- I am also calling cdcr?action=DISABLEBUFFER (on source as well as on
target) explicitly before starting since the config setting of
defaultState=disabled doesn't seem to work
- when I create the collection on source first, I get the warning "The log
reader for target collection {collection name} is not initialised".  When I
reverse the order (create the collection on target first), no such warning
- tlogs replicate as expected, hard commits on both target and source cause
tlogs to rollover, etc - all of that works as expected
- action=QUEUES on source reflects the queueSize accurately.  Also *always*
shows updateLogSynchronizer state as "stopped"
- action=LASTPROCESSEDVERSION on both source and target always seems correct
(I don't see the -1 that Sean mentioned).
- I'm creating new collections every time and running full data imports that
take 5-10 minutes. Again, all data replication, log rollover, and autocommit
activity seems to work as expected, and logs on target are deleted.  It's
just those pesky source tlogs I can't get to delete.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CDCR-how-to-deal-with-the-transaction-log-files-tp4345062p4345715.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tlogs not being deleted/truncated

2017-07-11 Thread Xie, Sean
Please see my previous thread. I had to disable the buffer on the source
cluster, plus a scheduled hard commit and the scheduled log synchronizer, to
make it work.


-- Thank you
Sean

From: jmyatt <jmy...@wayfair.com>
Date: Tuesday, Jul 11, 2017, 1:56 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: Tlogs not being deleted/truncated

another interesting clue in my case (different from what WebsterHomer is
seeing): the response from /cdcr?action=QUEUES reflects what I would expect
to see in the tlog directory but it's not accurate.  By that I mean
tlogTotalSize shows 1500271 (bytes) and tlogTotalCount shows 2.  This
changes as more updates come in and autoCommit runs - sometimes
tlogTotalCount is 1 instead of 2, and the tlogTotalSize changes but stays in
that low range.

But on the filesystem, all the tlogs are still there.  Perhaps the ignored
exception noted above is in fact a problem?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tlogs-not-being-deleted-truncated-tp4341958p4345477.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
My guess is that it's a documentation gap.

I ran a test turning CDCR off using action=STOP while continuously sending
documents to the source cluster. The tlog files kept growing, and after a hard
commit a new tlog file was created while the old files stayed there forever. As
soon as I turned CDCR back on, the documents started to replicate to the
target.

After a hard commit and a scheduled log synchronizer run, the old tlog files
got deleted.
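
An explicit hard commit has the same effect as waiting for autoCommit; a
minimal sketch (the zkHost string and collection name are placeholders):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class HardCommit {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181/solr").build()) {
                // waitFlush=true, waitSearcher=true: a hard commit that rolls
                // the current tlog over to a new file.
                client.commit("MY_COLLECTION", true, true);
            }
        }
    }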

Btw, I’m running on 6.5.1.



On 7/10/17, 10:57 PM, "Varun Thacker" <va...@vthacker.in> wrote:

Yeah it just seems weird that you would need to disable the buffer on the
source cluster though.

The docs say "Replicas do not need to buffer updates, and it is recommended
to disable buffer on the target SolrCloud" which means the source should
have it enabled.

But the fact that it's working for you proves otherwise . What version of
Solr are you running? I'll try reproducing this problem at my end and see
if it's a documentation gap or a bug.

On Mon, Jul 10, 2017 at 7:15 PM, Xie, Sean <sean@finra.org> wrote:

> Yes. Documents are being sent to the target. Monitoring the output from
> “action=queues”, depending on your settings, you will see the document
> replication progress.
>
> On the other hand, if the buffer is enabled, the lastprocessedversion is
> always returning -1. Reading the source code, the CdcrUpdateLogSynchronizer
> does not continue to do the cleanup if this value is -1.
>
> Sean
>
> On 7/10/17, 5:18 PM, "Varun Thacker" <va...@vthacker.in> wrote:
>
> After disabling the buffer are you still seeing documents being replicated
> to the target cluster(s)?
>
> On Mon, Jul 10, 2017 at 1:07 PM, Xie, Sean <sean@finra.org> wrote:
>
> > After several experiments and observations, I finally made it work.
> > The key point is that you have to also disable the buffer on the source
> > cluster. I don’t know why the wiki doesn’t mention it, but I figured this
> > out through the source code.
> > Once the buffer is disabled on the source cluster, lastProcessedVersion
> > becomes a positive number, and when there is a hard commit, the old
> > unused tlog files get deleted.
> >
> > Hope my finding can help other users who experience the same issue.
> >
> >
> > On 7/10/17, 9:08 AM, "Michael McCarthy" <michael.mccar...@gm.com> wrote:
> >
> > We have been experiencing this same issue for months now, with version
> > 6.2.  No solution to date.
> >
> > -Original Message-
> > From: Xie, Sean [mailto:sean@finra.org]
> > Sent: Sunday, July 09, 2017 9:41 PM
> > To: solr-user@lucene.apache.org
> > Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log files
> >
> > Did another round of testing; the tlog on the target cluster is cleaned
> > up once the hard commit is triggered. However, on the source cluster,
> > the tlog files stay there and never get cleaned up.
> >
> > Not sure if there is any command to run manually to trigger the
> > updateLogSynchronizer. The updateLogSynchronizer is already set to run
> > every 10 seconds, but it didn’t seem to help.
> >
> > Any help?
> >
> > Thanks
> > Sean
> >
> > On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:
> >
> > I have monitored the CDCR process for a while; the updates are
> > actively sent to the target without a problem. However, the tlog size
> > and file count are growing every day, and even when there are 0 updates
> > to send, the tlogs stay there:
> >
> > Following is from the action=queues command, and you can see that after
> > about a month or so of running, the total transaction logs have reached
> > about 140K files, roughly 103G in size.
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">465</int>
>   </lst>
>   <lst name="queues">
>     <lst name="...">
>       <lst name="...">
>         <long name="queueSize">0</long>

RE: ZooKeeper transaction logs

2017-07-10 Thread Xie, Sean
Not sure if I can answer the question; we previously used the manual command to
clean up the logs, with a Linux daemon to schedule it. On Windows, there should
be a corresponding tool.

We currently use Netflix Exhibitor to manage the ZooKeeper instances, and it
works pretty well.

Sean


On 7/10/17, 6:43 AM, "Avi Steiner" <astei...@varonis.com> wrote:

I did run this class from a batch file (on a Windows server), but it still
does not remove anything. I set the number of snapshots to keep to 3, but I
have more in my folder.

-Original Message-----
    From: Xie, Sean [mailto:sean@finra.org]
Sent: Sunday, July 9, 2017 7:33 PM
To: solr-user@lucene.apache.org
Subject: Re: ZooKeeper transaction logs

You can try running the purge manually to see if it works:
org.apache.zookeeper.server.PurgeTxnLog.

And use a cron job to do the cleanup.


On 7/9/17, 11:07 AM, "Avi Steiner" <astei...@varonis.com> wrote:

Hello

I'm using Zookeeper 3.4.6

The ZK data folder keeps growing with transaction log files (log.*).

I set the following in zoo.cfg:
autopurge.purgeInterval=1
autopurge.snapRetainCount=3
dataDir=..\\data

Per ZK log, it reads those parameters:

2017-07-09 17:44:59,792 [myid:] - INFO  [main:DatadirCleanupManager@78] 
- autopurge.snapRetainCount set to 3
2017-07-09 17:44:59,792 [myid:] - INFO  [main:DatadirCleanupManager@79] 
- autopurge.purgeInterval set to 1

It also says that cleanup process is running:

2017-07-09 17:44:59,792 [myid:] - INFO  
[PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2017-07-09 17:44:59,823 [myid:] - INFO  
[PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.

But actually nothing is deleted.
Every service restart, new file is created.

The only parameter I managed to change is preAllocSize, which sets the
preallocated size per file. The default is 64MB. I changed it to 10KB only to
watch the effect.











Re: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
Yes. Documents are being sent to the target. Monitoring the output from
“action=queues”, depending on your settings, you will see the document
replication progress.

On the other hand, if the buffer is enabled, the lastprocessedversion is always
returning -1. Reading the source code, the CdcrUpdateLogSynchronizer does not
continue to do the cleanup if this value is -1.

Sean

On 7/10/17, 5:18 PM, "Varun Thacker" <va...@vthacker.in> wrote:

After disabling the buffer are you still seeing documents being replicated
to the target cluster(s)?

On Mon, Jul 10, 2017 at 1:07 PM, Xie, Sean <sean@finra.org> wrote:

> After several experiments and observations, I finally made it work.
> The key point is that you have to also disable the buffer on the source
> cluster. I don’t know why the wiki doesn’t mention it, but I figured this
> out through the source code.
> Once the buffer is disabled on the source cluster, lastProcessedVersion
> becomes a positive number, and when there is a hard commit, the old unused
> tlog files get deleted.
>
> Hope my finding can help other users who experience the same issue.
>
>
> On 7/10/17, 9:08 AM, "Michael McCarthy" <michael.mccar...@gm.com> wrote:
>
> We have been experiencing this same issue for months now, with version
> 6.2.  No solution to date.
>
> -Original Message-
> From: Xie, Sean [mailto:sean@finra.org]
> Sent: Sunday, July 09, 2017 9:41 PM
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log
> files
>
> Did another round of testing; the tlog on the target cluster is cleaned up
> once the hard commit is triggered. However, on the source cluster, the tlog
> files stay there and never get cleaned up.
>
> Not sure if there is any command to run manually to trigger the
> updateLogSynchronizer. The updateLogSynchronizer is already set to run
> every 10 seconds, but it didn’t seem to help.
>
> Any help?
>
> Thanks
> Sean
>
> On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:
>
> I have monitored the CDCR process for a while; the updates are
> actively sent to the target without a problem. However, the tlog size and
> file count are growing every day, and even when there are 0 updates to
> send, the tlogs stay there:
>
> Following is from the action=queues command, and you can see that after
> about a month or so of running, the total transaction logs have reached
> about 140K files, roughly 103G in size.
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">465</int>
>   </lst>
>   <lst name="queues">
>     <lst name="...">
>       <lst name="...">
>         <long name="queueSize">0</long>
>         <str name="lastTimestamp">2017-07-07T23:19:09.655Z</str>
>       </lst>
>     </lst>
>   </lst>
>   <long name="tlogTotalSize">102740042616</long>
>   <long name="tlogTotalCount">140809</long>
>   <str name="updateLogSynchronizer">stopped</str>
> </response>
>
> Any help on it? Or do I need to configure something else? The CDCR
> configuration is pretty much following the wiki:
>
> On target:
>
>   <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
>     <lst name="buffer">
>       <str name="defaultState">disabled</str>
>     </lst>
>   </requestHandler>
>
>   <updateRequestProcessorChain name="cdcr-processor-chain">
>     <processor class="solr.CdcrUpdateProcessorFactory"/>
>     <processor class="solr.RunUpdateProcessorFactory"/>
>   </updateRequestProcessorChain>
>
>   <requestHandler name="/update" class="solr.UpdateRequestHandler">
>     <lst name="defaults">
>       <str name="update.chain">cdcr-processor-chain</str>
>     </lst>
>   </requestHandler>
>
>   <updateHandler class="solr.DirectUpdateHandler2">
>     <updateLog class="solr.CdcrUpdateLog">
>       <str name="dir">${solr.ulog.dir:}</str>
>     </updateLog>
>     <autoCommit>
>       <maxTime>${solr.autoCommit.maxTime:180000}</maxTime>
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>     <autoSoftCommit>
>       <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
>     </autoSoftCommit>
>   </updateHandler>
>
> On source:
>
>   <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
>     <lst name="replica">
>       <str name="zkHost">${TargetZk}</str>
>       <str name="source">MY_COLLECTION</str>
>       <str name="target">MY_COLLECTION</str>
>     </lst>
>
>     <lst name="replicator">
>       <str name="threadPoolSize">1</str>
>       <str name="schedule">1000</str>
>       <str name="batchSize">128</str>
>     </lst>
>
>     <lst name="updateLogSynchronizer">
>       <str name="schedule">60000</str>
>     </lst>
>   </requestHandler>
>
>   <updateHandler class="solr.DirectUpdateHandler2">
>     <updateLog class="solr.CdcrUpdateLog">
>       <str name="dir">${solr.ulog.dir:}</str>
>     </updateLog>
>     <autoCommit>
>       <maxTime>${solr.autoCommit.maxTime:180000}</maxTime>
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>     <autoSoftCommit>
>       <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
>     </autoSoftCommit>
>   </updateHandler>
>
  

RE: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
After several experiments and observations, I finally made it work.
The key point is that you have to also disable the buffer on the source
cluster. I don’t know why the wiki doesn’t mention it, but I figured this out
through the source code.
Once the buffer is disabled on the source cluster, lastProcessedVersion becomes
a positive number, and when there is a hard commit, the old unused tlog files
get deleted.

Hope my finding can help other users who experience the same issue.


On 7/10/17, 9:08 AM, "Michael McCarthy" <michael.mccar...@gm.com> wrote:

We have been experiencing this same issue for months now, with version 6.2.
No solution to date.

-Original Message-----
    From: Xie, Sean [mailto:sean@finra.org]
Sent: Sunday, July 09, 2017 9:41 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log files

Did another round of testing; the tlog on the target cluster is cleaned up once
the hard commit is triggered. However, on the source cluster, the tlog files
stay there and never get cleaned up.

Not sure if there is any command to run manually to trigger the
updateLogSynchronizer. The updateLogSynchronizer is already set to run every 10
seconds, but it didn’t seem to help.

Any help?

Thanks
Sean

On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:

I have monitored the CDCR process for a while; the updates are actively
sent to the target without a problem. However, the tlog size and file count are
growing every day, and even when there are 0 updates to send, the tlogs stay
there:

Following is from the action=queues command, and you can see that after
about a month or so of running, the total transaction logs have reached about
140K files, roughly 103G in size.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">465</int>
  </lst>
  <lst name="queues">
    <lst name="...">
      <lst name="...">
        <long name="queueSize">0</long>
        <str name="lastTimestamp">2017-07-07T23:19:09.655Z</str>
      </lst>
    </lst>
  </lst>
  <long name="tlogTotalSize">102740042616</long>
  <long name="tlogTotalCount">140809</long>
  <str name="updateLogSynchronizer">stopped</str>
</response>

Any help on it? Or do I need to configure something else? The CDCR 
configuration is pretty much following the wiki:

On target:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-processor-chain</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:180000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
    </autoSoftCommit>
  </updateHandler>

On source:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">${TargetZk}</str>
      <str name="source">MY_COLLECTION</str>
      <str name="target">MY_COLLECTION</str>
    </lst>

    <lst name="replicator">
      <str name="threadPoolSize">1</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>

    <lst name="updateLogSynchronizer">
      <str name="schedule">60000</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:180000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
    </autoSoftCommit>
  </updateHandler>

Thanks.
Sean

On 7/8/17, 12:10 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

This should not be the case if you are actively sending updates to the
target cluster. The tlog is used to store unsent updates, so if the
connection is broken for some time, the target cluster will have a
chance to catch up.

If you don't have the remote DC online and do not intend to bring it
online soon, you should turn CDCR off.

Best,
Erick
    
    On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean <sean@finra.org> wrote:
> Once CDCR is enabled, the update log stores an unlimited number of
entries. This causes the tlog folder to get bigger and bigger, and the number
of open files to grow. How can one reduce the number of open files and also
reduce the tlog files? If it’s not taken care of properly, sooner or later the
log file size and open file count will exceed the limits.
>
> Thanks
> Sean
>
>

RE: CDCR - how to deal with the transaction log files

2017-07-10 Thread Xie, Sean
Did some source code reading, and it looks like when lastProcessedVersion == -1
it will do nothing:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/CdcrUpdateLogSynchronizer.java

// if we received -1, it means that the log reader on the leader has 
not yet started to read log entries
// do nothing
if (lastVersion == -1) {
  return;
}

So I queried Solr to find out, and here are the results:

/cdcr?action=LASTPROCESSEDVERSION

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <long name="lastProcessedVersion">-1</long>
</response>

Any idea what could cause this to happen?


Sean


On 7/10/17, 9:08 AM, "Michael McCarthy" <michael.mccar...@gm.com> wrote:

We have been experiencing this same issue for months now, with version 6.2.
No solution to date.

-Original Message-----
    From: Xie, Sean [mailto:sean@finra.org]
Sent: Sunday, July 09, 2017 9:41 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: CDCR - how to deal with the transaction log files

Did another round of testing; the tlog on the target cluster is cleaned up once
the hard commit is triggered. However, on the source cluster, the tlog files
stay there and never get cleaned up.

Not sure if there is any command to run manually to trigger the
updateLogSynchronizer. The updateLogSynchronizer is already set to run every 10
seconds, but it didn’t seem to help.

Any help?

Thanks
Sean

On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:

I have monitored the CDCR process for a while; the updates are actively
sent to the target without a problem. However, the tlog size and file count are
growing every day, and even when there are 0 updates to send, the tlogs stay
there:

Following is from the action=queues command, and you can see that after
about a month or so of running, the total transaction logs have reached about
140K files, roughly 103G in size.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">465</int>
  </lst>
  <lst name="queues">
    <lst name="...">
      <lst name="...">
        <long name="queueSize">0</long>
        <str name="lastTimestamp">2017-07-07T23:19:09.655Z</str>
      </lst>
    </lst>
  </lst>
  <long name="tlogTotalSize">102740042616</long>
  <long name="tlogTotalCount">140809</long>
  <str name="updateLogSynchronizer">stopped</str>
</response>

Any help on it? Or do I need to configure something else? The CDCR 
configuration is pretty much following the wiki:

On target:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-processor-chain</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:180000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
    </autoSoftCommit>
  </updateHandler>

On source:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">${TargetZk}</str>
      <str name="source">MY_COLLECTION</str>
      <str name="target">MY_COLLECTION</str>
    </lst>

    <lst name="replicator">
      <str name="threadPoolSize">1</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>

    <lst name="updateLogSynchronizer">
      <str name="schedule">60000</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:180000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
    </autoSoftCommit>
  </updateHandler>

Thanks.
Sean

On 7/8/17, 12:10 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

This should not be the case if you are actively sending updates to the
target cluster. The tlog is used to store unsent updates, so if the
connection is broken for some time, the target cluster will have a
chance to catch up.

If you don't have the remote DC online and do not intend to bring it
online soon, you should turn CDCR off.

Best,
Erick
    
    On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean <sean@finra.org> wrote:
> Once CDCR is enabled, the update log stores an unlimited number of
entries. This causes the tlog folder to get bigger and bigger, and the number
of open files to grow. How can one reduce the number of open files and also
reduce the tlog files? If it’s not taken care of properly, sooner or later the
log file size and open file count will exceed the limits.
>
> Thanks
> Sean
>
>

Re: CDCR - how to deal with the transaction log files

2017-07-09 Thread Xie, Sean
Did another round of testing; the tlog on the target cluster is cleaned up once
the hard commit is triggered. However, on the source cluster, the tlog files
stay there and never get cleaned up.

Not sure if there is any command to run manually to trigger the
updateLogSynchronizer. The updateLogSynchronizer is already set to run every 10
seconds, but it didn’t seem to help.

Any help?

Thanks
Sean

On 7/8/17, 1:14 PM, "Xie, Sean" <sean@finra.org> wrote:

I have monitored the CDCR process for a while; the updates are actively
sent to the target without a problem. However, the tlog size and file count are
growing every day, and even when there are 0 updates to send, the tlogs stay
there:

Following is from the action=queues command, and you can see that after about a
month or so of running, the total transaction logs have reached about 140K
files, roughly 103G in size.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">465</int>
  </lst>
  <lst name="queues">
    <lst name="...">
      <lst name="...">
        <long name="queueSize">0</long>
        <str name="lastTimestamp">2017-07-07T23:19:09.655Z</str>
      </lst>
    </lst>
  </lst>
  <long name="tlogTotalSize">102740042616</long>
  <long name="tlogTotalCount">140809</long>
  <str name="updateLogSynchronizer">stopped</str>
</response>

Any help on it? Or do I need to configure something else? The CDCR 
configuration is pretty much following the wiki:

On target:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-processor-chain</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:180000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
    </autoSoftCommit>
  </updateHandler>

On source:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">${TargetZk}</str>
      <str name="source">MY_COLLECTION</str>
      <str name="target">MY_COLLECTION</str>
    </lst>

    <lst name="replicator">
      <str name="threadPoolSize">1</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>

    <lst name="updateLogSynchronizer">
      <str name="schedule">60000</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:180000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
    </autoSoftCommit>
  </updateHandler>

Thanks.
Sean

On 7/8/17, 12:10 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

This should not be the case if you are actively sending updates to the
target cluster. The tlog is used to store unsent updates, so if the
connection is broken for some time, the target cluster will have a
chance to catch up.

If you don't have the remote DC online and do not intend to bring it
online soon, you should turn CDCR off.

Best,
Erick

On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean <sean@finra.org> wrote:
> Once CDCR is enabled, the update log stores an unlimited number of entries.
This causes the tlog folder to get bigger and bigger, and the number of open
files to grow. How can one reduce the number of open files and also reduce the
tlog files? If it’s not taken care of properly, sooner or later the log file
size and open file count will exceed the limits.
>
> Thanks
> Sean
>
>






Re: ZooKeeper transaction logs

2017-07-09 Thread Xie, Sean
You can try running the purge manually to see if it works:
org.apache.zookeeper.server.PurgeTxnLog.

And use a cron job to do the cleanup.
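
A sketch of the programmatic equivalent, e.g. for wrapping in your own
scheduled task (the directory paths and retention count are placeholder
assumptions; PurgeTxnLog requires keeping at least 3 snapshots):

    import java.io.File;

    import org.apache.zookeeper.server.PurgeTxnLog;

    public class ZkPurge {
        public static void main(String[] args) throws Exception {
            File dataLogDir = new File("/var/zookeeper/data"); // holds log.* files
            File snapDir = new File("/var/zookeeper/data");    // holds snapshot.* files
            PurgeTxnLog.purge(dataLogDir, snapDir, 3);         // keep the 3 newest snapshots
        }
    }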


On 7/9/17, 11:07 AM, "Avi Steiner"  wrote:

Hello

I'm using Zookeeper 3.4.6

The ZK data folder keeps growing with transaction log files (log.*).

I set the following in zoo.cfg:
autopurge.purgeInterval=1
autopurge.snapRetainCount=3
dataDir=..\\data

Per ZK log, it reads those parameters:

2017-07-09 17:44:59,792 [myid:] - INFO  [main:DatadirCleanupManager@78] - 
autopurge.snapRetainCount set to 3
2017-07-09 17:44:59,792 [myid:] - INFO  [main:DatadirCleanupManager@79] - 
autopurge.purgeInterval set to 1

It also says that cleanup process is running:

2017-07-09 17:44:59,792 [myid:] - INFO  
[PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2017-07-09 17:44:59,823 [myid:] - INFO  
[PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.

But actually nothing is deleted.
Every service restart, new file is created.

The only parameter I managed to change is preAllocSize, which sets the
preallocated size per file. The default is 64MB. I changed it to 10KB only to
watch the effect.








Re: CDCR - how to deal with the transaction log files

2017-07-08 Thread Xie, Sean
I have monitored the CDCR process for a while; the updates are actively sent to
the target without a problem. However, the tlog size and file count are growing
every day, and even when there are 0 updates to send, the tlogs stay there:

Following is from the action=queues command, and you can see that after about a
month or so of running, the total transaction logs have reached about 140K
files, roughly 103G in size.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">465</int>
  </lst>
  <lst name="queues">
    <lst name="...">
      <lst name="...">
        <long name="queueSize">0</long>
        <str name="lastTimestamp">2017-07-07T23:19:09.655Z</str>
      </lst>
    </lst>
  </lst>
  <long name="tlogTotalSize">102740042616</long>
  <long name="tlogTotalCount">140809</long>
  <str name="updateLogSynchronizer">stopped</str>
</response>

Any help on it? Or do I need to configure something else? The CDCR 
configuration is pretty much following the wiki:

On target:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-processor-chain</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:180000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
    </autoSoftCommit>
  </updateHandler>

On source:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">${TargetZk}</str>
      <str name="source">MY_COLLECTION</str>
      <str name="target">MY_COLLECTION</str>
    </lst>

    <lst name="replicator">
      <str name="threadPoolSize">1</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>

    <lst name="updateLogSynchronizer">
      <str name="schedule">60000</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:180000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
    </autoSoftCommit>
  </updateHandler>

Thanks.
Sean

On 7/8/17, 12:10 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

This should not be the case if you are actively sending updates to the
target cluster. The tlog is used to store unsent updates, so if the
connection is broken for some time, the target cluster will have a
chance to catch up.

If you don't have the remote DC online and do not intend to bring it
online soon, you should turn CDCR off.

Best,
Erick

On Fri, Jul 7, 2017 at 9:35 PM, Xie, Sean <sean@finra.org> wrote:
> Once CDCR is enabled, the update log stores an unlimited number of entries.
This causes the tlog folder to get bigger and bigger, and the number of open
files to grow. How can one reduce the number of open files and also reduce the
tlog files? If it’s not taken care of properly, sooner or later the log file
size and open file count will exceed the limits.
>
> Thanks
> Sean
>
>




CDCR - how to deal with the transaction log files

2017-07-07 Thread Xie, Sean
Once CDCR is enabled, the update log stores an unlimited number of entries.
This causes the tlog folder to get bigger and bigger, and the number of open
files to grow. How can one reduce the number of open files and also reduce the
tlog files? If it’s not taken care of properly, sooner or later the log file
size and open file count will exceed the limits.

Thanks
Sean




Re: Live update the zookeeper for SOLR

2017-06-29 Thread Xie, Sean
I believe I found the issue: it was caused by a script that, when starting the
ZK instance, also cleared all the Solr configuration data from ZK, causing Solr
to stop working.

However, a new issue has come up:

When using static IPs for the ZooKeeper ensemble, it works perfectly and Solr
can reconnect to another live ZooKeeper. But when using host names (DNS names)
for the ensemble, it seems Solr resolves the host name to a static IP and
caches it forever. So when doing rolling updates of ZooKeeper, Solr eventually
died because it couldn’t connect to the previous IP, even though the same name
was pointing to the new IP.

Any suggestion?

Thanks
Sean

On 6/16/17, 3:34 PM, "Xie, Sean" <sean@finra.org> wrote:

Solr is configured with the zookeeper ensemble as mentioned below.

I will provide logs in a later time.

From: Shawn Heisey <apa...@elyograg.org>
Date: Friday, Jun 16, 2017, 12:27 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: Live update the zookeeper for SOLR

On 6/16/2017 9:05 AM, Xie, Sean wrote:
> Is there a way to keep Solr alive when the zookeeper instances (3-instance
> ensemble) are rolling-updated one at a time? It seems the Solr cluster uses
> one of the zookeeper instances, and when communication with it is broken,
> Solr won’t reconnect to another zookeeper instance and keep itself alive. A
> service restart is needed in this situation. Any way to keep the service
> alive all the time?

Have you informed Solr about all three of the ZK hosts?  You need a
zkHost like this, with an optional chroot:


zkHost="server1.example.com:2181,server2.example.com:2181,server3.example.com:2181/chroot"

There are more zkHost examples and a better description of the string
format in this javadoc:


https://lucene.apache.org/solr/6_6_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html#CloudSolrClient-java.lang.String-

If Solr hasn't been explicitly informed about all the hosts in the
ensemble, then it cannot connect to surviving hosts.

I've never heard of the problem you described happening as long as
Zookeeper quorum is maintained and Solr is properly configured.

If you can show that a correctly configured Solr 6.6 server loses
connection to ZK when one of the ZK servers is taken down, that's a bug,
and we need an issue in Jira with documentation of the problem.

Thanks,
Shawn





Re: Live update the zookeeper for SOLR

2017-06-16 Thread Xie, Sean
Solr is configured with the zookeeper ensemble as mentioned below.

I will provide logs in a later time.

From: Shawn Heisey <apa...@elyograg.org>
Date: Friday, Jun 16, 2017, 12:27 PM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: Live update the zookeeper for SOLR

On 6/16/2017 9:05 AM, Xie, Sean wrote:
> Is there a way to keep Solr alive when the zookeeper instances (3-instance
> ensemble) are rolling-updated one at a time? It seems the Solr cluster uses
> one of the zookeeper instances, and when communication with it is broken,
> Solr won’t reconnect to another zookeeper instance and keep itself alive. A
> service restart is needed in this situation. Any way to keep the service
> alive all the time?

Have you informed Solr about all three of the ZK hosts?  You need a
zkHost like this, with an optional chroot:

zkHost="server1.example.com:2181,server2.example.com:2181,server3.example.com:2181/chroot"

There are more zkHost examples and a better description of the string
format in this javadoc:

https://lucene.apache.org/solr/6_6_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html#CloudSolrClient-java.lang.String-

If Solr hasn't been explicitly informed about all the hosts in the
ensemble, then it cannot connect to surviving hosts.
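
A minimal SolrJ sketch of the same idea (host names, chroot, and collection
name are placeholders):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;

    public class ZkHostExample {
        public static void main(String[] args) throws Exception {
            // The full ensemble, comma-separated, with an optional chroot suffix.
            String zkHost = "server1.example.com:2181,server2.example.com:2181,"
                    + "server3.example.com:2181/chroot";
            try (CloudSolrClient client =
                    new CloudSolrClient.Builder().withZkHost(zkHost).build()) {
                client.setDefaultCollection("MY_COLLECTION");
                System.out.println(client.ping().getStatus());
            }
        }
    }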

I've never heard of the problem you described happening as long as
Zookeeper quorum is maintained and Solr is properly configured.

If you can show that a correctly configured Solr 6.6 server loses
connection to ZK when one of the ZK servers is taken down, that's a bug,
and we need an issue in Jira with documentation of the problem.

Thanks,
Shawn



Live update the zookeeper for SOLR

2017-06-16 Thread Xie, Sean
Is there a way to keep Solr alive when the ZooKeeper instances (a 3-instance
ensemble) are rolling-updated one at a time? It seems the Solr cluster uses one
of the ZooKeeper instances, and when communication with it is broken, Solr
won’t reconnect to another ZooKeeper instance and keep itself alive. A service
restart is needed in this situation. Any way to keep the service alive all the
time?

Thanks
Sean



Re: How to do CDCR with basic auth?

2017-05-14 Thread Xie, Sean
Configured the JVM:
-Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthConfigurer
-Dbasicauth=solr:SolrRocks

Configured the CDCR.

Started the source cluster and got the following log:

.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
java.lang.IllegalArgumentException: Credentials may not be null
at org.apache.http.util.Args.notNull(Args.java:54)
at org.apache.http.auth.AuthState.update(AuthState.java:113)
at 
org.apache.solr.client.solrj.impl.PreemptiveAuth.process(PreemptiveAuth.java:56)
at 
org.apache.http.protocol.ImmutableHttpProcessor.process(ImmutableHttpProcessor.java:132)
at 
org.apache.http.protocol.HttpRequestExecutor.preProcess(HttpRequestExecutor.java:166)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:485)
at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:515)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at 
org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:146)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)


Somehow, CDCR didn’t pick up the credentials when using preemptive auth.

Is it a bug?
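
For comparison, SolrJ requests you build yourself can carry credentials
explicitly; a hedged sketch using the stock solr:SolrRocks example credentials
(the base URL is a placeholder) — this does not fix CDCR's internal client, it
only illustrates the per-request API:

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class BasicAuthQuery {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/COL1").build()) {
                ModifiableSolrParams params = new ModifiableSolrParams();
                params.set("q", "*:*");
                QueryRequest req = new QueryRequest(params);
                req.setBasicAuthCredentials("solr", "SolrRocks"); // per-request credentials
                System.out.println(req.process(client).getResults().getNumFound());
            }
        }
    }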

Thanks
Sean



On 5/14/17, 3:09 PM, "Xie, Sean" <sean@finra.org> wrote:

So I have configured two clusters (source and target) with basic auth
(solr:SolrRocks), but when starting the source node, the log shows it couldn’t
read the authentication info.

I already added –Dbasicauth=solr:SolrRocks to the JVM of the Solr instance.
Not sure where else I can configure Solr to use the auth.

When starting the CDCR, the log is:

2017-05-14 15:01:02.915 WARN  (qtp1348949648-21) [c:COL1 s:shard1 
r:core_node2 x:COL1_shard1_replica2] o.a.s.h.CdcrReplicatorManager Unable to 
instantiate the log reader for target collection COL1
org.apache.solr.client.solrj.SolrServerException: 
java.lang.IllegalArgumentException: Credentials may not be null
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:473)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1376)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1127)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1057)
at 
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at 
org.apache.solr.handler.CdcrReplicatorManager.getCheckpoint(CdcrReplicatorManager.java:196)
at 
org.apache.solr.handler.CdcrReplicatorManager.initLogReaders(CdcrReplicatorManager.java:159)
at 
org.apache.solr.handler.CdcrReplicatorManager.stateUpdate(CdcrReplicatorManager.java:134)
at 
org.apache.solr.handler.CdcrStateManager.callback(CdcrStateManager.java:36)
at 
org.apache.solr.handler.CdcrProcessStateManager.setState(CdcrProcessStateManager.java:93)
at 
org.apache.solr.handler.CdcrRequestHandler.handleStartAction(CdcrRequestHandler.java:352)
at 
org.apache.solr.handler.CdcrRequestHandler.handleRequestBody(CdcrRequestHandler.java:178)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at 
org.apache.solr.servlet.H

How to do CDCR with basic auth?

2017-05-14 Thread Xie, Sean
So I have configured two clusters (source and target) with basic auth
(solr:SolrRocks), but when starting the source node, the log shows it couldn’t
read the authentication info.

I already added –Dbasicauth=solr:SolrRocks to the JVM of the Solr instance.
Not sure where else I can configure Solr to use the auth.

When starting the CDCR, the log is:

2017-05-14 15:01:02.915 WARN  (qtp1348949648-21) [c:COL1 s:shard1 r:core_node2 
x:COL1_shard1_replica2] o.a.s.h.CdcrReplicatorManager Unable to instantiate the 
log reader for target collection COL1
org.apache.solr.client.solrj.SolrServerException: 
java.lang.IllegalArgumentException: Credentials may not be null
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:473)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1376)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1127)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1057)
at 
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at 
org.apache.solr.handler.CdcrReplicatorManager.getCheckpoint(CdcrReplicatorManager.java:196)
at 
org.apache.solr.handler.CdcrReplicatorManager.initLogReaders(CdcrReplicatorManager.java:159)
at 
org.apache.solr.handler.CdcrReplicatorManager.stateUpdate(CdcrReplicatorManager.java:134)
at 
org.apache.solr.handler.CdcrStateManager.callback(CdcrStateManager.java:36)
at 
org.apache.solr.handler.CdcrProcessStateManager.setState(CdcrProcessStateManager.java:93)
at 
org.apache.solr.handler.CdcrRequestHandler.handleStartAction(CdcrRequestHandler.java:352)
at 
org.apache.solr.handler.CdcrRequestHandler.handleRequestBody(CdcrRequestHandler.java:178)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at 
org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:202)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at ...
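
A note for anyone hitting the same trace: -Dbasicauth on its own only takes
effect if the JVM is also told to build its HttpClients through a factory
that reads it. A sketch of the solr.in.sh settings that typically make
SolrJ-based internode clients (including the CDCR replicator) send Basic
auth preemptively -- assuming a release that ships
PreemptiveBasicAuthClientBuilderFactory, and the example solr:SolrRocks
credentials:

SOLR_OPTS="$SOLR_OPTS -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory"
SOLR_OPTS="$SOLR_OPTS -Dbasicauth=solr:SolrRocks"

Presumably both clusters need the same credentials in security.json, since
the source makes checkpoint calls against the target.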

Re: CDCR with SSL enabled

2017-05-02 Thread Xie, Sean
From the QUEUE action, the output is:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="queues"/>
  <long name="tlogTotalSize">34741356</long>
  <long name="tlogTotalCount">2</long>
  <str name="updateLogSynchronizer">stopped</str>
</response>
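
The empty queues list and the stopped updateLogSynchronizer above suggest
CDCR was never actually started on the source; it has to be started
explicitly per collection. A sketch of the start and status calls, with
source_host standing in for a real source node:

curl -k 'https://source_host:8983/solr/SourceCollection/cdcr?action=START'
curl -k 'https://source_host:8983/solr/SourceCollection/cdcr?action=STATUS'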




On 5/2/17, 1:43 AM, "Xie, Sean" <sean@finra.org> wrote:

Does CDCR support an SSL-encrypted SolrCloud?

I have two clusters started with SSL, and the CDCR setup instructions were 
followed on source and target. However, from solr.log, I’m not able to see 
that CDCR is occurring. Not sure what has been set up incorrectly.

In solr.log, I can’t find useful info related to CDCR during indexing. 
Any help on how to probe the issue is appreciated.

The Target config:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-processor-chain</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>

The Source config:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">zk_ip:2181</str>
      <str name="source">SourceCollection</str>
      <str name="target">TargetCollection</str>
    </lst>
    <lst name="replicator">
      <str name="threadPoolSize">8</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>
    <lst name="updateLogSynchronizer">
      <str name="schedule">1000</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>






CDCR with SSL enabled

2017-05-01 Thread Xie, Sean
Does CDCR support an SSL-encrypted SolrCloud?

I have two clusters started with SSL, and the CDCR setup instructions were 
followed on source and target. However, from solr.log, I’m not able to see 
that CDCR is occurring. Not sure what has been set up incorrectly.

In solr.log, I can’t find useful info related to CDCR during indexing. 
Any help on how to probe the issue is appreciated.

The Target config:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-processor-chain</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>

The Source config:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">zk_ip:2181</str>
      <str name="source">SourceCollection</str>
      <str name="target">TargetCollection</str>
    </lst>
    <lst name="replicator">
      <str name="threadPoolSize">8</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>
    <lst name="updateLogSynchronizer">
      <str name="schedule">1000</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>

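One SSL-specific thing worth checking: the CDCR replicator talks to the
target over SolrJ, so the source JVM has to trust the target's certificates,
and both clusters need urlScheme=https set as a cluster property (see the
CLUSTERPROP thread further down). A sketch of the solr.in.sh truststore
settings on the source nodes -- path and password are placeholders:

SOLR_SSL_TRUST_STORE=/path/to/solr-ssl.truststore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=secret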



clusterstate.json not updated in zookeeper after creating the collection using API

2017-04-06 Thread Xie, Sean
Hi

I created a collection in a SolrCloud cluster (6.3.0), but found that 
clusterstate.json is not updated in ZooKeeper. It’s empty.

I’m able to query cluster state using the API: 
/admin/collections?action=CLUSTERSTATUS&wt=json

Any reason why clusterstate.json is not updated?

Thanks
Sean
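
For the record, this is expected with collections created at state format 2
(the default in recent releases): per-collection state lives at
/collections/<name>/state.json in ZooKeeper, while the global
clusterstate.json stays empty. A sketch of inspecting it with the bin/solr
zk commands (available since 6.2); collection name and ZK address are
placeholders:

bin/solr zk ls /collections/MY_COLLECTION -z zk_host:2181
bin/solr zk cp zk:/collections/MY_COLLECTION/state.json file:/tmp/state.json -z zk_host:2181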



solr lost connection to zookeeper

2017-03-23 Thread Xie, Sean
When Solr loses the connection to ZooKeeper, is there any way to have Solr 
reconnect after ZooKeeper is back online?  Must Solr be restarted to 
re-initiate the connection?

Thanks
Sean
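
For the record: no restart should be needed. Solr's ZooKeeper client
reconnects on its own once ZooKeeper is reachable again; the usual knob is
the ZK client timeout, which bounds how long an outage can last before the
session expires. A sketch of the stock solr.xml setting, with the default
30-second value:

  <solrcloud>
    <!-- how long Solr tolerates losing ZooKeeper before the session expires -->
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>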



Re: creating collection using collection API with SSL enabled SolrCloud

2017-02-09 Thread Xie, Sean
Thank you Hrishikesh,

The cluster property solved the issue.

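For reference, a sketch of the CLUSTERPROP call that sets it (the host is a
placeholder):

curl -k 'https://IP_1:8983/solr/admin/collections?action=CLUSTERPROP&name=urlScheme&val=https'
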
Now we need to figure out a way to give the instance a host name, to fix the 
SSL error about the IP not matching the certificate's name.

Sean



On 2/9/17, 11:35 AM, "Hrishikesh Gadre" <gadre.s...@gmail.com> wrote:

Hi Sean,

Have you configured the "urlScheme" cluster property (i.e. urlScheme=https)?

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CLUSTERPROP:ClusterProperties

Thanks
Hrishikesh



On Thu, Feb 9, 2017 at 8:23 AM, Xie, Sean <sean@finra.org> wrote:

> Hi All,
>
> When trying to create a collection using the API when there are a few
> replicas, I’m getting an error because the call seems to be trying to use
> HTTP for the replicas.
>
> https://IP_1:8983/solr/admin/collections?action=CREATE&
> name=My_COLLECTION&numShards=1&replicationFactor=1&
> collection.configName=my_collection_conf
>
> Here is the error:
>
> org.apache.solr.client.solrj.SolrServerException:IOException occured when
> talking to server at: http://IP_2:8983/solr
>
>
> Is there something that needs to be configured for that?
>
> Thanks
> Sean
>




creating collection using collection API with SSL enabled SolrCloud

2017-02-09 Thread Xie, Sean
Hi All,

When trying to create a collection using the API when there are a few 
replicas, I’m getting an error because the call seems to be trying to use 
HTTP for the replicas.

https://IP_1:8983/solr/admin/collections?action=CREATE&name=My_COLLECTION&numShards=1&replicationFactor=1&collection.configName=my_collection_conf

Here is the error:

org.apache.solr.client.solrj.SolrServerException:IOException occured when 
talking to server at: http://IP_2:8983/solr


Is there something that needs to be configured for that?

Thanks
Sean



Re: CharacterUtils is removed from lucene-analyzers-common >6.1

2016-12-15 Thread Xie, Sean
Thanks for pointing out java.lang.Character. I did find 
org.apache.lucene.analysis.CharacterUtils, but I was not able to find the 
needed methods in it.

Sean


On 12/15/16, 8:58 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:

On 12/15/2016 6:20 PM, Xie, Sean wrote:
> We have implemented some customized filters/tokenizers that use
> org.apache.lucene.analysis.util.CharacterUtils. After upgrading to
> Solr 6.3, the class is no longer available. Is there any reason the
> utility class was removed?

This is not really a good question for this list.  It would be more at
home on the java-user mailing list for Lucene.

With a little bit of research, I was able to determine that this class
moved.  It is now:

org.apache.lucene.analysis.CharacterUtils

Some of the functionality that used to be in the old CharacterUtils is
available in java.lang.Character -- part of Java itself.

Thanks,
Shawn
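
A sketch of what the migration can look like, assuming the custom filter
used the old getInstance()/codePointAt style; the buffer contents and class
name below are made up for illustration:

import org.apache.lucene.analysis.CharacterUtils;  // moved from o.a.l.analysis.util

public class CharacterUtilsMigration {
  public static void main(String[] args) {
    char[] buffer = "Some TEXT".toCharArray();
    // the moved class is static-only; there is no CharacterUtils.getInstance()
    CharacterUtils.toLowerCase(buffer, 0, buffer.length);
    // methods dropped from the moved class are covered by java.lang.Character
    int cp = Character.codePointAt(buffer, 0, buffer.length);
    System.out.println(new String(buffer) + " / first code point: " + cp);
  }
}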






CharacterUtils is removed from lucene-analyzers-common >6.1

2016-12-15 Thread Xie, Sean
Dear user group,

We have implemented some customized filters/tokenizers that use 
org.apache.lucene.analysis.util.CharacterUtils. After upgrading to Solr 6.3, 
the class is no longer available. Is there any reason the utility class was 
removed?

What I had to do as a workaround was copy the class implementation into our 
class lib. Is there any other way to deal with it?

Thanks
Sean
