Re: Node Recovery Questions

2018-09-14 Thread sean mcevoy
Hi Martin, List,

Just an update to let ye know how things went and what we learned.

We did the force-replace procedure to bring the new node into the cluster
in place of the old one. I attached to the riak erlang shell and with a
little hacking was able to get all the bitcask handles and then do a
bitcask:fold/3 to count keys. This showed that only a small percentage of
all keys were present on the new node, even after the handoffs and
transfers had completed.
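
For the record, a minimal sketch of the counting approach. My actual hack
fished the live handles out of the vnode state; this version just opens each
partition directory read-only, which should count the same keys. The data
root path is an assumption (use your platform_data_dir), and as noted below,
don't run this against a busy node:

%% Count keys across every bitcask partition under the data root.
count_bitcask_keys() ->
    Root = "/var/lib/riak/bitcask",  %% illustrative path
    {ok, Dirs} = file:list_dir(Root),
    lists:sum([count_keys(filename:join(Root, D)) || D <- Dirs]).

%% Open one partition read-only and fold over it, counting entries.
count_keys(Dir) ->
    Ref = bitcask:open(Dir, [read_only]),
    N = bitcask:fold(Ref, fun(_K, _V, Acc) -> Acc + 1 end, 0),
    bitcask:close(Ref),
    N.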

Following the instructions at the bottom of this page:
https://docs.basho.com/riak/kv/2.2.0/using/repair-recovery/repairs/
I attached to the erlang shell again and ran these commands (replacing the
IP with our actual IP) to force repairs on all vnodes:

%% Fetch the ring, pick out the partitions owned by this node,
%% then kick off a repair of each vnode.
{ok, Ring} = riak_core_ring_manager:get_my_ring().
Partitions = [P || {P, 'dev1@127.0.0.1'} <- riak_core_ring:all_owners(Ring)].
[riak_kv_vnode:repair(P) || P <- Partitions].

The progress was most easily monitored with "riak-admin handoff summary",
and once the repairs completed the new node had the expected number of keys.

Counting the keys is more than a bit hacky and occasionally caused a seg
fault if there was background traffic, so I don't recommend it in general.
But it did allow us to verify where the data was in our test env, and then
we could trust the procedure without counting keys in production.
Monitoring the size of the bitcask directory is a lot lower resolution, but
it is at least safe; the results were similar in test & production, so it
was sufficient to verify the above procedure.

So in short, when replacing a node the force-replace procedure doesn't
actually cause data to be synched to the new node. The above erlang shell
commands do force a sync.

Thanks for the support!
//Sean.

On Thu, Aug 9, 2018 at 11:25 PM sean mcevoy  wrote:

> Hi Martin,
> Thanks for taking the time.
> Yes, by "size of the bitcask directory" I mean I did a "du -h
> --max-depth=1 bitcask", so I think that would cover all the vnodes. We
> don't use any other backends.
> Those answers are helpful, will get back to this in a few days and see
> what I can determine about where our data physically lies. Might have more
> questions then.
> Cheers,
> //Sean.
>
> On Wed, Aug 8, 2018 at 6:05 PM, Martin Sumner wrote:
>
>> Based on a quick read of the code, compaction in bitcask is performed
>> only on "readable" files, and the current active file for writing is
>> excluded from that list.  With default settings, that active file can grow
>> to 2GB.  So it is possible that if objects had been replaced/deleted many
>> times within the active file, that space will not be recovered if all the
>> replacements amount to < 2GB per vnode.  So at these small data sizes - you
>> may get a relatively significant discrepancy between an old and recovered
>> node in terms of disk space usage.
>>
>> On 8 August 2018 at 17:37, Martin Sumner 
>> wrote:
>>
>>> Sean,
>>>
>>> Some partial answers to your questions.
>>>
>>> I don't believe force-replace itself will sync anything up - it just
>>> reassigns ownership (hence handoff happens very quickly).
>>>
>> Read repair would synchronise a portion of the data.  So if 10% of your
>>> data is read regularly, this might explain some of what you see.
>>>
>>> AAE should also repair your data.  But if nothing has happened for 4
>>> days, then that doesn't seem to be the case.  It would be worth checking
>>> the aae-status page (
>>> http://docs.basho.com/riak/kv/2.2.3/using/admin/riak-admin/#aae-status)
>>> to confirm things are happening.
>>>
>>> I don't know if there are any minimum levels of data before bitcask will
>>> perform compaction.  There's nothing obvious in the code that wouldn't be
>>> triggered way before 90%.  I don't know if it will merge on the active file
>>> (the one currently being written to), but that is 2GB max size (configured
>>> through bitcask.max_file_size).
>>>
>>> When you say the size of the bitcask directory - is this the size shared
>>> across all vnodes on the node?  I guess if each vnode has a single file
>>> <2GB, and there are multiple vnodes - something unexpected might happen
>>> here?  If bitcask does indeed not merge the file active for writing.
>>>
>>> In terms of distribution around the cluster, if you have an n_val of 3
>>> you should normally expect to see a relatively even distribution of the
>>> data on failure (certainly not it all going to one).  Worst case scenario
>>> is that 3 nodes get all the load from that one failed node.
>>>
>>> When a vnode is inaccessible, 3 (assuming n=3) fallback vnodes are

Re: Node Recovery Questions

2018-08-09 Thread sean mcevoy
Hi Martin,
Thanks for taking the time.
Yes, by "size of the bitcask directory" I mean I did a "du -h --max-depth=1
bitcask", so I think that would cover all the vnodes. We don't use any
other backends.
Those answers are helpful, will get back to this in a few days and see what
I can determine about where our data physically lies. Might have more
questions then.
Cheers,
//Sean.

On Wed, Aug 8, 2018 at 6:05 PM, Martin Sumner 
wrote:

> Based on a quick read of the code, compaction in bitcask is performed only
> on "readable" files, and the current active file for writing is excluded
> from that list.  With default settings, that active file can grow to 2GB.
> So it is possible that if objects had been replaced/deleted many times
> within the active file, that space will not be recovered if all the
> replacements amount to < 2GB per vnode.  So at these small data sizes - you
> may get a relatively significant discrepancy between an old and recovered
> node in terms of disk space usage.
>
> On 8 August 2018 at 17:37, Martin Sumner 
> wrote:
>
>> Sean,
>>
>> Some partial answers to your questions.
>>
>> I don't believe force-replace itself will sync anything up - it just
>> reassigns ownership (hence handoff happens very quickly).
>>
>> Read repair would synchronise a portion of the data.  So if 10% of your
>> data is read regularly, this might explain some of what you see.
>>
>> AAE should also repair your data.  But if nothing has happened for 4
>> days, then that doesn't seem to be the case.  It would be worth checking
>> the aae-status page
>> (http://docs.basho.com/riak/kv/2.2.3/using/admin/riak-admin/#aae-status)
>> to confirm things are happening.
>>
>> I don't know if there are any minimum levels of data before bitcask will
>> perform compaction.  There's nothing obvious in the code that wouldn't be
>> triggered way before 90%.  I don't know if it will merge on the active file
>> (the one currently being written to), but that is 2GB max size (configured
>> through bitcask.max_file_size).
>>
>> When you say the size of the bitcask directory - is this the size shared
>> across all vnodes on the node?  I guess if each vnode has a single file
>> <2GB, and there are multiple vnodes - something unexpected might happen
>> here?  If bitcask does indeed not merge the file active for writing.
>>
>> In terms of distribution around the cluster, if you have an n_val of 3
>> you should normally expect to see a relatively even distribution of the
>> data on failure (certainly not it all going to one).  Worst case scenario
>> is that 3 nodes get all the load from that one failed node.
>>
>> When a vnode is inaccessible, 3 (assuming n=3) fallback vnodes are
>> selected to handle the load for that 1 vnode (as that vnode would normally
>> be in 3 preflists, and commonly a different node will be asked to start a
>> vnode for each preflist).
>>
>>
>> I will try and dig later into bitcask merge/compaction code, to see if I
>> spot anything else.
>>
>> Martin
>>
>>
>>
>


Node Recovery Questions

2018-08-08 Thread sean mcevoy
Hi All,

A few questions on the procedure here to recover a failed node:
http://docs.basho.com/riak/kv/2.2.3/using/repair-recovery/failed-node/

We lost a production riak server when AWS decided to delete a node and we
plan on doing this procedure to replace it with a newly built node. A
practice run in our QA environment has brought up some questions.

- How can I tell when everything has synched up? I thought I could just
monitor the handoffs, but these completed within 5 minutes of committing the
cluster changes, while the data directories continued to grow rapidly in
size for at least an hour. I assume that this was data being synched to the
new node, but how can I tell from the user level when it has completed? Or
is it left up to AAE to sync the data?

- The size of the bitcask directory on the 4 original nodes is ~10GB; on
the new node the size of this directory climbed to 1GB within an hour but
hasn't moved much in the 4 days since. I know bitcask entries still exist
until the periodic compaction, but can it be right that it's hanging on to
90% of the disk space it's using for dead data?

- Not directly related to the recovery procedure, but while one node of a
five-node cluster is down how is the extra load distributed within the
cluster? It will still keep 3 copies of each entry, right? Are the copies
that would have been on the missing node all stored on the next node in the
ring, or distributed all around the cluster?

Thanks in advance,
//Sean.


Re: Solr search response time spikes

2017-07-03 Thread sean mcevoy
Hi List, Fred,

After a week of going cross-eyed looking at stats & trying to engineer a
test case to make this happen in the test env I think I've made a
breakthrough.

We have a low but steady level of riak traffic but our application level
actions that result in solr reads are actually fairly infrequent. And when
one of these actions occurs it results in multiple parallel reads to our
solr indexes.

What I've observed is that our timeouts are most easily reproduced after a
period of inactivity. And once I see a timeout after 2 seconds I kick off
multiple other reads to random keys and observe that some return instantly
while others can take several seconds, but then all return at the same time.

It's almost as if some shards in the java VM have gone to sleep due to
inactivity, and we see a cluster of timeouts when we try to read from them.

I'm setting up a "pinger" script in our prod env to keep these awake and
see if our observed timeout rate reduces.
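
The pinger itself is just something along these lines (a sketch with the
erlang client; the index name and interval are made up):

%% Fire a cheap query at the index every minute to keep the Solr
%% searchers warm; rows=1 keeps the response tiny.
ping_loop(Pid) ->
    _ = riakc_pb_socket:search(Pid, <<"my_index">>, <<"*:*">>, [{rows, 1}]),
    timer:sleep(60000),
    ping_loop(Pid).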

If this is actually our problem are there any JVM config options we can use
to keep the index active all the time?

//Sean.

On Fri, Jun 23, 2017 at 1:48 PM, sean mcevoy <sean.mce...@gmail.com> wrote:

> Hi Fred,
>
> Thanks for taking the time!
> Yes, I noticed that unbalance yesterday when writing, looked into it after
> sending and found our config is corrupt with one node omitted and another
> in there twice.
> But, with such low traffic levels and the spikes being on the non-favoured
> node I'm not currently ranking that as a likely factor.
>
>
> Another interesting case from last night, this sample was taken at
> 2017-6-23 06:04:09
>
> Riak node 1
> "search_query_throughput_one": 27
> "search_query_latency_max": 10417
>
> Riak node 2
> "search_query_throughput_one": 49
> "search_query_latency_max": 8952
>
> Riak node 3
> "search_query_throughput_one": 18
> "search_query_throughput_count": 2507
> "search_query_latency_min": 1757
> "search_query_latency_median": 14775
> "search_query_latency_mean": 5628361
> "search_query_latency_max": 18298854
> "search_query_latency_999": 18298854
> "search_query_latency_99": 18298854
> "search_query_latency_95": 16539782
>
> Riak node 4
> "search_query_throughput_one": 25
> "search_query_latency_max": 10217
>
>
> Brushing up my maths and focussing on node 3, from the 99 & 95% figures we
> can tell the 2 slowest response times were 18,298 & 16,539ms, 34,837 ms in
> total.
> And from the request count for the minute & the mean we can tell that
> these 18 requests spent a total of 101,310 ms being processed.
> From the median & min we know the 9 quickest took between 18 & 265 ms in
> total.
> This leaves in the region of 66 sec for the other 7 requests, enough for
> all 7 to have timed out.
>
>
> Cross referencing with our application logs I can see:
>
> On application node 1 at 2017-06-23 06:03:17 we had 3 search request
> timeouts to index A with 3 different filters, one field of which, lets call
> it field X, had the same value.
> We immediately retried these and at 2017-06-23 06:03:19 2 of those timed
> out again and were retried again.
> They all succeeded on this retry, so this suggests that the same requests
> sent to other riak nodes were fine, but to this riak node at this time were a
> problem.
>
> On application node 2 at:
> 2017-06-23 06:03:27
> 2017-06-23 06:03:29
> 2017-06-23 06:03:31
> 2017-06-23 06:03:33
>
> we had 4 more timeouts on search requests to index A; these requests had 2
> different filters but in both cases field X had the same value as in the
> previous example.
>
>
> So these application logs show 9 riak timeouts, which must correlate with
> the riak stats.
> I can't definitively say that no other search requests went to this riak
> node between 06:03:15 & 06:03:33 but the circumstantial evidence is that it
> had a problem for 18 seconds, which is quite a big window.
>
>
> The index that all these requests were directed at currently has 490K
> entries with 8 different fields defined in each. The corresponding riak
> bucket has allow_mult = false, if that's relevant.
>
> We see a similar pattern on our test system; I'm going to set up a test to
> repeatedly do searches and see if I can trigger this consistently. Will let
> ye know if anything interesting comes out of it.
>
> I know it's relatively new to the product, but do we know if riak solr is
> used much in production systems?
> I assume no one else has seen these spikes?
>
> //Sean.
>
>
> On Thu, Jun 22, 2017 at 9:40 PM, Fred Dushin <f...@dushin.net> wro

Re: Riak Intermittent Read Failures

2017-06-26 Thread sean mcevoy
Hi Mark,

I've observed timeouts too, but always on search operations; you might have
seen my thread "Solr search response time spikes".

I'm getting stats by polling this every minute:
http://docs.basho.com/riak/kv/2.2.3/developing/api/http/status/

The 99 & 100% response times are most interesting for debugging our
problems.
What client & timeout value are you using? I'm using the erlang client
where the default timeout is 60 seconds, but I've overridden that and am
using 2 seconds.

Interestingly, over the weekend I've started to see a few put & get
timeouts on the application side, but the longest 100% response time is
just under a second which points to a network delay.

I'd start by polling these stats and then examining them when you get an
application-side timeout. Maybe check the size stats too; if you can catch
which key the operation timed out on, it'd be worth checking the object size
& sibling count for it. If nothing else this would eliminate the
possibility that it's unique to a particular object.
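
For reference, with the erlang client that looks roughly like this (bucket &
key are illustrative; the timeout in ms is the last argument to get/5):

%% Explicit 2s timeout instead of the 60s default, then check sibling
%% count and value sizes on the fetched object.
{ok, Obj} = riakc_pb_socket:get(Pid, <<"my_bucket">>, <<"slow_key">>, [], 2000),
Siblings = riakc_obj:value_count(Obj),
Sizes = [byte_size(V) || V <- riakc_obj:get_values(Obj)].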

//Sean.


On Sat, Jun 24, 2017 at 12:57 PM, markrthomas 
wrote:

> Hello
>
> I'm getting intermittent read failures in my cluster, i.e. timeouts
>
> Sometimes an object returns immediately.
>
> Other times, nothing at all and I get a read-timeout.
>
> Any ideas on where I start debugging this issue?
>
> Thanks
>
> Mark
>
>
>


Re: Solr search response time spikes

2017-06-23 Thread sean mcevoy
Hi Fred,

Thanks for taking the time!
Yes, I noticed that unbalance yesterday when writing, looked into it after
sending and found our config is corrupt with one node omitted and another
in there twice.
But, with such low traffic levels and the spikes being on the non-favoured
node I'm not currently ranking that as a likely factor.


Another interesting case from last night, this sample was taken at
2017-6-23 06:04:09

Riak node 1
"search_query_throughput_one": 27
"search_query_latency_max": 10417

Riak node 2
"search_query_throughput_one": 49
"search_query_latency_max": 8952

Riak node 3
"search_query_throughput_one": 18
"search_query_throughput_count": 2507
"search_query_latency_min": 1757
"search_query_latency_median": 14775
"search_query_latency_mean": 5628361
"search_query_latency_max": 18298854
"search_query_latency_999": 18298854
"search_query_latency_99": 18298854
"search_query_latency_95": 16539782

Riak node 4
"search_query_throughput_one": 25
"search_query_latency_max": 10217


Brushing up my maths and focussing on node 3, from the 99 & 95% figures we
can tell the 2 slowest response times were 18,298 & 16,539ms, 34,837 ms in
total.
And from the request count for the minute & the mean we can tell that
these 18 requests spent a total of 101,310 ms being processed.
From the median & min we know the 9 quickest took between 18 & 265 ms in
total.
This leaves in the region of 66 sec for the other 7 requests, enough for
all 7 to have timed out.
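
(Spelling out that arithmetic, with the stats latencies in microseconds:
18 requests x 5,628,361 us mean ~= 101,310 ms of processing in total;
the two slowest account for 18,298,854 + 16,539,782 us ~= 34,838 ms;
subtract those and the small total for the 9 quickest, and roughly 66
seconds remain for the other 7 requests.)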


Cross referencing with our application logs I can see:

On application node 1 at 2017-06-23 06:03:17 we had 3 search request
timeouts to index A with 3 different filters, one field of which, let's call
it field X, had the same value.
We immediately retried these and at 2017-06-23 06:03:19 2 of those timed
out again and were retried again.
They all succeeded on this retry, so this suggests that the same requests
sent to other riak nodes were fine, but to this riak node at this time were a
problem.

On application node 2 at:
2017-06-23 06:03:27
2017-06-23 06:03:29
2017-06-23 06:03:31
2017-06-23 06:03:33

we had 4 more timeouts on search requests to index A; these requests had 2
different filters but in both cases field X had the same value as in the
previous example.


So these application logs show 9 riak timeouts, which must correlate with
the riak stats.
I can't definitively say that no other search requests went to this riak
node between 06:03:15 & 06:03:33 but the circumstantial evidence is that it
had a problem for 18 seconds, which is quite a big window.


The index that all these requests were directed at currently has 490K
entries with 8 different fields defined in each. The corresponding riak
bucket has allow_mult = false, if that's relevant.

We see a similar pattern on our test system; I'm going to set up a test to
repeatedly do searches and see if I can trigger this consistently. Will let
ye know if anything interesting comes out of it.

I know it's relatively new to the product, but do we know if riak solr is
used much in production systems?
I assume no one else has seen these spikes?

//Sean.


On Thu, Jun 22, 2017 at 9:40 PM, Fred Dushin <f...@dushin.net> wrote:

> It's pretty strange that you are seeing no search latency measurements on
> node 5.  Are you sure your round robining is working?  Are you favoring
> node 1?
>
> In general, I don't think which node you hit for query should make a
> difference, but I'd have to stare at the code some to be sure.  In essence,
> all the node that services the query does is convert the query into a
> sharded Solr query based on a coverage plan, which changes every minute or
> so, and then runs the sharded query on the local Solr node.  The Solr node
> then distributes the query to the rest of the nodes in the cluster, but
> that's all Solr comms -- Riak is out of the picture, by then.
>
> Now, if you have a lot of sharded queries accumulating on one node, that
> might make a difference to Solr.  I am not a Solr expert, and I don't even
> play one on TV.  But maybe the fact that you are not hitting node 5 is
> relevant for that reason?
>
> Can you do more analysis on your client, to make sure you are not favoring
> node 1?
>
> -Fred
>
> > On Jun 22, 2017, at 10:20 AM, sean mcevoy <sean.mce...@gmail.com> wrote:
> >
> > Hi List,
> >
> > We have a standard riak cluster with 5 nodes and at the minute the
> traffic levels are fairly low. Each of our application nodes has 25 client
> connections, 5 to each riak node which get selected in a round robin.
> >
> > Our application level requests involve multiple riak requests so our
> traffic tends to make requests in small bursts. Everything works fine for
> KV get

Solr search response time spikes

2017-06-22 Thread sean mcevoy
Hi List,

We have a standard riak cluster with 5 nodes and at the minute the traffic
levels are fairly low. Each of our application nodes has 25 client
connections, 5 to each riak node which get selected in a round robin.

Our application level requests involve multiple riak requests so our
traffic tends to make requests in small bursts. Everything works fine for
KV gets, puts & deletes but we're seeing timeouts & weird response time
spikes on solr search operations.

In the past 36 hours (the only period I have riak stats for) I see one
response time of 38.8 seconds, 3 hours earlier a response time of 20.8
seconds, and the third biggest spike is an acceptable 3.5 seconds.

See below all search_query stats for the minute of the 38 sec sample. In
the application request we made 5 riak search requests to the same index in
parallel, which happens for each request of this type and normally doesn't
have an issue. But in this case all 5 timed out, and one timed out again on
retry with the other 4 succeeding.

Anyone ever seen anything like this before? Is there any known deadlock in
solr that I might hit if I make the same request on another connection
before the first has completed? This is what we do when our riak client
times out after 2 seconds and immediately retries.
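
For what it's worth, our retry wrapper is roughly this shape (a sketch; the
function name and retry count are illustrative, the timeout in ms is the
last argument to search/5, and I'm assuming the client returns
{error, timeout} on expiry):

%% Retry a search that hits the 2s client timeout.
search_with_retry(Pid, Index, Query, Retries) ->
    case riakc_pb_socket:search(Pid, Index, Query, [], 2000) of
        {error, timeout} when Retries > 0 ->
            search_with_retry(Pid, Index, Query, Retries - 1);
        Result ->
            Result
    end.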

Any advice or pointers welcomed.
Thanks,
//Sean.


Riak node 1
search_query_throughput_one: 14
search_query_throughput_count: 259
search_query_latency_min: 2776
search_query_latency_median: 69411
search_query_latency_mean: 4900973
search_query_latency_max: 38887902
search_query_latency_999: 38887902
search_query_latency_99: 38887902
search_query_latency_95: 2046215
search_query_fail_one: 0
search_query_fail_count: 0

Riak node 2
search_query_throughput_one: 22
search_query_throughput_count: 564
search_query_latency_min: 4006
search_query_latency_median: 8800
search_query_latency_mean: 11834
search_query_latency_max: 25509
search_query_latency_999: 25509
search_query_latency_99: 25509
search_query_latency_95: 24035
search_query_fail_one: 0
search_query_fail_count: 0

Riak node 3
search_query_throughput_one: 6
search_query_throughput_count: 298
search_query_latency_min: 3200
search_query_latency_median: 15391
search_query_latency_mean: 18062
search_query_latency_max: 31759
search_query_latency_999: 31759
search_query_latency_99: 31759
search_query_latency_95: 31759
search_query_fail_one: 0
search_query_fail_count: 0

Riak node 4
search_query_throughput_one: 8
search_query_throughput_count: 334
search_query_latency_min: 2404
search_query_latency_median: 7230
search_query_latency_mean: 10211
search_query_latency_max: 22502
search_query_latency_999: 22502
search_query_latency_99: 22502
search_query_latency_95: 22502
search_query_fail_one: 0
search_query_fail_count: 0

Riak node 5
search_query_throughput_one: 0
search_query_throughput_count: 0
search_query_latency_min: 0
search_query_latency_median: 0
search_query_latency_mean: 0
search_query_latency_max: 0
search_query_latency_999: 0
search_query_latency_99: 0
search_query_latency_95: 0
search_query_fail_one: 0
search_query_fail_count: 0


Re: Start up problem talking to Riak

2017-02-16 Thread sean mcevoy
Hi David,
I vaguely remember the same problem from a previous setup I did, a while
ago now.
IIRC, the original configured IP gets written to disk on the initial start
and then the next start fails due to the mismatch.
Try deleting your data directory and restarting, so this will be like the
initial startup again.
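
For anyone else hitting this, the relevant riak.conf lines look roughly like
the below (values illustrative) - IIRC it's the nodename, not the listener,
that gets baked into the ring on first start:

## riak.conf
nodename = riak@192.168.1.94
listener.http.internal = 0.0.0.0:8098
listener.protobuf.internal = 0.0.0.0:8087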
//Sean.


On Thu, Feb 16, 2017 at 10:31 AM, AWS  wrote:

> OK, I took the machine down intending to write my own quick and easy key
> value db (I have done this before) but I have told my uni that I am using
> Riak so I thought that I ought to have another go.
>
> I have reinstalled Ubuntu and Riak. The computer now has an internal IP
> (192.168.1.94) rather than an external fixed IP. I started up Riak and got
> a pong. I then tried to connect with my software from  that already works
> with the Riak I have running on Amazon AWS so I know that the software
> works - just a request for a list of buckets.
>
> I got a "Connection refused error. I checked and 8098 was closed. I edited
> Riak.conf as advices from 127.0.0.0 to 0.0.0.0 but now Riak won't start
> (Riak failed to start in 15 seconds).
>
> I really want to get this working and, given I have an assignment due in a
> few days, sooner rather than later.  It is working fine on AWS but it is
> such a faff getting onto that using ssh as I am never sure of my key to use
> so I can't check the config there.
>
> This can't be hard, can it?
>
> Please help.
>
> David
>
>
>
> - Original Message -
> From: "Alex Moore"
> To: "Alexander Sicular"
> Cc: "AWS", "riak-users@lists.basho.com"
> Subject: Re: Start up problem talking to Riak
> Date: 02/13/2017 15:45:28 (Mon)
>
> Yeah, what Alex said.  You can't see it with your application because it's
> currently bound to the localhost loopback address, but
> it's bad to just expose everything publicly.
>
> 1. Where is this cluster running? (AWS or local dev cluster?)
> 2. What are you trying to connect to Riak with? Is it one of our clients
> or just raw HTTP requests?
>
> Thanks,
> Alex
>
> On Mon, Feb 13, 2017 at 10:33 AM, Alexander Sicular 
> wrote:
>
>> Please don't do that. Don't point the internet at your database. Have
>> them communicate amongst each other on internal ips and route the public
>> through a proxy / middleware.
>>
>> -Alexander
>>
>> @siculars
>> http://siculars.posthaven.com
>>
>> Sent from my iRotaryPhone
>>
>> > On Feb 13, 2017, at 04:00, AWS  wrote:
>> >
>> >  I know that this isn't directly a Riak issue but I am sure that some
>> of you have met this before and can maybe help me. I am used to Macs and
>> Windows but have now set up an Ubuntu 14.04LTS server on my home network. I
>> have 5 fixed IP addresses so the server has its own external address. I
>> have opened port 8098 on my router to point at the server and checked that
>> ufw isn't running. I have tested with it running ufw and with  'allow 8098'
>> applied. I still cannot connect to Riak. On the same computer I get a pong
>> back to a ping so Riak seems to be OK.
>> >
>> > I have a Riak server running on AWS and had trouble setting that up
>> until I, eventually, opened all ports.
>> >
>> > Can anyone please suggest some steps that I might take? I need this
>> running for an Open University course that I am studying. My AWS free
>> server runs out before the course finishes so I have to get this up and
>> running soon.
>> > Thanks  in advance.
>> > David
>> >
>>
>
>
>
>
>
>


Re: Doc typo

2016-11-15 Thread sean mcevoy
Cheers Luca, easy when you know how ;-)
PR has been made.
//Sean.


On Tue, Nov 15, 2016 at 9:31 AM, Luca Favatella <
luca.favate...@erlang-solutions.com> wrote:

> On 15 November 2016 at 09:17, sean mcevoy <sean.mce...@gmail.com> wrote:
> [...]
>
>> Hi Basho guys,
>>
>> What's your procedure on reporting documentation bugs?
>>
>>
>>
> Hi Sean,
>
> I understand the source of the docs is at
> https://github.com/basho/basho_docs and the usual pull requests workflow
> applies.
>
> Regards
> Luca
>


Re: Solr search performance

2016-09-21 Thread sean mcevoy
Hi Fred,

Thanks for the pointer! 'cursorMark' is a lot more performant alright,
though apparently it doesn't suit our use case.

I've written a loop function using OTP's httpc that reads each page, gets
the cursorMark and repeats, and it returns all 147 pages with consistent
times in the 40-60ms bracket which is an excellent improvement!

I would have been asking about the effort involved in making the protocol
buffers client support this, but instead our GUI guys insist that they need
to request a page number as sometimes they want to start in the middle of a
set of data.

So I'm almost back to square one.
Can you shed any light on the internal workings of SOLR that produce the
slow-down in my original question?
I'm hoping I can find a way to restructure my index data without having to
change the higher-level APIs that I support.

Cheers,
//Sean.


On Mon, Sep 19, 2016 at 10:00 PM, Fred Dushin <fdus...@basho.com> wrote:

> All great questions, Sean.
>
> A few things.  First off, for result sets that are that large, you are
> probably going to want to use Solr cursor marks [1], which are supported in
> the current version of Solr we ship.  Riak allows queries using cursor
> marks through the HTTP interface.  At present, it does not support cursors
> using the protobuf API, due to some internal limitations of the server-side
> protobuf library, but we do hope to fix that in the future.
>
> Secondly, we have found sorting with distributed queries to be far more
> performant using Solr 4.10.4.  Currently released versions of Riak use Solr
> 4.7, but as you can see on github [2], Solr 4.10.4 support has been merged
> into the develop-2.2 branch, and is in the pipeline for release.  I can't
> say when the next version of Riak is that will ship with this version
> because of indeterminacy around bug triage, but it should not be too long.
>
> I would start to look at using cursor marks and measure their relative
> performance in your scenario.  My guess is that you should see some
> improvement there.
>
> -Fred
>
> [1] https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
> [2] https://github.com/basho/yokozuna/commit/f64e19cef107d982082f5b95ed598da96fb419b0
>
>
> > On Sep 19, 2016, at 4:48 PM, sean mcevoy <sean.mce...@gmail.com> wrote:
> >
> > Hi All,
> >
> > We have an index with ~548,000 entries, ~14,000 of which match one of
> our queries.
> > We read these in a paginated search and the first page (of 100 hits)
> returns quickly in ~70ms.
> > This response time seems to increase exponentially as we walk through
> the pages:
> > the 4th page takes ~200ms,
> > the 8th page takes ~1200ms
> > the 12th page takes ~2100ms
> > the 16th page takes ~6100ms
> > the 20th page takes ~24000ms
> >
> > And by the time we're searching for the 22nd page it regularly times out
> at the default 60 seconds.
> >
> > I have a good understanding of riak KV internals but absolutely nothing
> of Lucene which I think is what's most relevant here. If anyone in the know
> can point me towards any relevant resource or can explain what's happening
> I'd be much obliged :-)
> > As I would also be if anyone with experience of using Riak/Lucene can
> tell me:
> > - Is 500K a crazy number of entries to put into one index?
> > - Is 14K a crazy number of entries to expect to be returned?
> > - Are there any methods we can use to make the search time more constant
> across the full search?
> > I read one blog post on inlining but it was a bit old & not very obvious
> how to implement using riakc_pb_socket calls.
> >
> > And out of curiosity, do we not traverse the full range of hits for each
> page? I naively thought that because I'm sorting the returned values we'd
> have to get them all first and then sort, but the response times suggests
> otherwise. Does Lucene store the data sorted by each field just in case a
> query asks for it? Or what other magic is going on?
> >
> >
> > For the technical details, we use the "_yz_default" schema and all the
> fields stored are strings:
> > - entry_id_s: unique within the DB, the aim of the query is to gather a
> list of these
> > - type_s: has one of 2 values
> > - sub_category_id_s: in the query described above all 14K hits will
> match on this, in the DB of ~500K entries there are ~43K different values
> for this field, with each category typically having 2-6 sub categories
> > - category_id_s: not matched in this query, in the DB of ~500K entries
> there are ~13K different values for this field
> > - status_s: has one of 2 values, in the query described above all hits
> will have the value "active"
> > - us

Solr search performance

2016-09-19 Thread sean mcevoy
Hi All,

We have an index with ~548,000 entries, ~14,000 of which match one of our
queries.
We read these in a paginated search and the first page (of 100 hits)
returns quickly in ~70ms.
This response time seems to increase exponentially as we walk through the
pages:
the 4th page takes ~200ms,
the 8th page takes ~1200ms
the 12th page takes ~2100ms
the 16th page takes ~6100ms
the 20th page takes ~24000ms

And by the time we're searching for the 22nd page it regularly times out at
the default 60 seconds.

I have a good understanding of riak KV internals but absolutely nothing of
Lucene which I think is what's most relevant here. If anyone in the know
can point me towards any relevant resource or can explain what's happening
I'd be much obliged :-)
As I would also be if anyone with experience of using Riak/Lucene can tell
me:
- Is 500K a crazy number of entries to put into one index?
- Is 14K a crazy number of entries to expect to be returned?
- Are there any methods we can use to make the search time more constant
across the full search?
I read one blog post on inlining but it was a bit old & not very obvious
how to implement using riakc_pb_socket calls.

And out of curiosity, do we not traverse the full range of hits for each
page? I naively thought that because I'm sorting the returned values we'd
have to get them all first and then sort, but the response times suggests
otherwise. Does Lucene store the data sorted by each field just in case a
query asks for it? Or what other magic is going on?


For the technical details, we use the "_yz_default" schema and all the
fields stored are strings:
- entry_id_s: unique within the DB, the aim of the query is to gather a
list of these
- type_s: has one of 2 values
- sub_category_id_s: in the query described above all 14K hits will match
on this, in the DB of ~500K entries there are ~43K different values for
this field, with each category typically having 2-6 sub categories
- category_id_s: not matched in this query, in the DB of ~500K entries
there are ~13K different values for this field
- status_s: has one of 2 values, in the query described above all hits will
have the value "active"
- user_id_s: unique within the DB but not matched in this query
- first_name_s: almost unique within the DB, this query will sort by this
field
- last_name_s: almost unique within the DB, this query will sort by this
field

This search query looks like:
<<"sub_category_id_s:test_1 AND status_s:active AND type_s:sub_category">>

Our options parameter has the sort directive:
{sort, <<"first_name_s asc, last_name_s asc">>}

The query was run on a 5-node cluster with n_val of 3.
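
For concreteness, the paginated call looks roughly like this (the index name
is illustrative; start/rows are the standard pagination options):

%% Fetch page N (0-based) of 100 hits, sorted by name.
riakc_pb_socket:search(Pid, <<"my_index">>,
    <<"sub_category_id_s:test_1 AND status_s:active AND type_s:sub_category">>,
    [{start, N * 100}, {rows, 100},
     {sort, <<"first_name_s asc, last_name_s asc">>}]).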

Thanks in advance for any pointers!
//Sean.


Re: Spaces in the search string

2016-09-08 Thread sean mcevoy
Hi Alexander,
Unfortunately it didn't shake with any satisfaction.
I'm sure there's an easy answer, and I hope I'll get back to search for it
some day.
But for now me & my pragmatic overlords have gone for a work-around
solution that avoids the problem.
//Sean.


On Wed, Sep 7, 2016 at 2:06 PM, Alexander Sicular <sicul...@basho.com>
wrote:

> Hi Sean, Familiarize yourself with the default schema[0], if that is what
> you're using. Also check details around this specific type of search around
> the web[1].
>
> Let us know how it shakes out,
> -Alexander
>
>
> [0] https://raw.githubusercontent.com/basho/yokozuna/develop/priv/default_schema.xml
> [1] http://stackoverflow.com/questions/10023133/solr-wildcard-query-with-whitespace
>
>
>
> On Wednesday, September 7, 2016, sean mcevoy <sean.mce...@gmail.com>
> wrote:
>
>> Hi again!
>>
>> Apologies for the premature post earlier. I thought I had a solution when
>> I didn't get the error, but when I got around to plugging it into my
>> application it still wasn't doing everything that I need.
>> I've narrowed it down to this minimal testcase; first set up the index &
>> insert the data:
>>
>>
>> {ok,Pid} = riakc_pb_socket:start("127.0.0.1", 10017).
>> ok = riakc_pb_socket:create_search_index(Pid, <<"test_index">>,
>> <<"_yz_default">>, []).
>> ok = riakc_pb_socket:set_search_index(Pid, <<"test_bucket">>,
>> <<"test_index">>).
>> RO = riakc_obj:new(<<"test_bucket">>, <<"test_key">>,
>> <<"{\"name_s\":\"my test name\",\"age_i\":2}">>, "application/json").
>> ok = riakc_pb_socket:put(Pid, RO).
>>
>>
>> Now I can get the hit when searching for a partial name with wildcards & no
>> escapes or spaces:
>> 521>
>> 521> riakc_pb_socket:search(Pid, <<"test_index">>, <<"name_s:*test* AND
>> age_i:2">>, []).
>> {ok,{search_results,[{<<"test_index">>,
>>   [{<<"score">>,<<"1.227760798549e+00">>},
>>{<<"_yz_rb">>,<<"test_bucket">>},
>>{<<"_yz_rt">>,<<"default">>},
>>{<<"_yz_rk">>,<<"test_key">>},
>>{<<"_yz_id">>,<<"1*default*tes
>> t_bucket*test_key*57">>},
>>{<<"name_s">>,<<"my test name">>},
>>{<<"age_i">>,<<"2">>}]}],
>> 1.2277607917785645,1}}
>>
>>
>> And I can get the hit when I search for the full name with spaces & the
>> escaped quotes:
>> 522>
>> 522> riakc_pb_socket:search(Pid, <<"test_index">>, <<"name_s:\"my test
>> name\" AND age_i:2">>, []).
>> {ok,{search_results,[{<<"test_index">>,
>>   [{<<"score">>,<<"1.007369608719e+00">>},
>>{<<"_yz_rb">>,<<"test_bucket">>},
>>{<<"_yz_rt">>,<<"default">>},
>>{<<"_yz_rk">>,<<"test_key">>},
>>{<<"_yz_id">>,<<"1*default*tes
>> t_bucket*test_key*58">>},
>>{<<"name_s">>,<<"my test name">>},
>>{<<"age_i">>,<<"2">>}]}],
>> 1.0073696374893188,1}}
>>
>>
>> But how can I search for a partial name with spaces:
>> 523>
>> 523> riakc_pb_socket:search(Pid, <<"test_index">>, <<"name_s:\"*y test
>> na*\" AND age_i:2">>, []).
>> {ok,{search_results,[],0.0,0}}
>> 524>
>> 524>
>>
>>
>> I get the feeling that I'm missing something really obvious but can't see
>> it. Any more pointers appreciated!
>>
>> //Sean.
>>
>>
>> On Wed, Sep 7, 2016 at 10:11 AM, sean mce

Re: Spaces in the search string

2016-09-07 Thread sean mcevoy
Hi again!

Apologies for the premature post earlier. I thought I had a solution when I
didn't get the error, but when I got around to plugging it into my
application it still wasn't doing everything that I need.
I've narrowed it down to this minimal testcase; first set up the index &
insert the data:


{ok,Pid} = riakc_pb_socket:start("127.0.0.1", 10017).
ok = riakc_pb_socket:create_search_index(Pid, <<"test_index">>,
<<"_yz_default">>, []).
ok = riakc_pb_socket:set_search_index(Pid, <<"test_bucket">>,
<<"test_index">>).
RO = riakc_obj:new(<<"test_bucket">>, <<"test_key">>, <<"{\"name_s\":\"my
test name\",\"age_i\":2}">>, "application/json").
ok = riakc_pb_socket:put(Pid, RO).


Now I can get the hit when searching for a partial name with wildcards & no
escapes or spaces:
521>
521> riakc_pb_socket:search(Pid, <<"test_index">>, <<"name_s:*test* AND
age_i:2">>, []).
{ok,{search_results,[{<<"test_index">>,
  [{<<"score">>,<<"1.227760798549e+00">>},
   {<<"_yz_rb">>,<<"test_bucket">>},
   {<<"_yz_rt">>,<<"default">>},
   {<<"_yz_rk">>,<<"test_key">>},

{<<"_yz_id">>,<<"1*default*test_bucket*test_key*57">>},
   {<<"name_s">>,<<"my test name">>},
   {<<"age_i">>,<<"2">>}]}],
1.2277607917785645,1}}


And I can get the hit when I search for the full name with spaces & the
escaped quotes:
522>
522> riakc_pb_socket:search(Pid, <<"test_index">>, <<"name_s:\"my test
name\" AND age_i:2">>, []).
{ok,{search_results,[{<<"test_index">>,
  [{<<"score">>,<<"1.007369608719e+00">>},
       {<<"_yz_rb">>,<<"test_bucket">>},
   {<<"_yz_rt">>,<<"default">>},
   {<<"_yz_rk">>,<<"test_key">>},

{<<"_yz_id">>,<<"1*default*test_bucket*test_key*58">>},
   {<<"name_s">>,<<"my test name">>},
   {<<"age_i">>,<<"2">>}]}],
1.0073696374893188,1}}


But how can I search for a partial name with spaces:
523>
523> riakc_pb_socket:search(Pid, <<"test_index">>, <<"name_s:\"*y test
na*\" AND age_i:2">>, []).
{ok,{search_results,[],0.0,0}}
524>
524>


I get the feeling that I'm missing something really obvious but can't see
it. Any more pointers appreciated!

//Sean.


On Wed, Sep 7, 2016 at 10:11 AM, sean mcevoy <sean.mce...@gmail.com> wrote:

> Hi Jason,
>
> Thanks for the kick, I just needed to look closer!
> Yes, had tried escaping but one of my utility functions for dynamically
> building the search string had been stripping it out again. D'oh!
>
> Curiously, just escaping the space doesn't work as in the example in the
> stackoverflow post.
> Putting the search term in an inner string and escaping its quotes both
> feels more natural and does work so I'm going with something more like:
>
> 409>
> 409>
> 409> riakc_pb_socket:search(Pid, <<"test_index">>, <<"name_s:\"we rt\" AND
> age_i:0">>, []).
> {ok,{search_results,[],0.0,0}}
> 410>
> 410>
> 410> riakc_pb_socket:search(Pid, <<"test_index">>, <<"name_s:we\ rt AND
> age_i:0">>, []).
> {error,<<"Query unsuccessful check the logs.">>}
> 411>
> 411>
>
> Cheers,
> //Sean.
>
>
> On Tue, Sep 6, 2016 at 2:48 PM, Jason Voegele <jvoeg...@basho.com> wrote:
>
>> Hi Sean,
>>
>> Have you tried escaping the space in your query?
>>
>> http://stackoverflow.com/questions/10023133/solr-wildcard-query-with-whitespace
>>
>>
>> On Sep 5, 2016, at 6:24 PM, sean mcevoy <sean.mce...@gmail.com> wrote:
>>
>> Hi List,
>>
>> We have a solr index where we store something like:

Re: Spaces in the search string

2016-09-07 Thread sean mcevoy
Hi Jason,

Thanks for the kick, I just needed to look closer!
Yes, had tried escaping but one of my utility functions for dynamically
building the search string had been stripping it out again. D'oh!

Curiously, just escaping the space doesn't work as in the example in the
stackoverflow post.
Putting the search term in an inner string and escaping its quotes both
feels more natural and does work so I'm going with something more like:

409>
409>
409> riakc_pb_socket:search(Pid, <<"test_index">>, <<"name_s:\"we rt\" AND
age_i:0">>, []).
{ok,{search_results,[],0.0,0}}
410>
410>
410> riakc_pb_socket:search(Pid, <<"test_index">>, <<"name_s:we\ rt AND
age_i:0">>, []).
{error,<<"Query unsuccessful check the logs.">>}
411>
411>

Cheers,
//Sean.


On Tue, Sep 6, 2016 at 2:48 PM, Jason Voegele <jvoeg...@basho.com> wrote:

> Hi Sean,
>
> Have you tried escaping the space in your query?
>
> http://stackoverflow.com/questions/10023133/solr-wildcard-query-with-whitespace
>
>
> On Sep 5, 2016, at 6:24 PM, sean mcevoy <sean.mce...@gmail.com> wrote:
>
> Hi List,
>
> We have a solr index where we store something like:
> <<"{\"key_s\":\"ID\",\"body_s\":\"some test string\"}">>}],
>
> Then we try to do a riakc_pb_socket:search with the pattern:
> <<"body_s:*test str*">>
>
> The request will fail with an error message telling us to check the logs
> and in there we find:
>
> 2016-09-05 13:37:29.271 [error] <0.12067.10>@yz_pb_search:maybe_process:107
> {solr_error,{400,"http://localhost:10014/internal_solr/
> crm_db.campaign_index/select",<<"{\"error\":{\"msg\":\"no field name
> specified in query and no default specified via 'df'
> param\",\"code\":400}}\n">>}} [{yz_solr,search,3,[{file,"
> src/yz_solr.erl"},{line,284}]},{yz_pb_search,maybe_process,
> 3,[{file,"src/yz_pb_search.erl"},{line,78}]},{riak_api_
> pb_server,process_message,4,[{file,"src/riak_api_pb_server.
> erl"},{line,388}]},{riak_api_pb_server,connected,2,[{file,"
> src/riak_api_pb_server.erl"},{line,226}]},{riak_api_pb_
> server,decode_buffer,2,[{file,"src/riak_api_pb_server.erl"},
> {line,364}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{
> line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.
> erl"},{line,239}]}]
>
>
> Through experiment I've figured out that it doesn't like the space as it
> seems to think the part of the search string after that space is a new key
> to search for. Which seems fair enough.
>
> Anyone know of a work-around? Or am I formatting my request incorrectly?
>
> Thanks in advance.
> //Sean.
>
>
>
>
>
>
>
>
>


Spaces in the search string

2016-09-05 Thread sean mcevoy
Hi List,

We have a solr index where we store something like:
<<"{\"key_s\":\"ID\",\"body_s\":\"some test string\"}">>}],

Then we try to do a riakc_pb_socket:search with the pattern:
<<"body_s:*test str*">>

The request will fail with an error message telling us to check the logs
and in there we find:

2016-09-05 13:37:29.271 [error] <0.12067.10>@yz_pb_search:maybe_process:107
{solr_error,{400,"
http://localhost:10014/internal_solr/crm_db.campaign_index/select",<<"{\"error\":{\"msg\":\"no
field name specified in query and no default specified via 'df'
param\",\"code\":400}}\n">>}}
[{yz_solr,search,3,[{file,"src/yz_solr.erl"},{line,284}]},{yz_pb_search,maybe_process,3,[{file,"src/yz_pb_search.erl"},{line,78}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,388}]},{riak_api_pb_server,connected,2,[{file,"src/riak_api_pb_server.erl"},{line,226}]},{riak_api_pb_server,decode_buffer,2,[{file,"src/riak_api_pb_server.erl"},{line,364}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]


Through experiment I've figured out that it doesn't like the space as it
seems to think the part of the search string after that space is a new key
to search for. Which seems fair enough.

Anyone know of a work-around? Or am I formatting my request incorrectly?

Thanks in advance.
//Sean.