Re: Accumulo performance on various hardware configurations

2018-08-29 Thread guy sharon
OK, good news at last! Performance has improved by an order of magnitude.
The bad news is that I don't know why. As far as I can tell, the only things
I changed were table.scan.max.memory and the performance profile on Muchos.
Neither had any effect when I first tested it, and I doubt either has a
"settling in" period, so I don't know what made the difference. But at least
it's now clear that better performance is possible. I'll try to investigate
what happened. Thanks everyone for helping!

Regarding my use case: I'm trying to use Accumulo as a graph database.
Traversing a graph, or several graphs at once, means getting a row (vertex)
by ID, sending it to the client, deciding if it's relevant and then
retrieving the neighboring vertices. So lots of reads by ID and back and
forth between client and server. A full table scan is not exactly like that
but it was the simplest use case I could think of that looked somewhat
similar.
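The lookup-heavy pattern described above can be sketched with a plain in-memory map standing in for the vertex table. This is only an illustration of the access pattern, not the Accumulo client API, and the adjacency data is invented:

```java
import java.util.*;

public class TraversalSketch {
    // Stand-in for the vertex table: row ID -> neighbor IDs (invented data).
    static final Map<String, List<String>> TABLE = Map.of(
        "v1", List.of("v2", "v3"),
        "v2", List.of("v4"),
        "v3", List.of("v4"),
        "v4", List.of());

    // One lookup per vertex: fetch the row by ID, decide relevance on the
    // client, then schedule the neighbors. Each poll maps to a client/server
    // round trip in the real deployment.
    static List<String> traverse(String start) {
        List<String> visited = new ArrayList<>();
        Deque<String> frontier = new ArrayDeque<>(List.of(start));
        Set<String> seen = new HashSet<>(List.of(start));
        while (!frontier.isEmpty()) {
            String id = frontier.poll();           // read one row by ID
            visited.add(id);                       // client-side relevance check goes here
            for (String n : TABLE.getOrDefault(id, List.of()))
                if (seen.add(n)) frontier.add(n);  // schedule neighbor fetches
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(traverse("v1")); // prints [v1, v2, v3, v4]
    }
}
```

Because every hop pays a round trip, per-lookup latency rather than raw scan bandwidth dominates this workload, which is why a full table scan is only a rough proxy for it.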

On Wed, Aug 29, 2018 at 10:45 PM  wrote:

> This may suggest an issue with client, either getting the data to the
> client or the client itself (although I think there are other performance
> related changes you could make). I’m curious what the end goal is here. Is
> this a real world use case? If you are using this type of benchmark to
> evaluate the speed of Accumulo, then you will likely not get the same
> performance when you apply your data and your real use cases.

Re: Accumulo performance on various hardware configurations

2018-08-29 Thread guy sharon
hi Mike,

As per Mike Miller's suggestion I started using
org.apache.accumulo.examples.simple.helloworld.ReadData from Accumulo, with
debugging turned off and a BatchScanner with 10 threads. I redid all the
measurements; although this was 20% faster than using the shell, there was
no difference once I started playing with the hardware configurations.

Guy.

On Wed, Aug 29, 2018 at 10:06 PM Michael Wall  wrote:

> Guy,
>
> Can you go into specifics about how you are measuring this?  Are you still
> using "bin/accumulo shell -u root -p secret -e "scan -t hellotable -np" |
> wc -l" as you mentioned earlier in the thread?  As Mike Miller suggested,
> serializing that back to the display and then counting 6M entries is going
> to take some time.  Try using a Batch Scanner directly.
>
> Mike


Re: Accumulo performance on various hardware configurations

2018-08-29 Thread guy sharon
Yes, I tried the high-performance configuration, which translates to a 4G
heap size, but that didn't affect performance. Neither did setting
table.scan.max.memory to 4096k (the default is 512k). Even if I accept that
the read performance here is reasonable, I don't understand why none of the
hardware configuration changes (except going to 48 cores, which made things
worse) made any difference.

On Wed, Aug 29, 2018 at 8:33 PM Mike Walch  wrote:

> Muchos does not automatically change its Accumulo configuration to take
> advantage of better hardware. However, it does have a performance profile
> setting in its configuration (see link below) where you can select a
> profile (or create your own) based on the hardware you are using.
>
>
> https://github.com/apache/fluo-muchos/blob/master/conf/muchos.props.example#L94
>
> On Wed, Aug 29, 2018 at 11:35 AM Josh Elser  wrote:
>
>> Does Muchos actually change the Accumulo configuration when you are
>> changing the underlying hardware?


Re: Accumulo performance on various hardware configurations

2018-08-29 Thread guy sharon
Well, in one experiment I used a machine with 48 cores and 192GB of RAM and
the results actually came out worse. And in another I had 7 tservers on
servers with 4 cores. I suspect I'm not configuring things correctly,
because I'd expect the improved hardware to improve performance, and that
doesn't seem to be the case.

On Wed, Aug 29, 2018 at 4:00 PM Jeremy Kepner  wrote:

> Your node is fairly underpowered (2 cores and 8 GB RAM) and is less than
> most laptops.  That said
>
> 6M / 12sec = 500K/sec
>
> is good for a single node Accumulo instance on this hardware.
>
> Splitting might not help since you only have 2 cores, so the added
> parallelism can't be exploited.
>
> Why do you think 500K/sec is slow?
>
> To determine slowness one would have to compare with other database
> technology on the same platform.
>
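Jeremy's rate arithmetic, extended to the other timings reported in the thread (pure arithmetic, no cluster required):

```java
public class ScanRates {
    // Entries scanned per second for a given run.
    static long rate(long entries, long seconds) {
        return entries / seconds;
    }

    public static void main(String[] args) {
        System.out.println(rate(6_000_000, 12)); // 5-node m5d.large scan   -> 500000
        System.out.println(rate(6_000_000, 16)); // 4-tablet run            -> 375000
        System.out.println(rate(6_000_000, 26)); // single m5d.12xlarge run -> 230769
    }
}
```

That the rate stays in the same band across very different hardware is consistent with the suggestion earlier in the thread that the client side, not the servers, is the bottleneck.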


Re: Accumulo performance on various hardware configurations

2018-08-29 Thread guy sharon
hi Marc,

Just ran the test again with the changes you suggested. Setup: 5 tservers
on CentOS 7, 4 CPUs and 16 GB RAM, Accumulo 1.7.4, table with 6M rows.
org.apache.accumulo.examples.simple.helloworld.ReadData now uses a
BatchScanner with 10 threads. I got:

$ time install/accumulo-1.7.4/bin/accumulo \
    org.apache.accumulo.examples.simple.helloworld.ReadData \
    -i muchos -z localhost:2181 -u root -t hellotable -p secret

real    0m16.979s
user    0m13.670s
sys     0m0.599s

So this doesn't really improve things. That looks strange to me, as I'd
expect Accumulo to use the threads to speed things up. Unless the full scan
uses just one thread on the assumption that the entries are next to each
other on disk, making it faster to read them sequentially than to jump back
and forth between threads. What do you think?

BR,
Guy.
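One hedged explanation, in line with Marc's footnote below about tablets: a batch scanner can only fan work out across as many tablets as the ranges cover, so ten client threads over a single tablet still behave sequentially. A rough simulation of that bound using a plain JDK thread pool (this is a stand-in, not the Accumulo API):

```java
import java.util.*;
import java.util.concurrent.*;

public class FanOutSketch {
    // Stand-in for scanning one tablet: just count its entries.
    static long scanPartition(int entries) {
        return entries;
    }

    // A batch scan submits one task per tablet; with a single tablet the
    // extra client threads have nothing to do, mirroring the measurement above.
    static long batchScan(int[] tabletSizes, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int size : tabletSizes)
                parts.add(pool.submit(() -> scanPartition(size)));
            long total = 0;
            for (Future<Long> p : parts) total += p.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Same 6M entries either way; parallelism = min(threads, tablets).
        System.out.println(batchScan(new int[]{6_000_000}, 10));            // one tablet
        System.out.println(batchScan(new int[]{1_500_000, 1_500_000,
                                               1_500_000, 1_500_000}, 10)); // four tablets
    }
}
```

Both calls return 6,000,000 entries; only the achievable concurrency differs. This is also why splitting to 4 tablets changed the timing while raw CPU count did not.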




On Wed, Aug 29, 2018 at 3:25 PM Marc  wrote:

> Guy,
>   To clarify :
>
> [1] If you have four tablets it's reasonable to suspect that the RPC
> time to access those servers may increase a bit if you access them
> sequentially versus in parallel.
> On Wed, Aug 29, 2018 at 8:16 AM Marc  wrote:
> >
> > Guy,
> >   The ReadData example appears to use a sequential scanner. Can you
> > change that to a batch scanner and see if there is improvement [1]?
> > Also, while you are there can you remove the log statement or set your
> > log level so that the trace message isn't printed?
> >
> > In this case we are reading the entirety of that data. If you were to
> > perform a query you would likely prefer to do it at the data instead
> > of bringing all data back to the client.
> >
> > What are your expectations since it appears very slow. Do you want
> > faster client side access to the data? Certainly improvements could be
> > made -- of that I have no doubt -- but the time to bring 6M entries to
> > the client is a cost you will incur if you use the ReadData example.
> >
> > [1] If you have four tablets it's reasonable to suspect that the RPC
> > time to access those servers may increase a bit.


Accumulo performance on various hardware configurations

2018-08-29 Thread guy sharon
hi,

Continuing my performance benchmarks, I'm still trying to figure out if the
results I'm getting are reasonable and why throwing more hardware at the
problem doesn't help. What I'm doing is a full table scan on a table with
6M entries. This is Accumulo 1.7.4 with Zookeeper 3.4.12 and Hadoop 2.8.4.
The table is populated by
org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter
modified to write 6M entries instead of 50k. Reads are performed by
"bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData -i
muchos -z localhost:2181 -u root -t hellotable -p secret". Here are the
results I got:

1. 5 tserver cluster as configured by Muchos (
https://github.com/apache/fluo-muchos), running on m5d.large AWS machines
(2vCPU, 8GB RAM) running CentOS 7. Master is on a separate server. Scan
took 12 seconds.
2. As above except with m5d.xlarge (4vCPU, 16GB RAM). Same results.
3. Splitting the table to 4 tablets causes the runtime to increase to 16
seconds.
4. 7 tserver cluster running m5d.xlarge servers. 12 seconds.
5. Single node cluster on m5d.12xlarge (48 cores, 192GB RAM), running
Amazon Linux. Configuration as provided by Uno (
https://github.com/apache/fluo-uno). Total time was 26 seconds.

Offhand I would say this is very slow. I'm guessing I'm making some sort of
newbie (possibly configuration) mistake but I can't figure out what it is.
Can anyone point me to something that might help me find out what it is?

thanks,
Guy.
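When comparing hardware configurations like this, a warm-up pass plus several timed repetitions avoids measuring JVM startup and cold caches. A generic harness for that (the workload below is a placeholder loop, not the Accumulo scan):

```java
public class BenchHarness {
    // Run warm-ups to let the JVM and caches settle, then report the best
    // elapsed time (in ms) over the measured runs.
    static long bestMillis(Runnable workload, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) workload.run();
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            workload.run();
            best = Math.min(best, (System.nanoTime() - t0) / 1_000_000);
        }
        return best;
    }

    public static void main(String[] args) {
        long ms = bestMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i; // placeholder workload
        }, 2, 5);
        System.out.println(ms); // elapsed millis (machine-dependent)
    }
}
```

Single 12- or 26-second observations per configuration leave a lot of room for noise; best-of-N per configuration makes the comparison between instance types much more trustworthy.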


Re: benchmarking

2018-08-28 Thread guy sharon
hi Jeremy,

Do you have any information on how you configure them and what kind of
hardware they run on?

Thanks,
Guy.



On Tue, Aug 28, 2018 at 3:44 PM Jeremy Kepner  wrote:

> FYI, single-node Accumulo instances are our most popular deployment.
> We have hundreds of them. Accumulo is so fast that it can replace
> what would normally require 20 MySQL servers.
>
> Regards.  -Jeremy


Re: benchmarking

2018-08-28 Thread guy sharon
hi Sean,

Thanks for the advice. I tried bringing up a 5 tserver cluster on AWS with
Muchos (https://github.com/apache/fluo-muchos). My first attempt was using
servers with 2 vCPU, 8GB RAM (m5d.large on AWS). The Hadoop datanodes were
colocated with the tservers and the Accumulo master was on the same server
as the Hadoop namenode. I populated a table with 6M entries using a
modified version of
org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter from
Accumulo (the only thing I modified was the number of entries as it usually
inserts 50k). I then did a count with "bin/accumulo shell -u root -p secret
-e "scan -t hellotable -np" | wc -l". That took 15 seconds. I then upgraded
to m5d.xlarge instances (4vCPU, 16GB RAM) and got the exact same result, so
it seems upgrading the servers doesn't help.

Is this expected or am I doing something terribly wrong?

BR,
Guy.



On Tue, Aug 28, 2018 at 10:38 AM Sean Busbey  wrote:

> Hi Guy,
>
> Apache Accumulo is designed for horizontally scaling out for large scale
> workloads that need to do random reads and writes. There's a non-trivial
> amount of overhead that comes with a system aimed at doing that on
> thousands of nodes.
>
> If your use case works for a single laptop with such a small number of
> entries and exhaustive scans, then Accumulo is probably not the correct
> tool for the job.
>
> For example, on my laptop (i7 2 cores, 8GiB memory) with that dataset size
> you can just rely on a file format like Apache Avro:
>
> busbey$ time java -jar avro-tools-1.7.7.jar random --codec snappy --count
> 630 --schema '{ "type": "record", "name": "entry", "fields": [ {
> "name": "field0", "type": "string" } ] }' ~/Downloads/6.3m_entries.avro
> Aug 28, 2018 12:31:13 AM org.apache.hadoop.util.NativeCodeLoader 
> WARNING: Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> test.seed=1535441473243
>
> real0m5.451s
> user0m5.922s
> sys 0m0.656s
> busbey$ ls -lah ~/Downloads/6.3m_entries.avro
> -rwxrwxrwx  1 busbey  staff   186M Aug 28 00:31
> /Users/busbey/Downloads/6.3m_entries.avro
> busbey$ time java -jar avro-tools-1.7.7.jar tojson
> ~/Downloads/6.3m_entries.avro | wc -l
>  630
>
> real0m4.239s
> user0m6.026s
> sys 0m0.721s
>
> I'd recommend that you start at >= 5 nodes if you want to look at rough
> per-node throughput capabilities.


Re: benchmarking

2018-08-28 Thread guy sharon
hi Mike,

Thanks for the links.

My current setup is a 4 node cluster (tserver, master, gc, monitor) running
on Alpine Docker containers on a laptop with an i7 processor (8 cores) with
16GB of RAM. As an example I'm running a count of all entries for a table
with 6.3M entries with "accumulo shell -u root -p secret  -e "scan -t
benchmark_table -np" | wc -l" and it takes 43 seconds. Not sure if this is
reasonable or not. Seems a little slow to me. What do you think?

BR,
Guy.




On Mon, Aug 27, 2018 at 4:43 PM Michael Wall  wrote:

> Hi Guy,
>
> Here are a couple links I found.  Can you tell us more about your setup
> and what you are seeing?
>
> https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf
> https://www.youtube.com/watch?v=Ae9THpmpFpM
>
> Mike


benchmarking

2018-08-25 Thread guy sharon
hi,

I've just started working with Accumulo and I think I'm experiencing slow
reads/writes. I'm aware of the recommended configuration. Does anyone know
of any standard benchmarks and benchmarking tools I can use to tell if the
performance I'm getting is reasonable?