Re: Congratulations to GSoC 2020 Students of Apache Gora

2020-05-04 Thread Sheriffo Ceesay
Congratulations to all the Apache Gora GSoC Students! Hoping to see a great
contribution from you all.

On Mon, 4 May 2020 at 22:05, Furkan KAMACI wrote:

> Hi GSoC Students,
>
> Congrats to you all!
>
> We are in the community bonding period until June 1. During this period,
> you should get to know your mentors, read documentation, and get up to speed
> to begin working on your projects [1].
>
> Please check the issues in Jira [2] to find tasks that you can work on as a
> warm-up. Also, please check the previous reports here [3] and be ready to
> fill in weekly progress reports there.
>
> The first hallmark of successful GSoC projects is "keeping communication",
> which is more important than you may think.
>
> Let us know if you have any questions. I hope you have a great experience
> during your GSoC period!
>
> [1] http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html
> [2] https://issues.apache.org/jira/projects/GORA/issues
> [3] https://cwiki.apache.org/confluence/display/GORA/Google+Summer+of+Code
>
> Kind Regards,
> Furkan KAMACI
>
-- 

**Sheriffo Ceesay**


Re: Benchmark Module Results

2020-03-31 Thread Sheriffo Ceesay
Hi Furkan,

Thank you for the email. I was only able to run some minor benchmarks
mainly to show that the module is working as expected. These results are
available in my report.

I am not aware of anyone running any comprehensive benchmarks.

I have signed up to be a mentor this year and I will be happy to work with
students to accomplish the benchmarking task you are proposing. This will
probably help us to also improve the module further.

Thank you.


**Sheriffo Ceesay**


On Tue, Mar 31, 2020 at 9:13 PM Furkan KAMACI wrote:

> Hi,
>
> First of all, thanks to Sheriffo Ceesay who has implemented my Apache Gora
> Benchmark suggestion during his GSoC 2019 period.
>
> I would like to know whether Sheriffo or anyone else has had a chance to
> run a comprehensive benchmark or has already published results anywhere.
> We could publish the results on our webpage (I cannot see related info
> here: https://gora.apache.org/current/index.html#gora-modules).
>
> There are new data store proposals for GSoC 2020, and I would like the
> students to contribute benchmark results if they are accepted.
>
> Kind Regards,
> Furkan KAMACI
>


Re: [WELCOME] John Mora to Gora PMC and Committership

2019-10-15 Thread Sheriffo Ceesay
Congratulations John! Welcome onboard.

Sheriffo.

On Tue, Oct 15, 2019 at 7:57 AM Cihad Guzel  wrote:

> Congratulations and Welcome John!
>
> Cihad Guzel
>
>
> On Sun, 13 Oct 2019 at 19:54, Alfonso Nishikawa wrote:
>
> > Hi Gora Community,
> >
> > I'm very happy to welcome John Mora to our ranks. He has implemented the
> > module gora-kudu this Google Summer of Code.
> > John, your account has been created, so you can try your first push
> > updating the Gora parent pom.xml [1] and the project webpage [2] with a
> > reference to yourself [1]. If you need some assistance with that then
> > please let me know.
> >
> > [1] https://github.com/apache/gora/blob/master/pom.xml
> > [2] http://gora.apache.org/credits.html
> >
> > Regards,
> >
> > Alfonso Nishikawa
> >
>


[jira] [Created] (GORA-640) Add gora-benchmark documentation to website

2019-10-13 Thread Sheriffo Ceesay (Jira)
Sheriffo Ceesay created GORA-640:


         Summary: Add gora-benchmark documentation to website
             Key: GORA-640
             URL: https://issues.apache.org/jira/browse/GORA-640
         Project: Apache Gora
      Issue Type: Task
Affects Versions: 0.9
        Reporter: Sheriffo Ceesay
        Assignee: Sheriffo Ceesay


Document gora-benchmark module on Gora Website.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GORA-639) Add new YCSB dependency for gora-benchmark module

2019-10-11 Thread Sheriffo Ceesay (Jira)
Sheriffo Ceesay created GORA-639:


         Summary: Add new YCSB dependency for gora-benchmark module
             Key: GORA-639
             URL: https://issues.apache.org/jira/browse/GORA-639
         Project: Apache Gora
      Issue Type: Improvement
      Components: gora-core
Affects Versions: 0.9
        Reporter: Sheriffo Ceesay
        Assignee: Sheriffo Ceesay


YCSB is the main dependency of the gora-benchmark module. At the time the
module was implemented, the dependency wasn't published to Maven Central.
This issue was reported to the maintainers and it is now fixed.





Re: [WELCOME] Sheriffo Ceesay to Gora PMC and Committership

2019-09-27 Thread Sheriffo Ceesay
Hi,

Thank you all for the very warm welcome. I am very much delighted and
humbled to be part of this great team.

@Lewis John Mcgibbney, thanks for the pointers. Once the account creation
process is completed, I will go ahead and complete the rest of the process.

Thank you.
Kind regards


**Sheriffo Ceesay**


On Fri, Sep 27, 2019 at 5:46 PM Furkan KAMACI wrote:

> Hi,
>
> Welcome Sheriffo!
>
> Kind Regards,
> Furkan KAMACI
>
> On Thu, 26 Sep 2019 at 23:19, Cihad Guzel wrote:
>
>> Hi Sheriffo,
>>
>> Welcome on board. Great to have you here!
>>
>> Cihad Guzel
>>
>>
>> On Thu, 26 Sep 2019 at 22:40, lewis john mcgibbney wrote:
>>
>> > Hi dev@,
>> > I'm very happy to welcome Sheriffo Ceesay to our ranks. Sheriffo was one
>> > of our GSoC students this year with his work resulting in a benchmarking
>> > module based on Yahoo!'s Cloud Serving Benchmark [0].
>> > Sheriffo is finishing off his Ph.D. at the University of St. Andrews, in
>> > my mother country of Scotland and has mentioned that he will be pretty
>> > quiet until that is done. I personally wish you the very best with that.
>> > Sheriffo, once your account is created it is customary to commit your
>> > details to the Gora parent pom.xml [1]. If you need some assistance with
>> > that then please let me know.
>> > Lewis
>> >
>> > [0] https://github.com/apache/gora/tree/master/gora-benchmark
>> > [1] https://github.com/apache/gora/blob/master/pom.xml#L53
>> >
>> > --
>> > http://home.apache.org/~lewismc/
>> > http://people.apache.org/keys/committer/lewismc
>> >
>>
>


Re: YCSB dependency issue -- Plan B?

2019-08-31 Thread Sheriffo Ceesay
Hi Renato,

Okay, no problem. I think that is the general consensus so I will wait for
Sean to get back to us.

Thank you.

**Sheriffo Ceesay**


On Sat, Aug 31, 2019 at 7:10 AM Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hey Sheriffo,
>
> Yes, I agree with the others, let's wait until the end of the weekend
> and then see where we are and decide at that point.
>
>
> Best,
>
> Renato M.
>
> On Fri, 30 Aug 2019 at 4:29, Furkan KAMACI wrote:
> >
> > Hi Sheriffo,
> >
> > There are some ways to overcome that issue, e.g. using JitPack
> > (https://jitpack.io). However, it should be worth waiting for Sean
> > Busbuy's effort. Otherwise, we can go as Lewis suggested.
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Thu, Aug 29, 2019 at 9:04 PM Sheriffo Ceesay wrote:
> >>
> >> There has been an update from Sean Busbuy on the issue [1] that there is
> >> some progress and he will look into this issue at the weekend. So I
> >> think it may be wise to see what happens after the weekend.
> >>
> >> [1] https://github.com/brianfrankcooper/YCSB/issues/1340
> >>
> >> Thanks.
> >>
> >>
> >> **Sheriffo Ceesay**
> >>
> >>
> >> On Thu, Aug 29, 2019 at 6:11 PM lewis john mcgibbney <
> lewi...@apache.org>
> >> wrote:
> >>
> >> > Go for it +1
> >> > We can then merge and essentially disable the module in the build and
> >> > test until the dependency is available.
> >> > Lewis
> >> >
> >> > On Thu, Aug 29, 2019 at 7:30 AM Sheriffo Ceesay <
> sneceesa...@gmail.com>
> >> > wrote:
> >> >
> >> > > All,
> >> > >
> >> > > I think getting YCSB to publish their artefacts to Maven Central is
> >> > > taking an unexpected amount of time. It seems they are having an
> >> > > infinite internal discussion [1], so I thought of a possible Plan B.
> >> > >
> >> > > The plan would be to auto-provision the required YCSB jar by
> >> > > scripting the current manual process that I used. This would entail
> >> > > downloading the latest or stable YCSB source, performing a local
> >> > > Maven install, compiling, and running some tests.
> >> > >
> >> > > If the above suggestion makes sense, then I can find some time and
> >> > > work on the implementation. Alternatively, if you have a better
> >> > > idea, please let me know.
> >> > >
> >> > > Please let me know what your thoughts are.
> >> > >
> >> > > [1] https://github.com/brianfrankcooper/YCSB/issues/1340
> >> > >
> >> > > Thank you.
> >> > >
> >> > > **Sheriffo Ceesay**
> >> > >
> >> >
> >> >
> >> > --
> >> > http://home.apache.org/~lewismc/
> >> > http://people.apache.org/keys/committer/lewismc
> >> >
>


Re: YCSB dependency issue -- Plan B?

2019-08-29 Thread Sheriffo Ceesay
There has been an update from Sean Busbuy on the issue [1] that there is
some progress and he will look into this issue at the weekend. So I think
it may be wise to see what happens after the weekend.

[1] https://github.com/brianfrankcooper/YCSB/issues/1340

Thanks.


**Sheriffo Ceesay**


On Thu, Aug 29, 2019 at 6:11 PM lewis john mcgibbney wrote:

> Go for it +1
> We can then merge and essentially disable the module in the build and test
> until the dependency is available.
> Lewis
>
> On Thu, Aug 29, 2019 at 7:30 AM Sheriffo Ceesay wrote:
>
> > All,
> >
> > I think getting YCSB to publish their artefacts to Maven Central is
> > taking an unexpected amount of time. It seems they are having an
> > infinite internal discussion [1], so I thought of a possible Plan B.
> >
> > The plan would be to auto-provision the required YCSB jar by scripting
> > the current manual process that I used. This would entail downloading
> > the latest or stable YCSB source, performing a local Maven install,
> > compiling, and running some tests.
> >
> > If the above suggestion makes sense, then I can find some time and work
> > on the implementation. Alternatively, if you have a better idea, please
> > let me know.
> >
> > Please let me know what your thoughts are.
> >
> > [1] https://github.com/brianfrankcooper/YCSB/issues/1340
> >
> > Thank you.
> >
> > **Sheriffo Ceesay**
> >
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>


YCSB dependency issue -- Plan B?

2019-08-29 Thread Sheriffo Ceesay
All,

I think getting YCSB to publish their artefacts to Maven Central is taking
an unexpected amount of time. It seems they are having an infinite internal
discussion [1], so I thought of a possible Plan B.

The plan would be to auto-provision the required YCSB jar by scripting the
current manual process that I used. This would entail downloading the
latest or stable YCSB source, performing a local Maven install, compiling,
and running some tests.

If the above suggestions make sense, then I can find some time and work on
the implementation. Alternatively, if you have a better idea, please let me
know.

Please let me know what your thoughts are.

[1] https://github.com/brianfrankcooper/YCSB/issues/1340

Thank you.

**Sheriffo Ceesay**
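
The Plan B steps described above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the script that was used for the module: the branch name, the checkout directory, and the function names are hypothetical, and the follow-up step of compiling and testing gora-benchmark itself is left out because the thread does not spell it out. The default is a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path

# Repository referenced in the thread; the branch and checkout directory used
# below are hypothetical choices for illustration.
YCSB_REPO = "https://github.com/brianfrankcooper/YCSB"


def provisioning_commands(ref: str, checkout_dir: Path) -> list[list[str]]:
    """Return commands mirroring the manual steps: fetch the source, install locally."""
    return [
        # A shallow clone of a single branch or tag keeps the download small.
        ["git", "clone", "--depth", "1", "--branch", ref, YCSB_REPO, str(checkout_dir)],
        # Install the YCSB artifacts into the local Maven repository, skipping tests.
        ["mvn", "-f", str(checkout_dir / "pom.xml"), "-DskipTests", "install"],
    ]


def provision(ref: str, checkout_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Print each command; unless dry_run is set, run it and stop on the first failure."""
    commands = provisioning_commands(ref, checkout_dir)
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    provision("master", Path("build/ycsb-src"))
```

Calling provision(..., dry_run=False) would actually clone and build, which needs git, Maven, and network access on the build machine.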


Re: Final Report

2019-08-23 Thread Sheriffo Ceesay
Hi Renato,

See replies inline.

On Thu, Aug 22, 2019 at 5:52 PM Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hey Sheriffo,
>
> Thanks for the report and all the work!
> I think it makes sense that Gora performs worse when inserting data in
> the HBase case, because Gora still needs to serialize every data bean
> through Avro (maybe some caching? but Sheriffo also deactivated this
> with gora.hbasestore.hbase.client.autoflush.enabled=true), so I guess
> the rest of the time is just Gora serialization.
>

I agree with you.


> Now for the reads in HBase-native and HBase-Gora, are we sure we are
> getting the same granularity of objects? I mean because of the mapping
> Gora does (different column families per attribute), maybe we are
> fetching the attributes in a different way than HBase is doing, maybe
> Gora fetches only some column families whereas HBase fetches
> everything.
>

I have done some basic tests to verify this; see the testUpdate() method in
the GoraClientTest file. There, I insert some strings, retrieve them, and
verify that they match the expected values.

> Did you run any correctness tests to know that we are retrieving the
> correct results in both cases? Something like inserting an integer as
> part of the attributes, and then summing them when retrieved to check
> that the sum is what we expect.
>

Thanks for this. I have added a new test case called testCorrectness() to
handle the issue you have raised. The results I got are consistent with what
we expect.

>
> Best,
>
> Renato M.
>
> On Thu, 22 Aug 2019 at 5:17, Sheriffo Ceesay wrote:
> >
> > Hi Furkan,
> >
> > Yes, it baffled me as well. I haven't made any specific performance
> > optimisation configuration to either of the setups, so I think these
> > results may not be final at this stage and would need further
> > investigation.
> >
> > The only setting I set for HBase for Apache Gora in the gora.properties
> > file is:
> >
> > gora.hbasestore.hbase.client.autoflush.enabled=true
> >
> > For the local HBase setup, I have followed the recommendations here [1]
> > to avoid any performance issues.
> >
> > [1] https://github.com/brianfrankcooper/YCSB/tree/master/hbase098
> >
> > Basically, the setups are fresh and simplified installations without any
> > major configuration for optimisation.
> >
> > Thank you.
> >
> > *Sheriffo Ceesay*
> >
> >
> >
> > On Thu, Aug 22, 2019 at 12:45 PM Furkan KAMACI 
> wrote:
> >>
> >> Hi Sheriffo,
> >>
> >> Thanks for the updates!
> >>
> >> By the way, I still wonder about the reason for the poor performance of
> >> the HBase native implementation.
> >>
> >> Kind Regards,
> >> Furkan KAMACI
> >>
> >> On Thu, Aug 22, 2019 at 2:37 PM Sheriffo Ceesay 
> >> wrote:
> >>
> >> > Hi Furkan,
> >> > Thanks for your feedback.
> >> >
> >> > Please find replies to your comments inline.
> >> >
> >> > On Wed, Aug 21, 2019 at 6:19 PM Furkan KAMACI  >
> >> > wrote:
> >> >
> >> > > Hi Sheriffo,
> >> > >
> >> > > Thanks for your great effort!
> >> > >
> >> > > 1) Could you separate charts for HBase and MongoDB? HBase charts
> suppress
> >> > > MongoDB ones.
> >> > >
> >> > Yes, this is now done. Can you please have a look?
> >> >
> >> > >
> >> > > 2) Report says that:
> >> > >
> >> > > *"In this work, we have time to include only three gora data stores
> >> > > (MongoDB, HBase and CouchDB)"*
> >> > >
> >> > > However, you have not run this benchmark for CouchDB as far as I
> know?
> >> > >
> >> >
> >> > Yes, you are right that it is not included in the benchmark results
> but I
> >> > have included its implementation in the module. This includes
> >> > auto-generating mapping and related files. Due to time factors, there
> was a
> >> > bit of discussion as to which datastores to include in the preliminary
> >> > benchmarking and we have decided to include HBase and MongoDB. In
> future, I
> >> > will work on adding more data stores and compare their performance as
> well.
> >> >
> >> >
> >> > > 3) I don't think there is a need to add commit hashes and messages
> as
> >> > > Appendix. Especially if we consider that hashes will be changed
> once the

Re: Final Report

2019-08-22 Thread Sheriffo Ceesay
Hi Furkan,

Yes, it baffled me as well. I haven't made any specific performance
optimisation configuration to either of the setups so I think these results
may not be final at this stage and would need further investigation.

The only setting I set for HBase for Apache Gora in the gora.properties
file is:

*gora.hbasestore.hbase.client.autoflush.enabled=true*

For the local HBase setup, I have followed the recommendations here [1] to
avoid any performance issues.

[1] https://github.com/brianfrankcooper/YCSB/tree/master/hbase098

Basically, the setups are fresh and simplified installations without any
major configuration for optimisation.

Thank you.


**Sheriffo Ceesay**


On Thu, Aug 22, 2019 at 12:45 PM Furkan KAMACI wrote:

> Hi Sheriffo,
>
> Thanks for the updates!
>
> By the way, I still wonder about the reason for the poor performance of
> the HBase native implementation.
>
> Kind Regards,
> Furkan KAMACI
>
> On Thu, Aug 22, 2019 at 2:37 PM Sheriffo Ceesay 
> wrote:
>
> > Hi Furkan,
> > Thanks for your feedback.
> >
> > Please find replies to your comments inline.
> >
> > On Wed, Aug 21, 2019 at 6:19 PM Furkan KAMACI 
> > wrote:
> >
> > > Hi Sheriffo,
> > >
> > > Thanks for your great effort!
> > >
> > > 1) Could you separate charts for HBase and MongoDB? HBase charts
> suppress
> > > MongoDB ones.
> > >
> > Yes, this is now done. Can you please have a look?
> >
> > >
> > > 2) Report says that:
> > >
> > > *"In this work, we have time to include only three gora data stores
> > > (MongoDB, HBase and CouchDB)"*
> > >
> > > However, you have not run this benchmark for CouchDB as far as I know?
> > >
> >
> > Yes, you are right that it is not included in the benchmark results but I
> > have included its implementation in the module. This includes
> > auto-generating mapping and related files. Due to time factors, there
> was a
> > bit of discussion as to which datastores to include in the preliminary
> > benchmarking and we have decided to include HBase and MongoDB. In
> future, I
> > will work on adding more data stores and compare their performance as
> well.
> >
> >
> > > 3) I don't think there is a need to add commit hashes and messages as
> > > Appendix. Especially if we consider that hashes will be changed once
> the
> > PR
> > > merged into the codebase.
> > >
> >
> > I have seen this as a good tip in the email send by GSoC team, but I
> agree
> > with you and I have now removed this.
> >
> > >
> > > Kind Regards,
> > > Furkan KAMACI
> >
> >
> > Thank you.
> > Sheriffo.
> >
> > >
> >
> >
> > > On Wed, Aug 21, 2019 at 7:42 PM Sheriffo Ceesay  >
> > > wrote:
> > >
> > > > All,
> > > >
> > > > My draft final report is available at
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/GORA/Final+Report%3A+%5BGORA-532%5D+Benchmark+Module+For+Apache+Gora
> > > >
> > > > We have until 26th of this month submit the report. Please let me
> know
> > if
> > > > you have any comments to improve it.
> > > >
> > > > Meanwhile, I will work on the documentation on how to run the
> benchmark
> > > > module and publish on gora website.
> > > >
> > > > Thank you.
> > > >
> > > > **Sheriffo Ceesay**
> > > >
> > >
> >
>


Re: Final Report

2019-08-22 Thread Sheriffo Ceesay
Hi Furkan,
Thanks for your feedback.

Please find replies to your comments inline.

On Wed, Aug 21, 2019 at 6:19 PM Furkan KAMACI 
wrote:

> Hi Sheriffo,
>
> Thanks for your great effort!
>
> 1) Could you separate charts for HBase and MongoDB? HBase charts suppress
> MongoDB ones.
>
Yes, this is now done. Can you please have a look?

>
> 2) Report says that:
>
> *"In this work, we have time to include only three gora data stores
> (MongoDB, HBase and CouchDB)"*
>
> However, you have not run this benchmark for CouchDB as far as I know?
>

Yes, you are right that it is not included in the benchmark results but I
have included its implementation in the module. This includes
auto-generating mapping and related files. Due to time factors, there was a
bit of discussion as to which datastores to include in the preliminary
benchmarking and we have decided to include HBase and MongoDB. In future, I
will work on adding more data stores and compare their performance as well.


> 3) I don't think there is a need to add commit hashes and messages as
> Appendix. Especially if we consider that hashes will be changed once the PR
> merged into the codebase.
>

I saw this as a good tip in the email sent by the GSoC team, but I agree
with you and I have now removed it.

>
> Kind Regards,
> Furkan KAMACI


Thank you.
Sheriffo.

>


> On Wed, Aug 21, 2019 at 7:42 PM Sheriffo Ceesay 
> wrote:
>
> > All,
> >
> > My draft final report is available at
> >
> >
> https://cwiki.apache.org/confluence/display/GORA/Final+Report%3A+%5BGORA-532%5D+Benchmark+Module+For+Apache+Gora
> >
> > We have until 26th of this month submit the report. Please let me know if
> > you have any comments to improve it.
> >
> > Meanwhile, I will work on the documentation on how to run the benchmark
> > module and publish on gora website.
> >
> > Thank you.
> >
> > **Sheriffo Ceesay**
> >
>


Final Report

2019-08-21 Thread Sheriffo Ceesay
All,

My draft final report is available at
https://cwiki.apache.org/confluence/display/GORA/Final+Report%3A+%5BGORA-532%5D+Benchmark+Module+For+Apache+Gora

We have until the 26th of this month to submit the report. Please let me
know if you have any comments to improve it.

Meanwhile, I will work on the documentation on how to run the benchmark
module and publish it on the Gora website.

Thank you.

**Sheriffo Ceesay**


Week 12 Report

2019-08-18 Thread Sheriffo Ceesay
All,

Week 12 report is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

I have resolved almost all the comments by @Lewis John Mcgibbney and
@Kevin Ratnasekera, except for a few where I may need clarification or
guidance.

We are still waiting on the folks from YCSB to proceed with this issue:
https://github.com/brianfrankcooper/YCSB/issues/1340#issuecomment-520114829

If the above issue is resolved then it will be simpler to test the
benchmark module. Currently, the readme file has some guidelines on how to
compile and run the module.


**Sheriffo Ceesay**


Week 11 Report

2019-08-11 Thread Sheriffo Ceesay
All,

Week 11 report of [1] is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

Basically, I have worked on the comments on the pull request [2]. It will
be great if someone can run my code locally to see if there are any issues.
You can use the README file in the project root directory.

We have reported an issue and the maintainers of YCSB are working on it to
get it resolved [3].

[1] https://github.com/sneceesay77/gora/tree/GORA-532
[2] https://github.com/apache/gora/pull/179
[3] https://github.com/brianfrankcooper/YCSB/issues/1340

Thank you.

**Sheriffo Ceesay**


Week 10 Report

2019-08-04 Thread Sheriffo Ceesay
All,

Week 10 report of [1] is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

I have also sent my first pull request for comments and code review,
available at [2]. Comments are very welcome.

[1] https://github.com/sneceesay77/gora/tree/GORA-532
[2] https://github.com/apache/gora/pull/179

Thank you.


**Sheriffo Ceesay**


Weekly 9 Report

2019-07-30 Thread Sheriffo Ceesay
I am late sending this through; the experiments are still running but are
almost at the end. I will update the space with the latest plots when done.

For the rest of the week, I will clean up and document my code, then send a
pull request for review and comments. Hopefully, this will be done by the
end of next week.

Thank you.

**Sheriffo Ceesay**


Re: Week 8 Report

2019-07-23 Thread Sheriffo Ceesay
Hi Kevin,

Thanks again for the input.

I have taken note of your comments and I will factor them in
moving forward.

Thank you.


**Sheriffo Ceesay**
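
Kevin's suggestion, quoted below, to track the client setup under which each benchmark is captured could look roughly like this. It is a sketch only: the record layout and function names are hypothetical, and the parser handles plain key=value lines rather than the full Java properties syntax.

```python
import json


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' or '!' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def run_record(store: str, workload: str, properties_text: str) -> str:
    """Bundle the data store, workload, and client settings into one JSON record."""
    record = {
        "store": store,
        "workload": workload,
        "client_settings": parse_properties(properties_text),
    }
    return json.dumps(record, sort_keys=True, indent=2)


print(run_record("hbase", "workloada",
                 "gora.hbasestore.hbase.client.autoflush.enabled=true\n"))
```

Storing one such record next to each result file would make it possible to tell later which client settings produced which numbers.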


On Tue, Jul 23, 2019 at 10:13 AM Kevin Ratnasekera 
wrote:

> Hi Sheriffo,
>
> Thank you for your findings and hard work on this. I agree on most of the
> points you already mentioned. But I don't think we have consistent client
> implementations across all data stores in Apache Gora. That means some
> clients use the async version of their APIs by default, and some use sync.
> Some clients have connection pooling implemented and some use a single
> connection to do all the data store work. These configurations will
> generally change the behavior when these clients are put under a huge load.
> I think we will be good if we track the client setup/configurations under
> which the benchmarks are captured.
>
> Regards
> Kevin
>
> On Tue, Jul 23, 2019 at 2:14 PM Sheriffo Ceesay 
> wrote:
>
> > **Sheriffo Ceesay**
> >
> >
> > On Tue, Jul 23, 2019 at 6:52 AM Kevin Ratnasekera <
> djkevincr1...@gmail.com
> > >
> > wrote:
> >
> > > Hi Sheriffo,
> > >
> > > Adding to what Kamaci already mentioned, Have you tried Hbase-Store
> with
> > > buffered mutator engaged? [1] It allows HBase operations Eg:- puts to
> be
> > > batched and asynchronous.
> >
> >
> > Hi Kevin,
> > Yes I am using *gora.hbasestore.hbase.client.autoflush.enabled=true *in
> the
> > gora.properties file.
> >
> >
> >
> > > On related note also have a look on points where
> > > you flush() the datastore with native HBase implementation.
> > >
> >
> > The native implementation is through Yahoo! Cloud Service Benchmark. I
> > haven't gone through their implementation. I think it may be due to the
> > default configuration that I am using. I will try to dig further to see
> if
> > I can strike some improvement.
> >
> >
> > >
> > > [1] gora.hbasestore.hbase.client.autoflush.enabled=true
> > >
> > > Regards
> > > Kevin
> > >
> > > On Tue, Jul 23, 2019 at 5:40 AM Furkan KAMACI 
> > > wrote:
> > >
> > > > Hi Sheriffo,
> > > >
> > > > I've checked all the GSoC reports including your's. Thanks for
> filling
> > > > your reports with time slot information about tasks.
> > > >
> > > > You have 3 benchmark result charts at your Week 8 Report.
> Hbase-native
> > is
> > > > dramatically slow compared to the others at 2 out of 3. Do you have
> any
> > > > comment about it?
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
> > > >
> > > > On Sun, Jul 21, 2019 at 7:33 PM Sheriffo Ceesay <
> sneceesa...@gmail.com
> > >
> > > > wrote:
> > > >
> > > >> Week eight report is available at
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> > > >>
> > > >> I ran some workloads to compare Gora implementation of Mongo and
> HBase
> > > to
> > > >> the native implementations. Plots are provided in the reports and
> the
> > > >> generated data is also available at [1]. This is a work in
> progress, I
> > > >> have
> > > >> done Workload A and Workload B [2].
> > > >>
> > > >> Please let me know if you have any suggestions or questions.
> > > >>
> > > >> [1] https://github.com/sneceesay77/gora/tree/GORA-532
> > > >> [2] https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads
> > > >>
> > > >>
> > > >> **Sheriffo Ceesay**
> > > >>
> > > >
> > >
> >
>


Re: Week 8 Report

2019-07-23 Thread Sheriffo Ceesay
**Sheriffo Ceesay**


On Tue, Jul 23, 2019 at 6:52 AM Kevin Ratnasekera 
wrote:

> Hi Sheriffo,
>
> Adding to what Kamaci already mentioned, have you tried Hbase-Store with
> the buffered mutator engaged? [1] It allows HBase operations, e.g. puts, to
> be batched and asynchronous.


Hi Kevin,
Yes, I am using *gora.hbasestore.hbase.client.autoflush.enabled=true* in the
gora.properties file.



> On a related note, also have a look at the points where
> you flush() the datastore with the native HBase implementation.
>

The native implementation is through the Yahoo! Cloud Serving Benchmark. I
haven't gone through their implementation. I think it may be due to the
default configuration that I am using. I will try to dig further to see if
I can make some improvement.


>
> [1] gora.hbasestore.hbase.client.autoflush.enabled=true
>
> Regards
> Kevin
>
> On Tue, Jul 23, 2019 at 5:40 AM Furkan KAMACI 
> wrote:
>
> > Hi Sheriffo,
> >
> > I've checked all the GSoC reports including yours. Thanks for filling
> > your reports with time slot information about tasks.
> >
> > You have 3 benchmark result charts in your Week 8 Report. Hbase-native is
> > dramatically slower than the others in 2 out of 3. Do you have any
> > comment about it?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Sun, Jul 21, 2019 at 7:33 PM Sheriffo Ceesay 
> > wrote:
> >
> >> Week eight report is available at
> >>
> >>
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> >>
> >> I ran some workloads to compare Gora implementation of Mongo and HBase
> to
> >> the native implementations. Plots are provided in the reports and the
> >> generated data is also available at [1]. This is a work in progress, I
> >> have
> >> done Workload A and Workload B [2].
> >>
> >> Please let me know if you have any suggestions or questions.
> >>
> >> [1] https://github.com/sneceesay77/gora/tree/GORA-532
> >> [2] https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads
> >>
> >>
> >> **Sheriffo Ceesay**
> >>
> >
>


Re: Week 8 Report

2019-07-23 Thread Sheriffo Ceesay
**Sheriffo Ceesay**
Hi Furkan,

Thanks for the reply. I have replied to your comment inline.

On Tue, Jul 23, 2019 at 1:10 AM Furkan KAMACI 
wrote:

> Hi Sheriffo,
>
> I've checked all the GSoC reports including yours. Thanks for filling your
> reports with time slot information about tasks.
>
> You have 3 benchmark result charts in your Week 8 Report. Hbase-native is
> dramatically slower than the others in 2 out of 3. Do you have any
> comment about it?


Thanks for the feedback. Right now I don't have a concrete explanation, but
I suspect the HBase setup. I am using the default setup for all datastores
without setting any optimization settings in the configuration files. I
will have a look at this during the week. My only concern is that I don't
want to over-optimise the configuration of a particular datastore, thereby
giving it an unfair advantage over the others.

Thank you.

>
> Kind Regards,
> Furkan KAMACI
>
> On Sun, Jul 21, 2019 at 7:33 PM Sheriffo Ceesay 
> wrote:
>
> > Week eight report is available at
> >
> >
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> >
> > I ran some workloads to compare Gora implementation of Mongo and HBase to
> > the native implementations. Plots are provided in the reports and the
> > generated data is also available at [1]. This is a work in progress, I
> have
> > done Workload A and Workload B [2].
> >
> > Please let me know if you have any suggestions or questions.
> >
> > [1] https://github.com/sneceesay77/gora/tree/GORA-532
> > [2] https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads
> >
> >
> > **Sheriffo Ceesay**
> >
>


Week 8 Report

2019-07-21 Thread Sheriffo Ceesay
Week eight report is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

I ran some workloads to compare the Gora implementations of MongoDB and
HBase to the native implementations. Plots are provided in the report and
the generated data is also available at [1]. This is a work in progress; so
far I have done Workload A and Workload B [2].

Please let me know if you have any suggestions or questions.

[1] https://github.com/sneceesay77/gora/tree/GORA-532
[2] https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads


**Sheriffo Ceesay**


Week 7 Report

2019-07-14 Thread Sheriffo Ceesay
Week seven report is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

Basically, I am currently running workloads on HBase. I will continue to do
this for next week and probably the week after. More details are specified
in the report.

Please let me know if you have any questions.



**Sheriffo Ceesay**


Week 6 Report and Some Questions

2019-07-07 Thread Sheriffo Ceesay
Week six report is now available at [1].

I have added CouchDB to the benchmark module. Adding any other data store
that Gora supports would require auto-creating the mapping and Avro files.
It also requires setting up the data store locally for testing. Given the
timeline, I think it would be much better for this work to focus on only a
few databases and set up a standard benchmarking process. This process can
then be used to benchmark other data stores in the future.
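A minimal sketch of what auto-creating the Avro file could look like, assuming a record with one key field plus N string fields named field0..field(N-1). The class name, the namespace, the record name "User" and the nullable-string field type are assumptions made for illustration; this is not the generator used in the module, and the matching Gora mapping file is not covered here.

```java
import java.util.StringJoiner;

/** Sketch: builds an Avro record schema (JSON) for a benchmark bean. */
final class BenchmarkSchemaSketch {

  /**
   * Returns schema JSON for a record with a "userId" key field and
   * fieldCount string fields. Namespace and field types are assumptions.
   */
  static String buildSchema(int fieldCount) {
    if (fieldCount < 0) {
      throw new IllegalArgumentException("fieldCount must be >= 0");
    }
    // Each entry is one element of the Avro "fields" array.
    StringJoiner fields = new StringJoiner(",\n", "[\n", "\n  ]");
    fields.add("    {\"name\": \"userId\", \"type\": \"string\"}");
    for (int i = 0; i < fieldCount; i++) {
      fields.add("    {\"name\": \"field" + i
          + "\", \"type\": [\"null\", \"string\"], \"default\": null}");
    }
    return "{\n"
        + "  \"type\": \"record\",\n"
        + "  \"name\": \"User\",\n"
        + "  \"namespace\": \"org.example.benchmark.generated\",\n"
        + "  \"fields\": " + fields + "\n"
        + "}";
  }

  public static void main(String[] args) {
    // Print a schema with three value fields.
    System.out.println(buildSchema(3));
  }
}
```

Generating the schema from the field count keeps the Avro file in step with the YCSB fieldcount setting, so it does not have to be edited by hand for each data store.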

I will need some suggestions on the way forward for benchmarking these data
stores. With limited resources, these are my plans:

   1. Use a single node on Google Cloud (I got $500 Google credit, thanks
   to a tip from Kevin)
   2. Set up MongoDB, HBase and CouchDB
   3. Set up gora-benchmark to connect to the Google Cloud instance
   4. Benchmark these three data stores based on the workloads in YCSB
   5. Standardise the process
   6. Add more datastores if time permits. (Would be good to know which
   ones are a priority.)

Next Week: I hope to complete setting up an environment on Google Cloud and
probably have some preliminary numbers to present.

Any suggestions are very welcome.

Thank you.

[1]
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

**Sheriffo Ceesay**


Week 5 Report

2019-06-30 Thread Sheriffo Ceesay
Dear All,

My week 5 report is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

Please let me know if you have any questions or suggestions.

Thank you.


**Sheriffo Ceesay**


Week Four Report

2019-06-23 Thread Sheriffo Ceesay
All,

The week four report is available at [1].

It also contains what I plan to do in the coming days.

Please let me know if you have any questions or suggestions.

[1]
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

Thank you.


**Sheriffo Ceesay**


Week 3 Report

2019-06-15 Thread Sheriffo Ceesay
All,

My week three report is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

It also contains what I plan to work on next week.

Please let me know if you have any questions or suggestions.

Thank you.


**Sheriffo Ceesay**


Re: Week 2 Report and A Question

2019-06-12 Thread Sheriffo Ceesay
Hi All,

Further to my previous email, I have now updated the code to reflect the
code optimisations proposed by Alfonso. I have completely removed the
reflection approach. @Alfonso Nishikawa, the reason I did not use
*SpecificRecordBase.get(int index)* and
*SpecificRecordBase.put(int index, Object o)* in my first implementation was
this comment just before the methods in the generated class:
"//Used by DatumWriter.  Applications should not call."
I have now changed the implementation to use these two methods.
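A minimal sketch of the index-based approach, under stated assumptions: IndexedUser is a stand-in written for this example and only mimics the get(int)/put(int, Object) shape of SpecificRecordBase; it is not the Avro class. In a real generated bean the index of each field depends on the schema (for instance, a key field may occupy index 0), so the mapping below is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for a generated bean whose fields are addressed by position. */
final class IndexedUser {
  private final Object[] values;

  IndexedUser(int fieldCount) {
    values = new Object[fieldCount];
  }

  Object get(int index) {
    return values[index];
  }

  void put(int index, Object value) {
    values[index] = value;
  }
}

final class IndexedInsertSketch {
  static final int FIELD_COUNT = 10;

  // Field names are computed once, not concatenated on every insert call.
  static final String[] FIELD_NAMES = new String[FIELD_COUNT];

  static {
    for (int i = 0; i < FIELD_COUNT; i++) {
      FIELD_NAMES[i] = "field" + i;
    }
  }

  // One record instance is reused; every insert overwrites all of its fields.
  private final IndexedUser user = new IndexedUser(FIELD_COUNT);

  /** Copies the values into the reused record by index, without reflection. */
  IndexedUser fill(Map<String, String> values) {
    for (int i = 0; i < FIELD_COUNT; i++) {
      user.put(i, values.get(FIELD_NAMES[i]));
    }
    return user;
  }

  public static void main(String[] args) {
    Map<String, String> values = new HashMap<>();
    for (String name : FIELD_NAMES) {
      values.put(name, "value-of-" + name);
    }
    IndexedUser filled = new IndexedInsertSketch().fill(values);
    System.out.println(filled.get(0) + " ... " + filled.get(FIELD_COUNT - 1));
  }
}
```

Because the loop is driven by the field count, the same insert code works for any number of fields without reflection and without per-call string concatenation.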

@Renato Marroquín Mogrovejo, I forgot to answer your question about how I
run the code; please see below.

First step: from the gora-benchmark directory, execute

    mvn clean install

Second step:

    java -cp .:bmstuff/core-0.1.4.jar:target/gora-benchmark-0.9-SNAPSHOT.jar:bmstuff/sources-dist-0.9-SNAPSHOT.jar:lib/* \
      com.yahoo.ycsb.Client \
      -load \
      -db org.apache.gora.benchmark.GoraBenchmarkClient \
      -threads 10 -s \
      -P bmstuff/workloads/workloada > bmstuff/out.log

The following switches (command-line options) are YCSB-specific:

com.yahoo.ycsb.Client is the YCSB class that will load our DB class.
-load loads the database.
-db specifies our benchmark implementation.
-threads specifies the number of client threads to start.
-P specifies the workload to execute; the file *bmstuff/workloads/workloada*
contains various key-value pairs, e.g. the record count and the number of
records to read and update.
> bmstuff/out.log sends standard output and logs to out.log.
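A workload file is a plain Java properties file, so its key-value pairs can be inspected with java.util.Properties. The sketch below is only an illustration: the property names (recordcount, operationcount, workload, readproportion, updateproportion) are YCSB core properties, but the values are made up for this example and are not the contents of bmstuff/workloads/workloada.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: reads YCSB-style workload settings from properties text. */
final class WorkloadFileSketch {

  /** Parses key=value lines into a Properties object. */
  static Properties parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      // A StringReader should not fail, but load() declares IOException.
      throw new UncheckedIOException(e);
    }
    return props;
  }

  public static void main(String[] args) {
    // Illustrative values only, not the real workloada file.
    String workload = String.join("\n",
        "recordcount=1000000",
        "operationcount=1000000",
        "workload=com.yahoo.ycsb.workloads.CoreWorkload",
        "readproportion=0.5",
        "updateproportion=0.5");

    Properties props = parse(workload);
    long records = Long.parseLong(props.getProperty("recordcount"));
    double reads = Double.parseDouble(props.getProperty("readproportion"));
    System.out.println("records=" + records + ", read share=" + reads);
  }
}
```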

YCSB is highly configurable; we can pass many key-value pairs in a property
file or as command-line options. See [1] and [2] for more information. In
the end, we can write an executable script like ycsb.sh to automate the
entire Gora benchmark process. This would range from creating the Avro files
to producing the benchmarking results.

Please let me know if that makes sense.

Thank you.

[1] https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload
[2] https://github.com/brianfrankcooper/YCSB/wiki/Core-Properties


**Sheriffo Ceesay**


On Wed, Jun 12, 2019 at 11:50 AM Sheriffo Ceesay 
wrote:

> Hi Renato,
>
> I will follow Alfonso's recommendations about reusing objects as much as I
> can. I will push those changes to the branch by the end of this week.
>
> To answer your questions.
>
> Yes, you are right I am using a clean cold JVM. If necessary, I can also
> have a look at warming the JVM down the line.
>
> Yes, I have tried setting *gora.hbasestore.scanner.caching* to different
> values but there was no significant difference. Also, I may be wrong but  I
> think this setting has to do with scan operation and not insert operation?
>
> As for flushing, I tried but it quickly throws an error and hence I
> commented that line of code. I think this is due to the fact that the
> insert operation inserts a single user object for each call, so calling
> dataStore.flush() within that method would mean calling flush on every
> object insertion. Is that not the case? There should be a way to track the
> progress of inserts then that can be used to call flush after N insert
> calls. So I used *gora.hbasestore.hbase.client.autoflush.enabled=true *which
> would automatically call flush at some point. However, like I mentioned in
> my previous email, enabling autoflush decreases write performance [1].
>
> [1] https://gora.apache.org/current/gora-hbase.html
>
> Thank you.
>
> **Sheriffo Ceesay**
>
>
> On Tue, Jun 11, 2019 at 10:52 PM Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com> wrote:
>
>> Hey Sheriffo,
>>
>> Cool to hear you are making progress! :) and great to see that we have
>> some numbers already! :)
>> Regarding optimization point (1), regardless of whether this was the
>> cause of the issue or not, Alfonso's suggestions are something we should
>> follow; many objects with a short life in Java might create a
>> performance problem sooner or later. Also about your comment:
>>
>> "Also, I may be wrong but the way I understand YCSB framework is, it
>> will execute an insert operation for each user object, so I thought it
>> was right to create a user object within the insert method."
>>
>> As you pointed out, YCSB is about inserting the objects, and NOT about
>> creating them, so it doesn't matter if we reuse the objects, as long
>> as the values that we insert are actually correct. We don't want to
>> end up measuring object creation+gc. I think Alfonso's comment was
>> hinting in that direction (please feel free to correct me @Alfonso if
>> I am misunderstanding you) and I think his comments are spot on.
>> I have some other questions regarding the numbers you sent around:
>> - are you running YCSB for each data store with warm JVM? or are thes

Re: Week 2 Report and A Question

2019-06-12 Thread Sheriffo Ceesay
Hi Renato,

I will follow Alfonso's recommendations about reusing objects as much as I
can. I will push those changes to the branch by the end of this week.

To answer your questions.

Yes, you are right, I am using a clean, cold JVM. If necessary, I can also
look at warming up the JVM down the line.

Yes, I have tried setting *gora.hbasestore.scanner.caching* to different
values but there was no significant difference. Also, I may be wrong, but I
think this setting affects scan operations and not insert operations?

As for flushing, I tried it but it quickly throws an error, so I commented
out that line of code. I think this is because the insert operation inserts
a single user object per call, so calling dataStore.flush() within that
method means calling flush on every object insertion. Is that not the case?
There should be a way to track the progress of inserts so that flush can be
called after every N insert calls. So I used
*gora.hbasestore.hbase.client.autoflush.enabled=true*, which would
automatically call flush at some point. However, as I mentioned in my
previous email, enabling autoflush decreases write performance [1].
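A minimal sketch of tracking inserts and flushing after every N calls. The Store interface here is hypothetical and only stands in for the put/flush pair of a Gora DataStore; the class name and the flushEvery parameter are likewise assumptions. The counter is not thread-safe, so this assumes each YCSB client thread works with its own instance.

```java
/** Sketch: flushes the store after every N inserts instead of on each one. */
final class BatchedFlushSketch {

  /** Hypothetical minimal store view: only put and flush are needed here. */
  interface Store {
    void put(String key, String value);

    void flush();
  }

  private final Store store;
  private final int flushEvery;
  private long pending; // inserts since the last flush

  BatchedFlushSketch(Store store, int flushEvery) {
    if (flushEvery <= 0) {
      throw new IllegalArgumentException("flushEvery must be > 0");
    }
    this.store = store;
    this.flushEvery = flushEvery;
  }

  /** Writes one record and flushes once flushEvery records are pending. */
  void insert(String key, String value) {
    store.put(key, value);
    pending++;
    if (pending >= flushEvery) {
      store.flush();
      pending = 0;
    }
  }

  /** Flushes whatever is still pending, e.g. from a cleanup hook. */
  void close() {
    if (pending > 0) {
      store.flush();
      pending = 0;
    }
  }

  public static void main(String[] args) {
    final int[] flushes = {0};
    Store counting = new Store() {
      @Override
      public void put(String key, String value) {
        // A real store would buffer the write here.
      }

      @Override
      public void flush() {
        flushes[0]++;
      }
    };

    BatchedFlushSketch client = new BatchedFlushSketch(counting, 1000);
    for (int i = 0; i < 2500; i++) {
      client.insert("user" + i, "value");
    }
    client.close();
    System.out.println("flush calls: " + flushes[0]);
  }
}
```

The final close() matters: without it, the records inserted after the last full batch would stay unflushed.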

[1] https://gora.apache.org/current/gora-hbase.html

Thank you.

**Sheriffo Ceesay**


On Tue, Jun 11, 2019 at 10:52 PM Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hey Sheriffo,
>
> Cool to hear you are making progress! :) and great to see that we have
> some numbers already! :)
> Regarding optimization point (1), regardless of whether this was the
> cause of the issue or not, Alfonso's suggestions are something we should
> follow; many objects with a short life in Java might create a
> performance problem sooner or later. Also about your comment:
>
> "Also, I may be wrong but the way I understand YCSB framework is, it
> will execute an insert operation for each user object, so I thought it
> was right to create a user object within the insert method."
>
> As you pointed out, YCSB is about inserting the objects, and NOT about
> creating them, so it doesn't matter if we reuse the objects, as long
> as the values that we insert are actually correct. We don't want to
> end up measuring object creation+gc. I think Alfonso's comment was
> hinting in that direction (please feel free to correct me @Alfonso if
> I am misunderstanding you) and I think his comments are spot on.
> I have some other questions regarding the numbers you sent around:
> - are you running YCSB for each data store with warm JVM? or are these
> numbers each with a clean cold JVM? I suppose the latter, right?
> - did you try setting gora.hbasestore.scanner.caching to a lower value?
> - which is the command that you are using to run/start this code?
> - did you try flushing the commits more regularly in:
>
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L142
> let's say every 1000 elements? or something like that? I mean instead
> of at the end of the 1M elements?
>
> Thanks a lot for the report Sheriffo!
>
>
> Best,
>
> Renato M.
>
> On Tue, Jun 11, 2019 at 16:12, Sheriffo Ceesay wrote:
> >
> > Hello,
> >
> > I have taken a proper look at the recommendations from @Alfonso and
> @Renato and below are the outcomes.
> >
> > Failed Attempts
> > 1. Optimisation, for the insert operation, to avoid the concatenation
> issue, I have just taken the quickest route by calling the methods directly
> without reflection. Below are those calls. Note: I have moved all reusable
> codes to the init method.
> >
> >> public int insert(String table, String key, HashMap<String, ByteIterator> values) {
> >>   try {
> >>   user.setField0(values.get("field0").toString());
> >>   user.setField1(values.get("field1").toString());
> >>   user.setField2(values.get("field2").toString());
> >>   user.setField3(values.get("field3").toString());
> >>   user.setField4(values.get("field4").toString());
> >>   user.setField5(values.get("field5").toString());
> >>   user.setField6(values.get("field6").toString());
> >>   user.setField7(values.get("field7").toString());
> >>   user.setField8(values.get("field8").toString());
> >>   user.setField9(values.get("field9").toString());
> >>   dataStore.put(user.getUserId().toString(), user);
> >> } catch (Exception e) {
> >>   return FAILED;
> >> }
> >> return SUCCESS;
> >>   }
> >
> >
> > if the above had worked, I would have changed the code as suggeste

Re: Week 2 Report and A Question

2019-06-11 Thread Sheriffo Ceesay
Hello,

I have taken a proper look at the recommendations from @Alfonso and @Renato
and below are the outcomes.

Failed Attempts
1. Optimisation: for the insert operation, to avoid the string concatenation
issue, I took the quickest route and called the setter methods directly,
without reflection. Those calls are below. Note: I have moved all reusable
code to the init method.

    public int insert(String table, String key, HashMap<String, ByteIterator> values) {
      try {
        // Copy each YCSB field value into the reused user object.
        user.setField0(values.get("field0").toString());
        user.setField1(values.get("field1").toString());
        user.setField2(values.get("field2").toString());
        user.setField3(values.get("field3").toString());
        user.setField4(values.get("field4").toString());
        user.setField5(values.get("field5").toString());
        user.setField6(values.get("field6").toString());
        user.setField7(values.get("field7").toString());
        user.setField8(values.get("field8").toString());
        user.setField9(values.get("field9").toString());
        // Persist the object through Gora, keyed by its user id.
        dataStore.put(user.getUserId().toString(), user);
      } catch (Exception e) {
        return FAILED;
      }
      return SUCCESS;
    }

If the above had worked, I would have changed the code as suggested by
Alfonso. Also, I may be wrong, but as I understand the YCSB framework, it
executes an insert operation for each user object, so I thought it was right
to create a user object within the insert method.


2. I used different config values for *-Xmx (256MB, 512MB, 1GB, 2GB)* and
even disabled the GC overhead limit check using *-XX:-UseGCOverheadLimit*,
but they all failed with the same GC error.

Successful Attempt -- There may be room for improvement
Using the configurations below worked but I think it is not the best for
write performance.

First, I read in [1] (related to [2]) that the following one-liner should be
executed for better HBase performance when using YCSB. It pre-splits the
table, which avoids overloading a single region server.

hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)
hbase(main):002:0> create 'users', 'info', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
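To make the pre-split visible, the sketch below computes the split keys in Java. It assumes the split formula given in the YCSB HBase README, 1000 + i*(9999-1000)/n_splits with integer division; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: computes the region split keys used to pre-split the table. */
final class SplitKeySketch {

  /** Returns nSplits keys of the form "user<number>", spread over 1000..9999. */
  static List<String> splitKeys(int nSplits) {
    if (nSplits <= 0) {
      throw new IllegalArgumentException("nSplits must be > 0");
    }
    List<String> keys = new ArrayList<>(nSplits);
    for (int i = 1; i <= nSplits; i++) {
      // Integer division, as in the shell expression.
      keys.add("user" + (1000 + i * (9999 - 1000) / nSplits));
    }
    return keys;
  }

  public static void main(String[] args) {
    List<String> keys = splitKeys(200);
    System.out.println(keys.size() + " keys, first=" + keys.get(0)
        + ", last=" + keys.get(keys.size() - 1));
  }
}
```

With 200 splits the keys run from user1044 to user9999, so the inserted user keys are spread across regions rather than all landing on one region server.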

Second, as suggested by @Renato Marroquín Mogrovejo, it only works when I set

*hbase.client.autoflush.default=true*

However, from [3], I found "HBase autoflushing. Enabling autoflush
decreases write performance. Available since Gora 0.2. Defaults to
disabled.". So I am of the opinion that the problem is not entirely solved.

I ran the following test, inserting 1M records into MongoDB and HBase. I
think the result may not be bad after all, but more benchmarks are needed to
validate this. HBase through Gora performs almost the same as HBase
benchmarked with vanilla YCSB.

Backend          Avg. time taken (sec)
MongoDB          ~90
HBase in Gora    ~160
HBase YCSB       ~160


[1] https://github.com/brianfrankcooper/YCSB/tree/master/hbase098
[2] https://issues.apache.org/jira/browse/HBASE-4163
[3] https://gora.apache.org/current/gora-hbase.html

Comments are welcome.

Thank you.

**Sheriffo Ceesay**


On Tue, Jun 11, 2019 at 12:04 AM Sheriffo Ceesay 
wrote:

> Hello Alfonso and Renato,
>
> Thank you for getting in touch and thanks for the detailed replies.
>
> I will have a proper look at this tomorrow morning. I did some
> troubleshooting yesterday (mostly playing with Xmx and ZooKeeper timeout
> settings) that improved the conditions, but it did not entirely solve the
> problem. Preliminarily, it seems the problem has to do with configuration or
> how HBaseStore is implemented (this may not be entirely true).
>
> I will keep you all posted once I have had a thorough look at your
> suggestions.
>
> Thanks again.
>
>
> **Sheriffo Ceesay**
>
>
> On Mon, Jun 10, 2019 at 11:14 PM Alfonso Nishikawa <
> alfonso.nishik...@gmail.com> wrote:
>
>> Hi!
>>
>> My hypothesis is that the difference between MongoDB and HBase is that
>> HBase puts more stress on Avro serialization. It could also matter that,
>> if the HBase test is performed after the MongoDB ones, the GC starts from
>> a "bad" situation.
>>
>> From [A] linked by @Renato, if the error was OutOfMemoryException I would
>> have recommended lowering gora.hbasestore.scanner.caching to 100, 10 or
>> even 1, but with a GC error I am not so sure. In any case, @Sheriffo:
>> you can try this if it still doesn't work with the optimizations :)
>>
>> @Renato: Thx for the links!
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>>
>>
>> On Mon, Jun 10, 2019 at 22:02, Renato Marroquín Mogrovejo

Re: Week 2 Report and A Question

2019-06-10 Thread Sheriffo Ceesay
Hello Alfonso and Renato,

Thank you for getting in touch and thanks for the detailed replies.

I will have a proper look at this tomorrow morning. I did some
troubleshooting yesterday (mostly playing with Xmx and ZooKeeper timeout
settings) that improved the conditions, but it did not entirely solve the
problem. Preliminarily, it seems the problem has to do with configuration or
how HBaseStore is implemented (this may not be entirely true).

I will keep you all posted once I have had a thorough look at your
suggestions.

Thanks again.


**Sheriffo Ceesay**


On Mon, Jun 10, 2019 at 11:14 PM Alfonso Nishikawa <
alfonso.nishik...@gmail.com> wrote:

> Hi!
>
> My hypothesis is that the difference between MongoDB and HBase is that
> HBase puts more stress on Avro serialization. It could also matter that,
> if the HBase test is performed after the MongoDB ones, the GC starts from
> a "bad" situation.
>
> From [A] linked by @Renato, if the error was OutOfMemoryException I would
> have recommended lowering gora.hbasestore.scanner.caching to 100, 10 or
> even 1, but with a GC error I am not so sure. In any case, @Sheriffo:
> you can try this if it still doesn't work with the optimizations :)
>
> @Renato: Thx for the links!
>
> Regards,
>
> Alfonso Nishikawa
>
>
>
> On Mon, Jun 10, 2019 at 22:02, Renato Marroquín Mogrovejo (<
> renatoj.marroq...@gmail.com>) wrote:
>
> > @Alfonso,
> > Thank you very much for the suggestions! you are totally right about
> > all of your points! Sheriffo, please benefit from them ;)
> >
> > What is also strange (although it can be optimized, as Alfonso
> > pointed out) is that it works for the MongoDB backend. So I would also
> > suspect the configuration of the Gora-HBase client. Have you taken
> > a look at [A], for example, or at other assumed Gora-HBase configurations
> > [B]? Maybe you can specify some Xmx / Xms config there.
> >
> >
> > Best,
> >
> > Renato M.
> >
> > [A]
> >
> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
> > [B]
> >
> https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml
> >
> > On Mon, Jun 10, 2019 at 23:39, Alfonso Nishikawa wrote:
> > >
> > > Hi again, Sheriffo.
> > >
> > > More improvements to [1] over the last email:
> > >
> > > - fields.toArray() doesn't need a full-size array as in [6]. Just call
> > > fields.toArray(new String[0]), or better, create the zero-length array
> > > once and reuse it. That call only needs the type.
> > > - I guess the class at [2] will always be the same, so you don't need to
> > > set it on every insert call.
> > > - The string concatenation at [3] is overkill for the JVM across the 1M
> > > calls * N fields, and the same goes for [4]. Precalculate the names in a
> > > list or array and reuse them for the 1M*N calls.
> > > - Another optimization for [3]: given that PersistentBase [5] extends
> > > SpecificRecordBase, you can access the fields by index with
> > > SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
> > >
> > > [1] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
> > > [2] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
> > > [3] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
> > > [4] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
> > > [5] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
> > > [6] -
> > >
> >
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
> > >
> > > Let's see if with those optimizations we free the JVM memory management
> > > from much stress.
> > >
> > > Regards,
> > >
> > > Alfonso Nishikawa
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > El lun., 10 j

Week 2 Report and A Question

2019-06-08 Thread Sheriffo Ceesay
Hi All,

Week 2 progress update is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

I have one question that I would like my mentors to advise on. I am still
working on it, but thought it would be good to report it because it is
HBase-specific.

The problem is an OutOfMemoryError when inserting 1M+ records into HBase.
This happens when I try to run the actual benchmark by first loading HBase
with 1 million plus records. It works perfectly for MongoDB but not for
HBase, so I am assuming the problem is specific to HBase. The stack trace is
given below.

Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
        at java.lang.StringCoding.encode(StringCoding.java:344)
        at java.lang.String.getBytes(String.java:918)
        at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
        at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
        at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
        at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
        at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
        at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
        at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
        at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
        at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
        at com.yahoo.ycsb.ClientThread.run(Client.java:269)

The insert implementation of the module, available at
https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in
GoraBenchmarkClient.java, is very straightforward. I have had a brief look
at the put() implementation in HBaseStore.java but could not find an issue
with it.

Once I solve this problem, I will run more workloads to verify that the
module is stable for the basic implementation. Then I will go ahead and
work on the suggestions Renato made last week.

Please let me know what your thoughts are.


Thank you.



**Sheriffo Ceesay**


Re: Week 1 Report and Some Questions

2019-06-02 Thread Sheriffo Ceesay
The code so far is available at the GitHub link below.

https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark



**Sheriffo Ceesay**


On Sun, Jun 2, 2019 at 8:34 PM Sheriffo Ceesay 
wrote:

> Hi Renato,
>
> Thanks for the detailed reply. I agree with your recommendations on the
> way forward. I will go ahead and implement the rest of the functionality
> using reflection and we can follow your recommendations on the next
> iterations.
>
> As for the backend, I am using both HBase and MongoDB and all seems well
> at the moment.
>
> I will let you all know when I push my code to GitHub.
>
> Thank you.
>
>
> **Sheriffo Ceesay**
>
>
> On Sun, Jun 2, 2019 at 7:01 PM Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com> wrote:
>
>> Hi Sheriffo,
>>
>> Some opinions about your questions, but others are more than welcome
>> to suggest other things as well.
>>
>> Q1: Are we going to consider arbitrary field length, e.g. if we set
>> the fieldcount to 100 then we have to create the respective Avro and
>> mapping files? Currently,
>> I don't think this process is automated and may be tedious for large
>> field counts.
>> I think for the first code iteration, we should use whatever
>> fieldcount you have generated for. Ideally, we should be able to
>> invoke the Gora bean generator and generate as many fields as required
>> by the benchmark configuration.
>>
>> Q2: Second: The second problem has to do with the first one, if we
>> allow arbitrary field counts, then there has to be a mechanism to call
>> each of the set or get methods during CRUD operations. So to avoid
>> this I used Java Reflection. See the sample code below.
>> We have some options for dealing with an arbitrary number of fields.
>> 1) Use reflection as you have which might be ok for the first code
>> iteration, but if we want to have some decent performance against
>> using datastores natively (no Gora), we should go away from it.
>> 2) Do Gora class generation (and also generate the method used to
>> insert data through Gora) in a step before the benchmark starts.
>> Something like this:
>> # pass config parameters to generate Gora beans with the number of
>> # required fields; this should output the generated class and the
>> # method that does the insertion
>> $ gora_compiler.sh --benchmark --fields_required 4
>> The output path containing the result should then be included
>> (or passed) as a runtime dependency of the benchmark class.
>> 3) Because Gora uses Avro, we can use complex data types, e.g.,
>> arrays, maps. So we could represent number of fields as number of
>> elements inside an array. I would think that this option gives us the
>> best performance.
>> I think  we should continue with option (1) until we have the entire
>> pipeline working, and we understand how every piece fits together with
>> each other (YCSB, Gora, Gora compiler, benchmark setup steps). Then we
>> should do (2) which is the most general and the one that reflects how
>> people usually use Gora, and then we test with (3). I think all of
>> these steps are totally doable in our time frame as we build upon
>> previous steps.
>> The other thing that we should decide is which backend to use as there
>> are backends that are more mature than others. I'd say to use the
>> HBase backend as it is the most stable one and the one with more
>> features, and if we feel brave we can try other backends (and fix them
>> if necessary!)
>>
>>
>> Best,
>>
>> Renato M.
>>
>> On Sun, Jun 2, 2019 at 19:10, Sheriffo Ceesay wrote:
>> >
>> > Dear Mentors,
>> >
>> > My week one report is available at
>> >
>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>> >
>> > I have also included a detailed question, and I will need your
>> > guidance on it.
>> >
>> > Please let me know what your thoughts are.
>> >
>> > Thank you.
>> >
>> > **Sheriffo Ceesay**
>>
>


Re: Week 1 Report and Some Questions

2019-06-02 Thread Sheriffo Ceesay
Hi Renato,

Thanks for the detailed reply. I agree with your recommendations on the way
forward. I will go ahead and implement the rest of the functionality using
reflection and we can follow your recommendations on the next iterations.

As for the backend, I am using both HBase and MongoDB and all seems well at
the moment.

I will let you all know when I push my code to GitHub.

Thank you.


**Sheriffo Ceesay**


On Sun, Jun 2, 2019 at 7:01 PM Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Sheriffo,
>
> Some opinions about your questions, but others are more than welcome
> to suggest other things as well.
>
> Q1: Are we going to consider arbitrary field length, e.g. if we set
> the fieldcount to 100 then we have to create the respective Avro and
> mapping files? Currently,
> I don't think this process is automated and may be tedious for large
> field counts.
> I think for the first code iteration, we should use whatever
> fieldcount you have generated for. Ideally, we should be able to
> invoke the Gora bean generator and generate as many fields as required
> by the benchmark configuration.
>
> Q2: Second: The second problem has to do with the first one, if we
> allow arbitrary field counts, then there has to be a mechanism to call
> each of the set or get methods during CRUD operations. So to avoid
> this I used Java Reflection. See the sample code below.
> We have some options for dealing with an arbitrary number of fields.
> 1) Use reflection as you have which might be ok for the first code
> iteration, but if we want to have some decent performance against
> using datastores natively (no Gora), we should go away from it.
> 2) Do Gora class generation (and also generate the method used to
> insert data through Gora) in a step before the benchmark starts.
> Something like this:
> # pass config parameters to generate Gora beans with the number of
> # required fields; this should output the generated class and the
> # method that does the insertion
> $ gora_compiler.sh --benchmark --fields_required 4
> The output path containing the result should then be included
> (or passed) as a runtime dependency of the benchmark class.
> 3) Because Gora uses Avro, we can use complex data types, e.g.,
> arrays, maps. So we could represent number of fields as number of
> elements inside an array. I would think that this option gives us the
> best performance.
> I think  we should continue with option (1) until we have the entire
> pipeline working, and we understand how every piece fits together with
> each other (YCSB, Gora, Gora compiler, benchmark setup steps). Then we
> should do (2) which is the most general and the one that reflects how
> people usually use Gora, and then we test with (3). I think all of
> these steps are totally doable in our time frame as we build upon
> previous steps.
> The other thing that we should decide is which backend to use as there
> are backends that are more mature than others. I'd say to use the
> HBase backend as it is the most stable one and the one with more
> features, and if we feel brave we can try other backends (and fix them
> if necessary!)
>
>
> Best,
>
> Renato M.
>
> On Sun, Jun 2, 2019 at 19:10, Sheriffo Ceesay wrote:
> >
> > Dear Mentors,
> >
> > My week one report is available at
> >
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> >
> > I have also included a detailed question, and I will need your guidance
> > on it.
> >
> > Please let me know what your thoughts are.
> >
> > Thank you.
> >
> > **Sheriffo Ceesay**
>


Re: Basic Benchmark Module

2019-06-02 Thread Sheriffo Ceesay
Hi Renato,

The module will reside in the Gora project and I will only use YCSB as a
dependency. I think after the project, we can integrate it with YCSB as
well.

I will share the code with you all soon.

Thank you.

**Sheriffo Ceesay**


On Sun, Jun 2, 2019 at 6:24 PM Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Sheriffo,
>
> Thanks for putting this together. It will definitely be of use.
> Could I also ask to please share the GitHub account where the
> development is happening? So we can also keep track of it.
> I will go through the document now, and leave some comments. From your
> email, I imagine that you have started implementing the classes that
> YCSB needs Gora to implement, and have you thought about where this
> module should live? I mean the YCSB-Gora integration? What are your
> thoughts on this?
>
>
> Best,
>
> Renato M.
>
> On Tue, May 28, 2019 at 16:25, Sheriffo Ceesay wrote:
> >
> > Hi Kevin and All,
> >
> > As suggested please find below the link to my design document. It is a
> > rough sketch or in progress of what I am trying to do so please let me
> know
> > if there are any suggestions or comments.
> >
> >
> https://docs.google.com/document/d/1Zn-2ZAnyiI14xZxwqUYXzhQjiAuIrc6IFMBv2BSBYZc/edit?usp=sharing
> >
> > Meanwhile, I will continue working on the other bits of the
> implementation.
> >
> > Thank you.
> >
> >
> >
> > **Sheriffo Ceesay**
> >
> >
> > On Mon, May 27, 2019 at 9:59 AM Kevin Ratnasekera <
> djkevincr1...@gmail.com>
> > wrote:
> >
> > > Hi Sheriffo,
> > >
> > > Thank you for the update. Please find the answers inline below.
> > >
> > > On Mon, May 27, 2019 at 1:23 PM Sheriffo Ceesay  >
> > > wrote:
> > >
> > >> My Mentors,
> > >>
> > >> I have started implementing the Benchmark Module by extending YCSB.
> YCSB
> > >> supports  CRUD operations and it can load a particular database with
> > >> predefined configuration settings. I will report any issue I face as I
> > >> move
> > >> forward.
> > >>
> > > Please share some design details on the approach you have taken in dev
> > > mailing list. This will allow more community to comment on your project
> > > work.
> > >
> > >>
> > >> Secondly, what is the standard way of communicating with our Mentors,
> it
> > >> is
> > >> okay to use this mailing list? Also, I have read that we should
> create a
> > >> page in confluence [1], upload and report our progress but this is
> > >> currently not possible probably this is due to permission problem.
> > >>
> > > I recommend you to use public lists as much as possible. It creates
> more
> > > visibility for your project in Gora community. However we ( Lewis,
> Renato
> > > and myself ) can be contacted via our personal emails. I am mostly
> > > available in Google Hangouts as well.
> > > We need to follow weekly reporting model, these reports has to go into
> > > Wiki as mentioned. Please share your wiki ID so that we can grant you
> > > permissions needed.
> > >
> > >>
> > >> [1]
> > >>
> > >>
> https://cwiki.apache.org/confluence/display/GORA/GSoC+2019+Reports#space-menu-link-content
> > >>
> > >> Thank you.
> > >>
> > >> **Sheriffo Ceesay**
> > >>
> > >
>


Week 1 Report and Some Questions

2019-06-02 Thread Sheriffo Ceesay
Dear Mentors,

My week one report is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

I have also included a detailed question and I will need your guidance
on that.

Please let me know what your thoughts are.

Thank you.

**Sheriffo Ceesay**


Re: Wiki permission for GSoC proposals/reporting

2019-05-30 Thread Sheriffo Ceesay
Hi Kevin,

Thanks. This is now working. I have added my proposal and the design
document under reports.

Cheers.

**Sheriffo Ceesay**


On Thu, May 30, 2019 at 10:17 AM Kevin Ratnasekera 
wrote:

> Hi Gavin,
>
> Thank you for the quick response. I have now added all users to gora wiki
> space, I think that should resolve the issue.
>
> Regards
> Kevin
>
> On Thu, May 30, 2019 at 2:27 PM Gavin McDonald  wrote:
>
> > Hi,
> >
> > Confluence page restrictions are designed to "restrict" access to a
> > sub-set of people that _already_ have edit access to the rest of the wiki
> > space in general.
> >
> > i.e. persons bill, ben, bob and sally have space edit access;
> > the restrictions to pages can be given to bill, ben and sally but not bob,
> > etc.
> >
> > The restrictions cannot open up edit access to those that don't already
> > have it.
> >
> > HTH
> >
> > Gav...
> >
> >
> > On Thu, May 30, 2019 at 9:53 AM Kevin Ratnasekera <
> djkevincr1...@gmail.com>
> > wrote:
> >
> >> Hi Infra@apache,
> >>
> >> We need four users to be added and granted edit permissions to the
> >> following wiki pages. However, the users are still unable to make edits to
> >> the pages even though they are listed as users with edit rights in the page
> >> restrictions. Is there something we have missed, or is there an alternate
> >> way to add these users? Can you please have a look?
> >>
> >> Users :
> >> John Mora
> >> Lahiru Jayasekera
> >> Xavier Sumba
> >> Sheriffo Ceesay
> >>
> >> [1] https://cwiki.apache.org/confluence/display/GORA/GSoC+2019
> >> [2]
> https://cwiki.apache.org/confluence/display/GORA/GSoC+2019+Proposals
> >> [3] https://cwiki.apache.org/confluence/display/GORA/GSoC+2019+Reports
> >>
> >> Regards
> >> Kevin
> >>
> >> On Thu, May 30, 2019 at 11:59 AM Kevin Ratnasekera <
> >> djkevincr1...@gmail.com> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Please check whether I have given access to the correct account.
> >>> [1][2][3] As I can see, your names are listed under the page's
> permissions
> >>> with edit rights. If you are still facing the issue, we can ask for
> some
> >>> help from infra@apache.
> >>>
> >>> [1] https://drive.google.com/file/d/1CW-LEkIw5Go8f7eUAnbBzNTsW47VQXSZ
> >>> [2] https://drive.google.com/file/d/1RgfdGBZKFhymu6pSu83cqzvnp6LXB0EB
> >>> [3] https://drive.google.com/file/d/1TQuJyzDVMK_iu0ifWLjCA134xZrLCFVF
> >>>
> >>> Regards
> >>> Kevin
> >>>
> >>> On Wed, May 29, 2019 at 9:09 PM John Mora 
> wrote:
> >>>
> >>>> Hi all, Same here.
> >>>>
> >>>> Best,
> >>>> John
> >>>>
> >>>> On Wed, 29 May 2019 at 6:29, Lahiru Jayasekera (<
> >>>> mlpjayasek...@gmail.com>) wrote:
> >>>>
> >>>>> Hi,
> >>>>> Same here
> >>>>>
> >>>>> On Wed, May 29, 2019 at 3:51 AM FRANCISCO XAVIER SUMBA TORAL <
> >>>>> xavier.sumb...@ucuenca.edu.ec> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Same here...
> >>>>>>
> >>>>>> Xavier
> >>>>>>
> >>>>>> On May 28, 2019, at 15:59, Sheriffo Ceesay 
> >>>>>> wrote:
> >>>>>>
> >>>>>> Hi Kevin,
> >>>>>>
> >>>>>> Thanks for the email. I have tried all three links but I don't
> >>>>>> have edit permission for any of them.
> >>>>>>
> >>>>>> Thank you.
> >>>>>>
> >>>>>> **Sheriffo Ceesay**
> >>>>>>
> >>>>>>
> >>>>>> On Tue, May 28, 2019 at 7:46 PM Kevin Ratnasekera <
> >>>>>> djkevincr1...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Please check whether you have edit permissions to following wiki
> >>>>>>> pages. [1][2][3] Please put your proposal and you should be adding
> your
> >>>>>>> project weekly reports to the relevant pages. At the end of each
> 

Re: Wiki permission for GSoC proposals/reporting

2019-05-28 Thread Sheriffo Ceesay
Hi Kevin,

Thanks for the email. I have tried all three links but I don't have edit
permission for any of them.

Thank you.

**Sheriffo Ceesay**


On Tue, May 28, 2019 at 7:46 PM Kevin Ratnasekera 
wrote:

> Hi all,
>
> Please check whether you have edit permissions to the following wiki pages
> [1][2][3]. Please add your proposal and your weekly project reports to the
> relevant pages. At the end of each week you should email your mentors a
> link to the newly added report. That way we can measure your progress and
> address potential issues. Please consider this to be extremely important.
>
> [1] https://cwiki.apache.org/confluence/display/GORA/GSoC+2019
> [2] https://cwiki.apache.org/confluence/display/GORA/GSoC+2019+Proposals
> [3] https://cwiki.apache.org/confluence/display/GORA/GSoC+2019+Reports
>
> Regards
> Kevin
>


Re: Basic Benchmark Module

2019-05-28 Thread Sheriffo Ceesay
Hi Kevin and All,

As suggested, please find below the link to my design document. It is a
rough, in-progress sketch of what I am trying to do, so please let me know
if there are any suggestions or comments.

https://docs.google.com/document/d/1Zn-2ZAnyiI14xZxwqUYXzhQjiAuIrc6IFMBv2BSBYZc/edit?usp=sharing

Meanwhile, I will continue working on the other bits of the implementation.

Thank you.



**Sheriffo Ceesay**


On Mon, May 27, 2019 at 9:59 AM Kevin Ratnasekera 
wrote:

> Hi Sheriffo,
>
> Thank you for the update. Please find the answers inline below.
>
> On Mon, May 27, 2019 at 1:23 PM Sheriffo Ceesay 
> wrote:
>
>> My Mentors,
>>
>> I have started implementing the Benchmark Module by extending YCSB. YCSB
>> supports CRUD operations and it can load a particular database with
>> predefined configuration settings. I will report any issue I face as I
>> move forward.
>>
> Please share some design details on the approach you have taken on the dev
> mailing list. This will allow more of the community to comment on your
> project work.
>
>>
>> Secondly, what is the standard way of communicating with our mentors? Is
>> it okay to use this mailing list? Also, I have read that we should create
>> a page in Confluence [1] to upload and report our progress, but this is
>> currently not possible, probably due to a permission problem.
>>
> I recommend that you use the public lists as much as possible. It creates
> more visibility for your project in the Gora community. However, we (Lewis,
> Renato and myself) can be contacted via our personal emails. I am mostly
> available in Google Hangouts as well.
> We need to follow a weekly reporting model; these reports have to go into
> the wiki as mentioned. Please share your wiki ID so that we can grant you
> the permissions needed.
>
>>
>> [1]
>>
>> https://cwiki.apache.org/confluence/display/GORA/GSoC+2019+Reports#space-menu-link-content
>>
>> Thank you.
>>
>> **Sheriffo Ceesay**
>>
>


Re: Basic Benchmark Module

2019-05-28 Thread Sheriffo Ceesay
Hi All,

Any success adding the mentees to the Gora Confluence page? I can't edit
the page or upload any document.

Thank you.


**Sheriffo Ceesay**


On Mon, May 27, 2019 at 10:16 AM Sheriffo Ceesay 
wrote:

> Hi Kevin,
>
> Thanks for the reply. I am basically following the design in the original
> proposal; I can, however, provide a more detailed version of that.
>
> As for the wiki ID, do you mean our username? If that is the case, then
> mine is sneceesay77.
>
> Thank you.
>
> **Sheriffo Ceesay**
>
>
> On Mon, May 27, 2019 at 9:59 AM Kevin Ratnasekera 
> wrote:
>
>> Hi Sheriffo,
>>
>> Thank you for the update. Please find the answers inline below.
>>
>> On Mon, May 27, 2019 at 1:23 PM Sheriffo Ceesay 
>> wrote:
>>
>>> My Mentors,
>>>
>>> I have started implementing the Benchmark Module by extending YCSB. YCSB
>>> supports CRUD operations and it can load a particular database with
>>> predefined configuration settings. I will report any issue I face as I
>>> move forward.
>>>
>> Please share some design details on the approach you have taken on the dev
>> mailing list. This will allow more of the community to comment on your
>> project work.
>>
>>>
>>> Secondly, what is the standard way of communicating with our mentors? Is
>>> it okay to use this mailing list? Also, I have read that we should create
>>> a page in Confluence [1] to upload and report our progress, but this is
>>> currently not possible, probably due to a permission problem.
>>>
>> I recommend that you use the public lists as much as possible. It creates
>> more visibility for your project in the Gora community. However, we (Lewis,
>> Renato and myself) can be contacted via our personal emails. I am mostly
>> available in Google Hangouts as well.
>> We need to follow a weekly reporting model; these reports have to go into
>> the wiki as mentioned. Please share your wiki ID so that we can grant you
>> the permissions needed.
>>
>>>
>>> [1]
>>>
>>> https://cwiki.apache.org/confluence/display/GORA/GSoC+2019+Reports#space-menu-link-content
>>>
>>> Thank you.
>>>
>>> **Sheriffo Ceesay**
>>>
>>


Re: Basic Benchmark Module

2019-05-27 Thread Sheriffo Ceesay
Hi Kevin,

Thanks for the reply. I am basically following the design in the original
proposal; I can, however, provide a more detailed version of that.

As for the wiki ID, do you mean our username? If that is the case, then mine
is sneceesay77.

Thank you.

**Sheriffo Ceesay**


On Mon, May 27, 2019 at 9:59 AM Kevin Ratnasekera 
wrote:

> Hi Sheriffo,
>
> Thank you for the update. Please find the answers inline below.
>
> On Mon, May 27, 2019 at 1:23 PM Sheriffo Ceesay 
> wrote:
>
>> My Mentors,
>>
>> I have started implementing the Benchmark Module by extending YCSB. YCSB
>> supports CRUD operations and it can load a particular database with
>> predefined configuration settings. I will report any issue I face as I
>> move forward.
>>
> Please share some design details on the approach you have taken on the dev
> mailing list. This will allow more of the community to comment on your
> project work.
>
>>
>> Secondly, what is the standard way of communicating with our mentors? Is
>> it okay to use this mailing list? Also, I have read that we should create
>> a page in Confluence [1] to upload and report our progress, but this is
>> currently not possible, probably due to a permission problem.
>>
> I recommend that you use the public lists as much as possible. It creates
> more visibility for your project in the Gora community. However, we (Lewis,
> Renato and myself) can be contacted via our personal emails. I am mostly
> available in Google Hangouts as well.
> We need to follow a weekly reporting model; these reports have to go into
> the wiki as mentioned. Please share your wiki ID so that we can grant you
> the permissions needed.
>
>>
>> [1]
>>
>> https://cwiki.apache.org/confluence/display/GORA/GSoC+2019+Reports#space-menu-link-content
>>
>> Thank you.
>>
>> **Sheriffo Ceesay**
>>
>


Basic Benchmark Module

2019-05-27 Thread Sheriffo Ceesay
My Mentors,

I have started implementing the Benchmark Module by extending YCSB. YCSB
supports CRUD operations and it can load a particular database with
predefined configuration settings. I will report any issue I face as I move
forward.
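
To make the idea concrete, here is a minimal, self-contained sketch of a load
phase driven by predefined settings on top of a CRUD-style client. It is only
an illustration: the `KeyValueClient` interface, the `InMemoryClient` class and
the way the `recordcount` and `fieldcount` properties are read are assumptions
made for this example, not the actual YCSB or Gora API and not the code of the
benchmark module itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class CrudBenchmarkSketch {

    /** Hypothetical CRUD contract; a real binding would delegate to a data store. */
    interface KeyValueClient {
        boolean insert(String key, Map<String, String> fields);
        Map<String, String> read(String key);
        boolean update(String key, Map<String, String> fields);
        boolean delete(String key);
    }

    /** In-memory stand-in so the sketch runs without any external database. */
    static class InMemoryClient implements KeyValueClient {
        private final Map<String, Map<String, String>> rows = new HashMap<>();

        public boolean insert(String key, Map<String, String> fields) {
            // Succeeds only if the key was not present before.
            return rows.putIfAbsent(key, new HashMap<>(fields)) == null;
        }

        public Map<String, String> read(String key) {
            return rows.get(key);
        }

        public boolean update(String key, Map<String, String> fields) {
            Map<String, String> row = rows.get(key);
            if (row == null) {
                return false;
            }
            row.putAll(fields);
            return true;
        }

        public boolean delete(String key) {
            return rows.remove(key) != null;
        }
    }

    /** Load phase: insert as many records as the configuration asks for. */
    static int load(KeyValueClient client, Properties config) {
        int recordCount = Integer.parseInt(config.getProperty("recordcount", "1000"));
        int fieldCount = Integer.parseInt(config.getProperty("fieldcount", "10"));
        int inserted = 0;
        for (int i = 0; i < recordCount; i++) {
            Map<String, String> fields = new HashMap<>();
            for (int f = 0; f < fieldCount; f++) {
                fields.put("field" + f, "value" + i + "_" + f);
            }
            if (client.insert("user" + i, fields)) {
                inserted++;
            }
        }
        return inserted;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("recordcount", "100");
        config.setProperty("fieldcount", "3");

        KeyValueClient client = new InMemoryClient();
        long start = System.nanoTime();
        int inserted = load(client, config);
        long elapsedMicros = (System.nanoTime() - start) / 1000;
        System.out.println("inserted=" + inserted + " elapsedMicros=" + elapsedMicros);
    }
}
```

Keeping the client behind a small interface means the same load loop could in
principle be pointed at different back ends, which is the kind of comparison a
benchmark module is meant to support.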

Secondly, what is the standard way of communicating with our mentors? Is it
okay to use this mailing list? Also, I have read that we should create a
page in Confluence [1] to upload and report our progress, but this is
currently not possible, probably due to a permission problem.

[1]
https://cwiki.apache.org/confluence/display/GORA/GSoC+2019+Reports#space-menu-link-content

Thank you.

**Sheriffo Ceesay**


[jira] [Updated] (GORA-616) Multiple slf4j conflict issue

2019-05-21 Thread Sheriffo Ceesay (JIRA)


 [ 
https://issues.apache.org/jira/browse/GORA-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sheriffo Ceesay updated GORA-616:
-
Summary: Multiple slf4j conflict issue   (was: Multiple SLJ4J conflic issue 
)

> Multiple slf4j conflict issue 
> --
>
> Key: GORA-616
> URL: https://issues.apache.org/jira/browse/GORA-616
> Project: Apache Gora
>  Issue Type: Improvement
>  Components: gora-tutorial
>Affects Versions: 0.9
>    Reporter: Sheriffo Ceesay
>Priority: Major
>  Labels: build, test
> Fix For: 0.8, 0.9
>
>
> Testing the 0.9 release using the gora-tutorial module, invoking the command
> below produces some warnings and an error.
>  
> bin/gora logmanager -parse gora-tutorial/src/main/resources/access.log
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/log4j-slf4j-impl-2.11.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> ERROR StatusLogger No Log4j 2 configuration file found. Using default 
> configuration (logging only errors to the console), or user programmatically 
> provided configurations. Set system property 'log4j2.debug' to show Log4j 2 
> internal initialization logging. See 
> [https://logging.apache.org/log4j/2.x/manual/configuration.html] for 
> instructions on how to configure Log4j 2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (GORA-616) Multiple SLJ4J conflic issue

2019-05-21 Thread Sheriffo Ceesay (JIRA)


 [ 
https://issues.apache.org/jira/browse/GORA-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sheriffo Ceesay updated GORA-616:
-
Summary: Multiple SLJ4J conflic issue   (was: Multiple SLJ4 conflic issue )

> Multiple SLJ4J conflic issue 
> -
>
> Key: GORA-616
> URL: https://issues.apache.org/jira/browse/GORA-616
> Project: Apache Gora
>  Issue Type: Improvement
>  Components: gora-tutorial
>Affects Versions: 0.9
>    Reporter: Sheriffo Ceesay
>Priority: Major
>  Labels: build, test
> Fix For: 0.8, 0.9
>
>
> Testing the 0.9 release using the gora-tutorial module, invoking the command
> below produces some warnings and an error.
>  
> bin/gora logmanager -parse gora-tutorial/src/main/resources/access.log
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/log4j-slf4j-impl-2.11.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> ERROR StatusLogger No Log4j 2 configuration file found. Using default 
> configuration (logging only errors to the console), or user programmatically 
> provided configurations. Set system property 'log4j2.debug' to show Log4j 2 
> internal initialization logging. See 
> [https://logging.apache.org/log4j/2.x/manual/configuration.html] for 
> instructions on how to configure Log4j 2





[jira] [Updated] (GORA-563) Support Hadoop Combiner

2019-05-21 Thread Sheriffo Ceesay (JIRA)


 [ 
https://issues.apache.org/jira/browse/GORA-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sheriffo Ceesay updated GORA-563:
-
Component/s: (was: gora-core)
 gora-tutorial

> Support Hadoop Combiner
> ---
>
> Key: GORA-563
> URL: https://issues.apache.org/jira/browse/GORA-563
> Project: Apache Gora
>  Issue Type: Improvement
>  Components: gora-tutorial
>Affects Versions: 0.9
>Reporter: Alfonso Nishikawa
>Priority: Minor
>
> It seems the reducer class is not being used as a Combiner. It would be good 
> to have that feature.
> [From this StackOverflow 
> question|https://stackoverflow.com/questions/54003803/combiner-function-in-apache-hadoop-with-gora].





[jira] [Updated] (GORA-563) Support Hadoop Combiner

2019-05-21 Thread Sheriffo Ceesay (JIRA)


 [ 
https://issues.apache.org/jira/browse/GORA-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sheriffo Ceesay updated GORA-563:
-
Component/s: (was: gora-tutorial)
 gora-core

> Support Hadoop Combiner
> ---
>
> Key: GORA-563
> URL: https://issues.apache.org/jira/browse/GORA-563
> Project: Apache Gora
>  Issue Type: Improvement
>  Components: gora-core
>Affects Versions: 0.9
>Reporter: Alfonso Nishikawa
>Priority: Minor
>
> It seems the reducer class is not being used as a Combiner. It would be good 
> to have that feature.
> [From this StackOverflow 
> question|https://stackoverflow.com/questions/54003803/combiner-function-in-apache-hadoop-with-gora].





[jira] [Created] (GORA-616) Multiple SLJ4 conflic issue

2019-05-21 Thread Sheriffo Ceesay (JIRA)
Sheriffo Ceesay created GORA-616:


 Summary: Multiple SLJ4 conflic issue 
 Key: GORA-616
 URL: https://issues.apache.org/jira/browse/GORA-616
 Project: Apache Gora
  Issue Type: Improvement
  Components: gora-tutorial
Affects Versions: 0.9
Reporter: Sheriffo Ceesay
 Fix For: 0.9, 0.8


Testing the 0.9 release using the gora-tutorial module, invoking the command
below produces some warnings and an error.
 
bin/gora logmanager -parse gora-tutorial/src/main/resources/access.log

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/log4j-slf4j-impl-2.11.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No Log4j 2 configuration file found. Using default 
configuration (logging only errors to the console), or user programmatically 
provided configurations. Set system property 'log4j2.debug' to show Log4j 2 
internal initialization logging. See 
[https://logging.apache.org/log4j/2.x/manual/configuration.html] for 
instructions on how to configure Log4j 2
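
The "Found binding in" lines above name three jars in gora-tutorial/lib that
each ship org/slf4j/impl/StaticLoggerBinder.class. A minimal sketch for listing
such jars in any lib directory is shown below. The class name
`Slf4jBindingScanner` and the default directory are assumptions made for this
example; the sketch only reports which jars contain the binder class and does
not decide which binding should be kept.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;

public class Slf4jBindingScanner {

    /** Entry that SLF4J 1.x looks up on the class path to find a binding. */
    static final String BINDER_ENTRY = "org/slf4j/impl/StaticLoggerBinder.class";

    /** Returns the sorted file names of all jars in libDir that contain the binder class. */
    static List<String> findBindingJars(Path libDir) throws IOException {
        List<String> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(BINDER_ENTRY) != null) {
                        hits.add(jar.getFileName().toString());
                    }
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default; pass the real lib directory as the first argument.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "gora-tutorial/lib");
        for (String name : findBindingJars(libDir)) {
            System.out.println(name);
        }
    }
}
```

If more than one jar is listed, the usual remedy is to leave a single binding
on the class path, for example by excluding the others in the build.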





[jira] [Created] (GORA-615) Update gora-tutorial pom to include mongodb

2019-05-21 Thread Sheriffo Ceesay (JIRA)
Sheriffo Ceesay created GORA-615:


 Summary: Update gora-tutorial pom to include mongodb
 Key: GORA-615
 URL: https://issues.apache.org/jira/browse/GORA-615
 Project: Apache Gora
  Issue Type: Improvement
  Components: gora-tutorial
Affects Versions: 0.9
Reporter: Sheriffo Ceesay
 Fix For: 0.9


The gora-tutorial module does not include the gora-mongodb module. It would be
good to have this out of the box.





[jira] [Assigned] (GORA-615) Update gora-tutorial pom to include mongodb

2019-05-21 Thread Sheriffo Ceesay (JIRA)


 [ 
https://issues.apache.org/jira/browse/GORA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sheriffo Ceesay reassigned GORA-615:


Assignee: Sheriffo Ceesay

> Update gora-tutorial pom to include mongodb
> ---
>
> Key: GORA-615
> URL: https://issues.apache.org/jira/browse/GORA-615
> Project: Apache Gora
>  Issue Type: Improvement
>  Components: gora-tutorial
>Affects Versions: 0.9
>    Reporter: Sheriffo Ceesay
>Assignee: Sheriffo Ceesay
>Priority: Minor
>  Labels: easyfix
> Fix For: 0.9
>
>
> The gora-tutorial module does not include the gora-mongodb module. It would
> be good to have this out of the box.





Re: [VOTE] RC01 for Apache Gora 0.9 Release

2019-05-21 Thread Sheriffo Ceesay
I am currently testing the 0.9 release using the gora-tutorial module. The
error below is thrown which leads to no output from the LogManager.

bin/gora logmanager -parse gora-tutorial/src/main/resources/access.log

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/log4j-slf4j-impl-2.11.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/user/apache-gora-0.9/gora-tutorial/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type
[org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No Log4j 2 configuration file found. Using default
configuration (logging only errors to the console), or user
programmatically provided configurations. Set system property
'log4j2.debug' to show Log4j 2 internal initialization logging. See
https://logging.apache.org/log4j/2.x/manual/configuration.html for
instructions on how to configure Log4j 2

Kevin and I are currently investigating this.


Thank you.



**Sheriffo Ceesay**


On Tue, May 21, 2019 at 2:38 AM carlos muñoz  wrote:

> +1 Release this package as Apache Gora 0.9
>
> Regards,
> Carlos
>
> On Fri, 10 May 2019 at 11:46, Madhawa Kasun Gunasekara (<
> madhaw...@gmail.com>) wrote:
>
> >  +1 Release this package as Apache Gora 0.9
> >
> > Tested Cassandra and Solr module.
> >
> > Thanks,
> > Madhawa
> >
> >
> > On Wed, May 8, 2019 at 11:59 AM Kevin Ratnasekera <
> djkevincr1...@gmail.com
> > >
> > wrote:
> >
> > > Hi all,
> > >
> > > This is the vote on the first release candidate (RC01) for the next Apache
> > > Gora release, version 0.9.
> > >
> > > The VOTE will remain open for at least 72 hours.
> > >
> > > [ ] +1 Release this package as Apache Gora 0.9 ...
> > >
> > > [ ] -1 Do not release this package because …
> > >
> > > 38 tickets were resolved and release report is available here :
> > >
> > > https://s.apache.org/0.9GoraReleaseNotes
> > >
> > > Source release artifacts are available here :
> > >
> > > https://dist.apache.org/repos/dist/dev/gora/apache-gora-0.9-RC01/
> > >
> > > Staging repository is available here :
> > >
> > > https://repository.apache.org/content/repositories/orgapachegora-1010
> > >
> > > Release candidate is signed through the key A3E66AC7 which is available
> > > here :
> > >
> > > https://dist.apache.org/repos/dist/dev/gora/KEYS
> > >
> > > The release candidate is based on the sources tagged with
> > apache-gora-0.9:
> > >
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=gora.git;a=tag;h=61b9a928a076a52a39f1b0771798bbae4742fffc
> > >
> > > and is based on the following commit id:
> > >
> > > 4b27f2bfebee1a77f9b957929c956ea1b509f9fb
> > >
> > > Thank you to everyone that contributed to Apache Gora 0.9 and who are
> > able
> > > to vote on the release candidate.
> > >
> > > PS: My vote is `[x] +1 Release this package as Apache Gora 0.9`.
> > >
> > > Regards
> > >
> > > Kevin
> > >
> > >
> >
>


Fwd: GSoC 2019: Congratulations, your proposal with The Apache Software Foundation has been accepted!

2019-05-07 Thread Sheriffo Ceesay
All,

FYI, My proposal has been accepted by GSoC. Thank you all for having
confidence in me. I will endeavour by all means to deliver my best in this
exciting project.

Thank you.


**Sheriffo Ceesay**


-- Forwarded message -
From: Google Summer of Code 
Date: Mon, May 6, 2019 at 10:19 PM
Subject: GSoC 2019: Congratulations, your proposal with The Apache Software
Foundation has been accepted!
To: 


Hi Sheriffo Ceesay,

Your proposal Benchmark Module for Apache Gora
<https://summerofcode.withgoogle.com/dashboard/student/proposal/6241043513606144/>
has been accepted!

Welcome to GSoC 2019!

We look forward to seeing the great things you will accomplish this summer
with The Apache Software Foundation.

The next thing you need to do is read the Information for Accepted Students
<https://developers.google.com/open-source/gsoc/help/accepted-students>. It
contains important information you need to know about your participation in
GSoC 2019.

You will receive another email in the next few days with information about
your stipend.

If you have any questions, please email the Google Summer of Code support
team at gsoc-supp...@google.com.

Have a great summer!

-*Google Summer of Code team*


Re: Apache Gora Benchmark Module Draft Proposal

2019-04-08 Thread Sheriffo Ceesay
Thank you. I will do the final submission later today.

Thank you.


**Sheriffo Ceesay**


On Mon, Apr 8, 2019 at 5:40 PM Kevin Ratnasekera 
wrote:

> Hi Sheriffo,
>
> The proposal looks very good. I have added one small comment. Please make
> sure you do the final submission before the deadline.
>
> Good luck with your submission :)
>
> Regards
> Kevin
>
> On Sun, Mar 24, 2019 at 2:14 AM Sheriffo Ceesay 
> wrote:
>
> > Hi All,
> >
> > As advised by Kevin Ratnasekera and Furkan Kamaci, please find below the
> > link to the draft proposal for Gora benchmark module.
> >
> > Please let me know if you have any comments to improve the document. (Don't
> > worry, I have another copy.)
> >
> >
> >
> https://docs.google.com/document/d/1djelY4yVwTuWPA310E_JBinOPnt5PJh3x67z0ZxgBLg/edit?usp=sharing
> >
> >
> > Please add your comments using the sidebar comment functionality if
> > possible.
> >
> > Thank you.
> >
> >
> > **Sheriffo Ceesay**
> >
>


[jira] [Commented] (GORA-564) Remove deprecated method usages of HBase module after upgrading to 2

2019-04-07 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811701#comment-16811701
 ] 

Sheriffo Ceesay commented on GORA-564:
--

I have just submitted a pull request for this. [~djkevincr], can you please
have a look? Thanks.

> Remove deprecated method usages of HBase module after upgrading to 2
> 
>
> Key: GORA-564
> URL: https://issues.apache.org/jira/browse/GORA-564
> Project: Apache Gora
>  Issue Type: Improvement
>  Components: gora-hbase
>Affects Versions: 0.8
>Reporter: Kevin Ratnasekera
>Priority: Major
>
> Following can be seen on the build
> [WARNING] 
> /home/djkevincr/apache_contributions/gora/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStoreMetadataAnalyzer.java:[27,30]
>  [deprecation] HTableDescriptor in org.apache.hadoop.hbase has been deprecated
> [WARNING] 
> /home/djkevincr/apache_contributions/gora/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java:[629,11]
>  [deprecation] setTimeStamp(long) in Get has been deprecated
> [WARNING] 
> /home/djkevincr/apache_contributions/gora/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStoreMetadataAnalyzer.java:[69,12]
>  [deprecation] HTableDescriptor in org.apache.hadoop.hbase has been deprecated
> [WARNING] 
> /home/djkevincr/apache_contributions/gora/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStoreMetadataAnalyzer.java:[69,57]
>  [deprecation] getTableDescriptor(TableName) in Admin has been deprecated
> [WARNING] 
> /home/djkevincr/apache_contributions/gora/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStoreMetadataAnalyzer.java:[71,82]
>  [deprecation] getColumnFamilies() in HTableDescriptor has been deprecated
> [WARNING] 
> /home/djkevincr/apache_contributions/gora/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStoreMetadataAnalyzer.java:[71,108]
>  [deprecation] HColumnDescriptor in org.apache.hadoop.hbase has been 
> deprecated





[jira] [Commented] (GORA-532) Benchmarking Module

2019-03-31 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806253#comment-16806253
 ] 

Sheriffo Ceesay commented on GORA-532:
--

All, 

Thanks for the wonderful feedback; it was really useful. FYI, I have
updated the Google doc to address the various comments.

[~lewismc], with regard to obtaining compute resources to test scalability, I
am not really sure. I have an 8-node local cluster that I use for my work,
but I may have to talk to my supervisor to see if I am permitted to use it for
this work. As for ApacheCon EU 2019, great idea, I will surely subscribe. I
am loving it here in Scotland and I can understand why you are a tad homesick. :)
> Benchmarking Module
> ---
>
> Key: GORA-532
> URL: https://issues.apache.org/jira/browse/GORA-532
> Project: Apache Gora
>  Issue Type: New Feature
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Major
>  Labels: gsoc2018, gsoc2019
> Fix For: 0.9
>
>
> We should make a benchmark and publish it. An improved version of
> Yahoo!’s Cloud Serving Benchmark (YCSB++) [1] is the most suitable tool for
> such a purpose. Here is recent research on an Object-NoSQL Database Mapper
> (ONDM) benchmark [2], which includes Apache Gora; its authors have released
> the benchmark source code under the ASF 2.0 license [3].
>  
> Also, here is an example from Apache Accumulo which is based on YCSB [4].
>  
> [1] [http://www.cs.cmu.edu/~wtantisi/files/tablebenchmark-pdl11-talk.pdf]
> [2] [https://doi.org/10.1186/s13174-016-0052-x]
> [3] [https://github.com/vreniers/ONDM-Benchmarker]
> [4] [https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf]





[jira] [Commented] (GORA-532) Benchmarking Module

2019-03-30 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805993#comment-16805993
 ] 

Sheriffo Ceesay commented on GORA-532:
--

Hi Lewis,


You should now be able to access the shared document.

Thank you.

On Sat, 30 Mar 2019 at 19:52, Lewis John McGibbney (JIRA) 

-- 

**Sheriffo Ceesay**




[jira] [Commented] (GORA-532) Benchmarking Module

2019-03-26 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801797#comment-16801797
 ] 

Sheriffo Ceesay commented on GORA-532:
--

[~kamaci] and [~djkevincr], I have made some progress on my proposal and I 
would like to know whether you have any comments. 

I am planning to submit my proposal before the end of the week, so it would be 
helpful if a potential mentor could have a look at the document and provide 
comments. 

Link to doc : 
https://docs.google.com/document/d/1djelY4yVwTuWPA310E_JBinOPnt5PJh3x67z0ZxgBLg/edit
 



Re: Apache Gora Benchmark

2019-03-26 Thread Sheriffo Ceesay
I have updated the Benchmark Module proposal following some suggestions from
Renato. Basically, the suggestion was to consider extending YCSB to include
Gora, since YCSB already has implementations for other KV stores.

It would be great if a potential mentor could have a look at this and give
me some feedback. We are currently in the proposal submission period of the
GSoC timeline, so any comments on the document will really help.

Please find below the link to the shared Google doc.

https://docs.google.com/document/d/1djelY4yVwTuWPA310E_JBinOPnt5PJh3x67z0ZxgBLg/edit



**Sheriffo Ceesay**


On Mon, Mar 25, 2019 at 12:30 PM Sheriffo Ceesay 
wrote:

> Hi Renato,
>
> Thanks for the reply and the comments on the Google doc.
>
> I think adding Gora to the YCSB framework will be the best approach. As I
> mentioned in the shared doc, I will dig more into this and update the
> proposal accordingly.
>
> Thank you.
>
>
> **Sheriffo Ceesay**
>
>
> On Mon, Mar 25, 2019 at 12:05 PM Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com> wrote:
>
>> Hey Sheriffo,
>>
>> Thanks for sharing this. I went quickly over it, and it looks good
>> overall.
>> One question I have is the one I left on the proposal as well. The
>> proposal
>> is about implementing a benchmarking module but why aren't we
>> using/integrating with something like YCSB?
>>
>> I am asking this because it has a few benefits:
>> - Most of the operations one would be interested in kv-stores are already
>> modeled by YCSB (as you know)
>> - With this we would already get support for most key-value stores and we
>> wouldn't have to implement it (or support it) later on.
>> - We get a benchmark module that is already accepted and understood by
>> people using key-value stores.
>>
>> The resulting deliverables could be the integration (adding Gora to YCSB,
>> the module could live in Gora and also could live in YCSB if they want to
>> take it), and the scripts to run it.
>> What do you guys think?
>>
>>
>> Best,
>>
>> Renato M.
>>
>> On Sun, 24 Mar 2019 at 13:05, Sheriffo Ceesay (<
>> sneceesa...@gmail.com>)
>> wrote:
>>
>> > Hi Renato,
>> >
>> > Thanks for the reply. As far as I am concerned all options are on the
>> > table. I have shared my draft project proposal with the dev email list
>> for
>> > comments. I will visit it again and see how best your ideas can be
>> added to
>> > the implementation.
>> >
>> > Below is the Google doc file, please feel free to add comments.
>> >
>> >
>> >
>> https://docs.google.com/document/d/1djelY4yVwTuWPA310E_JBinOPnt5PJh3x67z0ZxgBLg/edit?usp=sharing
>> >
>> > Thank you.
>> >
>> > **Sheriffo Ceesay**
>> >
>> >
>> > On Sun, Mar 24, 2019 at 11:08 AM Renato Marroquín Mogrovejo <
>> > renatoj.marroq...@gmail.com> wrote:
>> >
>> > > Hi Sheriffo,
>> > >
>> > > Thanks for your interest in Gora and in this project.
>> > > We have discussed this a bit already, and the important bit is to
>> > > figure out Gora's overhead compared to using just the kv stores.
>> > > Obviously, we incur overheads, but it'd be interesting to know
>> where
>> > > exactly (most likely serialization) and not just say how slow Gora is.
>> > > Ideally, one could fix the easy performance bugs but this might be
>> out of
>> > > the scope, but anyway, that would be nice.
>> > > Another idea would be to actually get the final benchmark run as part
>> of
>> > > CI? So we know how every change impacts performance.
>> > >
>> > >
>> > > Best,
>> > >
>> > > Renato M.
>> > > On Wed, 20 Mar 2019 at 17:15, sneceesa...@gmail.com (<
>> > > sneceesa...@gmail.com>) wrote:
>> > > >
>> > > >
>> > > >
>> > > > On 2017/12/23 20:17:12, Furkan KAMACI 
>> wrote:
>> > > > > Hi Fellows,
>> > > > >
>> > > > > As you know that our project is defined as:
>> > > > >
>> > > > > "*The Apache Gora™ open source framework provides an in-memory
>> data
>> > > model
>> > > > > and persistence for big data.*[1]"
>> > > > >
>> > > > > I believe that Apache Gora is a special project and it touches
>> many
>> > > > > projects. I always wond

Re: Apache Gora Benchmark

2019-03-25 Thread Sheriffo Ceesay
Hi Renato,

Thanks for the reply and the comments on the Google doc.

I think adding Gora to the YCSB framework will be the best approach. As I
mentioned in the shared doc, I will dig more into this and update the
proposal accordingly.

Thank you.


**Sheriffo Ceesay**


On Mon, Mar 25, 2019 at 12:05 PM Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hey Sheriffo,
>
> Thanks for sharing this. I went quickly over it, and it looks good overall.
> One question I have is the one I left on the proposal as well. The proposal
> is about implementing a benchmarking module but why aren't we
> using/integrating with something like YCSB?
>
> I am asking this because it has a few benefits:
> - Most of the operations one would be interested in kv-stores are already
> modeled by YCSB (as you know)
> - With this we would already get support for most key-value stores and we
> wouldn't have to implement it (or support it) later on.
> - We get a benchmark module that is already accepted and understood by
> people using key-value stores.
>
> The resulting deliverables could be the integration (adding Gora to YCSB,
> the module could live in Gora and also could live in YCSB if they want to
> take it), and the scripts to run it.
> What do you guys think?
>
>
> Best,
>
> Renato M.
>
> On Sun, 24 Mar 2019 at 13:05, Sheriffo Ceesay wrote:
>
> > Hi Renato,
> >
> > Thanks for the reply. As far as I am concerned all options are on the
> > table. I have shared my draft project proposal with the dev email list
> for
> > comments. I will visit it again and see how best your ideas can be added
> to
> > the implementation.
> >
> > Below is the Google doc file, please feel free to add comments.
> >
> >
> >
> https://docs.google.com/document/d/1djelY4yVwTuWPA310E_JBinOPnt5PJh3x67z0ZxgBLg/edit?usp=sharing
> >
> > Thank you.
> >
> > **Sheriffo Ceesay**
> >
> >
> > On Sun, Mar 24, 2019 at 11:08 AM Renato Marroquín Mogrovejo <
> > renatoj.marroq...@gmail.com> wrote:
> >
> > > Hi Sheriffo,
> > >
> > > Thanks for your interest in Gora and in this project.
> > > We have discussed this a bit already, and the important bit is to
> > > figure out Gora's overhead compared to using just the kv stores.
> > > Obviously, we incur overheads, but it'd be interesting to know
> where
> > > exactly (most likely serialization) and not just say how slow Gora is.
> > > Ideally, one could fix the easy performance bugs but this might be out
> of
> > > the scope, but anyway, that would be nice.
> > > Another idea would be to actually get the final benchmark run as part
> of
> > > CI? So we know how every change impacts performance.
> > >
> > >
> > > Best,
> > >
> > > Renato M.
> > > > On Wed, 20 Mar 2019 at 17:15, sneceesa...@gmail.com (<
> > > > sneceesa...@gmail.com>) wrote:
> > > >
> > > >
> > > >
> > > > On 2017/12/23 20:17:12, Furkan KAMACI 
> wrote:
> > > > > Hi Fellows,
> > > > >
> > > > > As you know that our project is defined as:
> > > > >
> > > > > "*The Apache Gora™ open source framework provides an in-memory data
> > > model
> > > > > and persistence for big data.*[1]"
> > > > >
> > > > > I believe that Apache Gora is a special project and it touches many
> > > > > projects. I always wonder the performance of NoSQL DBs as
> individual
> > > and
> > > > > accessed via Apache Gora.
> > > > >
> > > > > I think that we should make a benchmark and publish it, and
> Yahoo!’s
> > > Cloud
> > > > > Serving Benchmark (YCSB) [2] is the most suitable tool for such a
> > > purpose.
> > > > > I found a recent research about Object-NoSQL Database Mapper (ONDM)
> > > > > benchmark [3] which includes Apache Gora and they have produced the
> > > > > benchmark source code as ASF 2.0 licensed [4].
> > > > >
> > > > > Here is an example from Apache Accumulo which is based on YCSB too
> > [5].
> > > > >
> > > > > What do you think about it? Who wants to join that work apart from
> > me?
> > > > >
> > > > > Kind Regards,
> > > > > Furkan KAMACI
> > > > >
> > > > >
> > > > > [1] https://gora.apache.org
> > > > > [2] Cooper BF, Silberstein A, Tam E, Ramakrishnan R, Sears R.
> > > Benchmarking
> > > > > cloud serving systems with YCSB. In: Proceedings of the 1st ACM
> > > symposium
> > > > > on Cloud computing - SoCC ’10. Association for Computing Machinery
> > > (ACM):
> > > > > 2010. p. 143–154, doi:10.1145/1807128.1807152.
> > > > > http://dx.doi.org/10.1145/1807128.1807152.
> > > > > [3] https://doi.org/10.1186/s13174-016-0052-x
> > > > > [4] https://github.com/vreniers/ONDM-Benchmarker
> > > > > [5]
> https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf
> > > > >
> > > >
> > > > Hi All, I was advised by Kevin Ratnasekera to start or reignite this
> > > discussion. I am currently going over the documentation, installation
> and
> > > familiarising myself with the code base. Any good pointers here will be
> > > helpful.
> > >
> >
>


Re: Apache Gora Benchmark

2019-03-24 Thread Sheriffo Ceesay
Hi Renato,

Thanks for the reply. As far as I am concerned all options are on the
table. I have shared my draft project proposal with the dev email list for
comments. I will visit it again and see how best your ideas can be added to
the implementation.

Below is the Google doc file, please feel free to add comments.

https://docs.google.com/document/d/1djelY4yVwTuWPA310E_JBinOPnt5PJh3x67z0ZxgBLg/edit?usp=sharing

Thank you.

**Sheriffo Ceesay**


On Sun, Mar 24, 2019 at 11:08 AM Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Sheriffo,
>
> Thanks for your interest in Gora and in this project.
> We have discussed this a bit already, and the important bit is to
> figure out Gora's overhead compared to using just the kv stores.
> Obviously, we incur overheads, but it'd be interesting to know where
> exactly (most likely serialization) and not just say how slow Gora is.
> Ideally, one could fix the easy performance bugs but this might be out of
> the scope, but anyway, that would be nice.
> Another idea would be to actually get the final benchmark run as part of
> CI? So we know how every change impacts performance.
>
>
> Best,
>
> Renato M.
> On Wed, 20 Mar 2019 at 17:15, sneceesa...@gmail.com (<
> sneceesa...@gmail.com>) wrote:
> >
> >
> >
> > On 2017/12/23 20:17:12, Furkan KAMACI  wrote:
> > > Hi Fellows,
> > >
> > > As you know that our project is defined as:
> > >
> > > "*The Apache Gora™ open source framework provides an in-memory data
> model
> > > and persistence for big data.*[1]"
> > >
> > > I believe that Apache Gora is a special project and it touches many
> > > projects. I always wonder the performance of NoSQL DBs as individual
> and
> > > accessed via Apache Gora.
> > >
> > > I think that we should make a benchmark and publish it, and Yahoo!’s
> Cloud
> > > Serving Benchmark (YCSB) [2] is the most suitable tool for such a
> purpose.
> > > I found a recent research about Object-NoSQL Database Mapper (ONDM)
> > > benchmark [3] which includes Apache Gora and they have produced the
> > > benchmark source code as ASF 2.0 licensed [4].
> > >
> > > Here is an example from Apache Accumulo which is based on YCSB too [5].
> > >
> > > What do you think about it? Who wants to join that work apart from me?
> > >
> > > Kind Regards,
> > > Furkan KAMACI
> > >
> > >
> > > [1] https://gora.apache.org
> > > [2] Cooper BF, Silberstein A, Tam E, Ramakrishnan R, Sears R.
> Benchmarking
> > > cloud serving systems with YCSB. In: Proceedings of the 1st ACM
> symposium
> > > on Cloud computing - SoCC ’10. Association for Computing Machinery
> (ACM):
> > > 2010. p. 143–154, doi:10.1145/1807128.1807152.
> > > http://dx.doi.org/10.1145/1807128.1807152.
> > > [3] https://doi.org/10.1186/s13174-016-0052-x
> > > [4] https://github.com/vreniers/ONDM-Benchmarker
> > > [5] https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf
> > >
> >
> > Hi All, I was advised by Kevin Ratnasekera to start or reignite this
> discussion. I am currently going over the documentation, installation and
> familiarising myself with the code base. Any good pointers here will be
> helpful.
>


Apache Gora Benchmark Module Draft Proposal

2019-03-23 Thread Sheriffo Ceesay
Hi All,

As advised by Kevin Ratnasekera and Furkan Kamaci, please find below the
link to the draft proposal for Gora benchmark module.

Please let me know if you have any comments to improve the document. (Don't
worry, I have another copy.)

https://docs.google.com/document/d/1djelY4yVwTuWPA310E_JBinOPnt5PJh3x67z0ZxgBLg/edit?usp=sharing


Please add your comments using the sidebar comment functionality if
possible.

Thank you.


**Sheriffo Ceesay**


[jira] [Commented] (GORA-532) Benchmarking Module

2019-03-23 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799790#comment-16799790
 ] 

Sheriffo Ceesay commented on GORA-532:
--

[~kamaci] and [~djkevincr]

I have just sent an email to the dev list regarding my draft proposal. Please 
let me know if you have any comments. 



[jira] [Commented] (GORA-532) Benchmarking Module

2019-03-20 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797348#comment-16797348
 ] 

Sheriffo Ceesay commented on GORA-532:
--

Thanks [~kamaci], now I see. I will have a look. 



[jira] [Comment Edited] (GORA-532) Benchmarking Module

2019-03-20 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797323#comment-16797323
 ] 

Sheriffo Ceesay edited comment on GORA-532 at 3/20/19 4:26 PM:
---

[~kamaci], I can't find any proposal regarding this topic. I checked all the 
previous GSoC pages but can't find anything relevant to Benchmarking. Would you 
mind clarifying? 


was (Author: sheriffo):
[~kamaci], I can't find any proposal regarding this topic. I checked all the 
previous GSoC pages but can't find any relevant to Benchmarking. Would you mind 
clarifying? 



[jira] [Commented] (GORA-532) Benchmarking Module

2019-03-20 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797323#comment-16797323
 ] 

Sheriffo Ceesay commented on GORA-532:
--

[~kamaci], I can't find any proposal regarding this topic. I checked all the 
previous GSoC pages but can't find anything relevant to Benchmarking. Would you mind 
clarifying? 



[jira] [Commented] (GORA-532) Benchmarking Module

2019-03-20 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797247#comment-16797247
 ] 

Sheriffo Ceesay commented on GORA-532:
--

[~djkevincr], thanks for the reply. I will have a quick look at your 
suggestions and get back to you as soon as possible. Thank you. 



[jira] [Comment Edited] (GORA-532) Benchmarking Module

2019-03-20 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797104#comment-16797104
 ] 

Sheriffo Ceesay edited comment on GORA-532 at 3/20/19 12:25 PM:


Hi [~kamaci] and [~djkevincr], I am Sheriffo, a third-year PhD student at the 
University of St Andrews. I am working on Benchmarking and Performance 
Modelling of Big Data Applications. At first sight, I think this is a project 
that I am interested in because it has some similarities with what I am working 
on. You can see my first paper 
[https://ieeexplore.ieee.org/abstract/document/8258249]. 

I will have a look at the papers you have referenced.

Can you please guide me on the application process if it is available for GSoC 
2019?

 

Thank you. 


was (Author: sheriffo):
Hi [~kamaci],  I am Sheriffo, 3 year PhD student at University of St Andrews. I 
am working on Benchmarking and Performance Modelling of Big Data Applications. 
At first sight, I think this is a project that I am interested in because it 
has some similarities with what I am working on. You can see my first paper 
[https://ieeexplore.ieee.org/abstract/document/8258249]. 

I will have a look at the papers you have referenced.

Can you please guide me on the application processs?

 

Thank you. 



[jira] [Commented] (GORA-532) Benchmarking Module

2019-03-20 Thread Sheriffo Ceesay (JIRA)


[ 
https://issues.apache.org/jira/browse/GORA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797104#comment-16797104
 ] 

Sheriffo Ceesay commented on GORA-532:
--

Hi [~kamaci],  I am Sheriffo, 3 year PhD student at University of St Andrews. I 
am working on Benchmarking and Performance Modelling of Big Data Applications. 
At first sight, I think this is a project that I am interested in because it 
has some similarities with what I am working on. You can see my first paper 
[https://ieeexplore.ieee.org/abstract/document/8258249]. 

I will have a look at the papers you have referenced.

Can you please guide me on the application process?

 

Thank you. 
