Re: Week 1 Report and Some Questions

2019-06-11 Thread lewis john mcgibbney
Excellent.
Apologies for being absent. I am undergoing a job transition and it has
been very busy.
I suggest that we start a weekly tagup as well.
Lewis

On Sun, Jun 2, 2019 at 1:14 PM Sheriffo Ceesay 
wrote:

> The code so far is available at the GitHub link below.
>
> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark
>
>
>
> **Sheriffo Ceesay**
>
>
> On Sun, Jun 2, 2019 at 8:34 PM Sheriffo Ceesay 
> wrote:
>
>> Hi Renato,
>>
>> Thanks for the detailed reply. I agree with your recommendations on the
>> way forward. I will go ahead and implement the rest of the functionality
>> using reflection and we can follow your recommendations on the next
>> iterations.
>>
>> As for the backend, I am using both HBase and MongoDB and all seems well
>> at the moment.
>>
>> I will let you all know when I push my code to GitHub.
>>
>> Thank you.
>>
>>
>> **Sheriffo Ceesay**
>>
>>
>> On Sun, Jun 2, 2019 at 7:01 PM Renato Marroquín Mogrovejo <
>> renatoj.marroq...@gmail.com> wrote:
>>
>>> Hi Sheriffo,
>>>
>>> Here are some opinions on your questions, but others are more than
>>> welcome to suggest alternatives as well.
>>>
>>> Q1: Are we going to consider arbitrary field lengths? E.g., if we set
>>> the fieldcount to 100, then we have to create the respective Avro and
>>> mapping files. Currently, I don't think this process is automated, and
>>> it may be tedious for large field counts.
>>> I think for the first code iteration, we should use whatever
>>> fieldcount you have already generated files for. Ideally, we should be
>>> able to invoke the Gora bean generator and generate as many fields as
>>> required by the benchmark configuration.
>>>
>>> Q2: The second problem relates to the first one: if we allow
>>> arbitrary field counts, then there has to be a mechanism to call each
>>> of the set or get methods during CRUD operations. To avoid this, I
>>> used Java reflection. See the sample code below.
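The reflection approach described above can be sketched as follows. This is a minimal stand-in, not the actual benchmark code: the `User` bean and the `setFieldN` naming convention are assumptions here, standing in for whatever Gora-generated bean the benchmark uses. The key idea is building the setter name at runtime so the same loop works for any fieldcount.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class ReflectionDemo {

    // Stand-in for a Gora-generated persistent bean with numbered fields.
    public static class User {
        private final Map<String, String> values = new HashMap<>();
        public void setField0(String v) { values.put("field0", v); }
        public void setField1(String v) { values.put("field1", v); }
        public String get(String name) { return values.get(name); }
    }

    // Invoke setFieldN(value) by constructing the method name at runtime,
    // so the caller does not need a hard-coded call per field.
    public static void setField(Object bean, int index, String value) throws Exception {
        Method setter = bean.getClass().getMethod("setField" + index, String.class);
        setter.invoke(bean, value);
    }

    public static void main(String[] args) throws Exception {
        User u = new User();
        for (int i = 0; i < 2; i++) {
            setField(u, i, "value" + i); // same loop works for any fieldcount
        }
        System.out.println(u.get("field0") + "," + u.get("field1"));
    }
}
```

The per-call `getMethod` lookup is the overhead Renato alludes to below; caching the `Method` objects would reduce it, but generated code avoids it entirely.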
>>> We have some options for dealing with an arbitrary number of fields.
>>> 1) Use reflection as you have done, which might be OK for the first
>>> code iteration; but if we want decent performance compared to using
>>> the datastores natively (no Gora), we should move away from it.
>>> 2) Do Gora class generation (and also generate the method used to
>>> insert data through Gora) in a step before the benchmark starts.
>>> Something like this:
>>> # passing config parameters to generate Gora Beans with number of
>>> required fields
>>> # this should output the generated class and the method that does
>>> # the insertion
>>> $ gora_compiler.sh --benchmark --fields_required 4
>>> The output path containing the result should then be included (or
>>> passed) as a runtime dependency of the benchmark class.
>>> 3) Because Gora uses Avro, we can use complex data types, e.g.,
>>> arrays and maps. So we could represent the number of fields as the
>>> number of elements inside an array. I would think that this option
>>> gives us the best performance.
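Option (3) can be illustrated with an Avro schema sketch: instead of generating a record with N named fields, the record holds a single array-valued field, so the fieldcount is just the array length and no per-fieldcount code generation is needed. The record name and namespace below are illustrative assumptions, not the actual benchmark schema.

```json
{
  "type": "record",
  "name": "ArrayRecord",
  "namespace": "org.apache.gora.benchmark.generated",
  "fields": [
    {"name": "fields",
     "type": {"type": "array", "items": "string"},
     "default": []}
  ]
}
```

With this shape, the benchmark sets field i via an index into the array rather than via a named setter, which avoids both reflection and regeneration.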
>>> I think we should continue with option (1) until we have the entire
>>> pipeline working and we understand how every piece fits together
>>> (YCSB, Gora, the Gora compiler, benchmark setup steps). Then we
>>> should do (2), which is the most general and the one that reflects how
>>> people usually use Gora, and then we test with (3). I think all of
>>> these steps are totally doable in our time frame as we build upon
>>> previous steps.
>>> The other thing that we should decide is which backend to use, as
>>> some backends are more mature than others. I'd say to use the HBase
>>> backend, as it is the most stable one and the one with the most
>>> features; and if we feel brave, we can try other backends (and fix
>>> them if necessary!)
>>>
>>>
>>> Best,
>>>
>>> Renato M.
>>>
>>> On Sun., Jun 2, 2019 at 19:10, Sheriffo Ceesay
>>> () wrote:
>>> >
>>> > Dear Mentors,
>>> >
>>> > My week one report is available at
>>> >
>>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>>> >
>>> > I have also included a detailed question, and I will need your
>>> > guidance on that.
>>> >
>>> > Please let me know what your thoughts are.
>>> >
>>> > Thank you.
>>> >
>>> > **Sheriffo Ceesay**
>>>
>>

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc

