Re: Add datastore for Elasticsearch. Outreachy Week 8 Report

2021-01-31 Thread Maria Podorvanova
Hi John,

Thank you for your feedback.

I am back from my short holiday and will address your comments this week.

Regards,
Maria

On Thu, 28 Jan 2021 at 03:41, John Mora  wrote:

> Hi María.
>
> Thanks for your report.
>
> Some comments.
>
> When a key is not found in the datastore, the get method should return
> null, but currently a NullPointerException is thrown here:
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L256
>
> According to the Avro documentation, "This data type is used to declare a
> fixed-sized field that can be used for storing binary data." It is similar
> to the BYTES data type, but it has a size attribute.
>
> If Elasticsearch has a fixed-sized binary data type, you can use it. But I
> think that is not the case, so if other modules do not implement it, it is
> fine to leave it that way.
>
> Best,
> John


Re: Add datastore for Elasticsearch. Outreachy Week 8 Report

2021-01-27 Thread John Mora
Hi María.

Thanks for your report.

Some comments.

When a key is not found in the datastore, the get method should return
null, but currently a NullPointerException is thrown here:

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L256
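A minimal Java sketch of the null-safe lookup described above. `NullSafeGetSketch`, `DocumentSource` and `Employee` are hypothetical stand-ins, not the ElasticsearchStore code; the sketch assumes the underlying lookup yields `null` for a missing document (with the Elasticsearch high-level REST client one would typically check `GetResponse.isExists()` instead).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: get() returns null for a missing key instead of throwing a
 * NullPointerException. All names here are hypothetical.
 */
public class NullSafeGetSketch {

    /** Hypothetical lookup; assumed to return null when no document exists. */
    interface DocumentSource {
        Map<String, Object> fetch(String id);
    }

    /** Hypothetical persistent object reduced to a single field. */
    static final class Employee {
        final String name;

        Employee(String name) {
            this.name = name;
        }
    }

    private final DocumentSource source;

    NullSafeGetSketch(DocumentSource source) {
        this.source = source;
    }

    /** Returns the stored object, or null when the key is not present. */
    Employee get(String key) {
        Map<String, Object> fields = source.fetch(key);
        if (fields == null) {
            // Guard before dereferencing: a missing document is not an error.
            return null;
        }
        return new Employee((String) fields.get("name"));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> docs = new HashMap<>();
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "Alice");
        docs.put("1", doc);

        NullSafeGetSketch store = new NullSafeGetSketch(docs::get);
        System.out.println(store.get("1").name);  // Alice
        System.out.println(store.get("missing")); // null
    }
}
```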

According to the Avro documentation, "This data type is used to declare a
fixed-sized field that can be used for storing binary data." It is similar
to the BYTES data type, but it has a size attribute.

If Elasticsearch has a fixed-sized binary data type, you can use it. But I
think that is not the case, so if other modules do not implement it, it is
fine to leave it that way.
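If FIXED support were added later, one option is to treat it like BYTES but enforce the declared size. A sketch, assuming the value goes into an Elasticsearch `binary` field (which takes Base64-encoded strings); `FixedFieldSketch`, `serializeFixed` and `deserializeFixed` are hypothetical names, not Gora methods.

```java
import java.util.Arrays;
import java.util.Base64;

/** Sketch: FIXED handled like BYTES, plus a size check against the schema. */
public class FixedFieldSketch {

    /** Encodes a FIXED value for a Base64 "binary" field, rejecting wrong sizes. */
    static String serializeFixed(byte[] value, int declaredSize) {
        if (value.length != declaredSize) {
            throw new IllegalArgumentException(
                "FIXED value has " + value.length + " bytes, schema declares " + declaredSize);
        }
        return Base64.getEncoder().encodeToString(value);
    }

    /** Decodes a stored Base64 string and checks it still matches the declared size. */
    static byte[] deserializeFixed(String stored, int declaredSize) {
        byte[] decoded = Base64.getDecoder().decode(stored);
        if (decoded.length != declaredSize) {
            throw new IllegalArgumentException(
                "Stored value has " + decoded.length + " bytes, schema declares " + declaredSize);
        }
        return decoded;
    }

    public static void main(String[] args) {
        byte[] digest = new byte[16]; // e.g. a 16-byte digest declared as fixed(16)
        Arrays.fill(digest, (byte) 7);
        String encoded = serializeFixed(digest, 16);
        System.out.println(Arrays.equals(digest, deserializeFixed(encoded, 16))); // true
    }
}
```

The size check is the only thing that distinguishes this from plain BYTES handling, which matches the point that FIXED is BYTES with a size attribute.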

Best,
John

On Sun, 24 Jan 2021 at 8:16, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:

> Hi,
>
> Report #8
> Period: January 17 - January 23
> Activities:
> - Fix the createSchema method [1]
>
>    1. Added the index mappings when creating the Elasticsearch index
>    2. Added a getter and setter to the Datatype enum
>
> - Implement serialization/deserialization for some Avro data types [2]
>
>    1. Implemented the serializeFieldValue and deserializeFieldValue methods
>    for the ARRAY, BOOLEAN, BYTES and FIXED Avro data types
>    2. Fixed deserialization for the STRING Avro data type
>    3. Added Javadoc descriptions
>
> - The following tests are passing now:
>
>    1. testCreateSchema and testAutoCreateSchema for the createSchema method
>    2. testSchemaExists
>    3. testPut
>    4. testGet
>
> - Wrote blog post #4
>
> Here are links to the commits:
> [1]
> https://github.com/podorvanova/gora/commit/6b9c21095fa4e9327328ec881b659c60c58c4941
> [2]
> https://github.com/podorvanova/gora/commit/e459309b3f750af65a181d4904470eaee9c29a2e
>
> Question:
> I didn't quite understand what kind of type the FIXED Avro data type
> represents. I looked through other modules and found that the FIXED case is
> handled in neither serialization nor deserialization, so I did the same for
> now.
>
> Regards,
> Maria
>


Add datastore for Elasticsearch. Outreachy Week 8 Report

2021-01-24 Thread Maria Podorvanova
Hi,

Report #8
Period: January 17 - January 23
Activities:
- Fix the createSchema method [1]

   1. Added the index mappings when creating the Elasticsearch index
   2. Added a getter and setter to the Datatype enum

- Implement serialization/deserialization for some Avro data types [2]

   1. Implemented the serializeFieldValue and deserializeFieldValue methods for
   the ARRAY, BOOLEAN, BYTES and FIXED Avro data types
   2. Fixed deserialization for the STRING Avro data type
   3. Added Javadoc descriptions

- The following tests are passing now:

   1. testCreateSchema and testAutoCreateSchema for the createSchema method
   2. testSchemaExists
   3. testPut
   4. testGet

- Wrote blog post #4
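The createSchema change above (adding index mappings when the index is created) amounts to sending a `mappings` body with the create-index request. A sketch, assuming Elasticsearch 7-style typeless mappings and simple field names that need no JSON escaping; `IndexMappingSketch`, `DataType` and `getEsName()` are hypothetical stand-ins for the store's Datatype enum and its getter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: build the "mappings" body for a create-index request. */
public class IndexMappingSketch {

    /** Hypothetical enum mapping store types to Elasticsearch field types. */
    enum DataType {
        TEXT("text"), KEYWORD("keyword"), BOOLEAN("boolean"), BINARY("binary"), LONG("long");

        private final String esName;

        DataType(String esName) {
            this.esName = esName;
        }

        String getEsName() {
            return esName;
        }
    }

    /** Produces {"mappings":{"properties":{"field":{"type":"..."},...}}}. */
    static String buildMappings(Map<String, DataType> fields) {
        StringBuilder sb = new StringBuilder("{\"mappings\":{\"properties\":{");
        boolean first = true;
        for (Map.Entry<String, DataType> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            // Field names are assumed to be plain identifiers (no escaping done here).
            sb.append('"').append(e.getKey()).append("\":{\"type\":\"")
              .append(e.getValue().getEsName()).append("\"}");
        }
        return sb.append("}}}").toString();
    }

    public static void main(String[] args) {
        Map<String, DataType> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("name", DataType.TEXT);
        fields.put("active", DataType.BOOLEAN);
        System.out.println(buildMappings(fields));
    }
}
```

In a real store, a JSON builder from the client library would be preferable to string concatenation; the sketch only shows the shape of the body.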

Here are links to the commits:
[1]
https://github.com/podorvanova/gora/commit/6b9c21095fa4e9327328ec881b659c60c58c4941
[2]
https://github.com/podorvanova/gora/commit/e459309b3f750af65a181d4904470eaee9c29a2e
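The serialization work in [2] is per-type conversion between Java values and JSON document values. A sketch of that pattern for ARRAY, BOOLEAN and BYTES, assuming bytes are stored as Base64 strings; `FieldValueSketch` and `FieldKind` are hypothetical (a stand-in for Avro's schema type), and only one level of ARRAY nesting is handled. It is not the code in the commit.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

/** Sketch of per-type (de)serialization between Java values and JSON values. */
public class FieldValueSketch {

    enum FieldKind { ARRAY, BOOLEAN, BYTES }

    /** Java value to JSON-friendly value; elementKind is used only for ARRAY. */
    static Object serializeFieldValue(FieldKind kind, FieldKind elementKind, Object value) {
        switch (kind) {
            case BOOLEAN:
                return value; // JSON has a native boolean type
            case BYTES: {
                // duplicate() so reading does not move the caller's buffer position
                ByteBuffer buf = ((ByteBuffer) value).duplicate();
                byte[] bytes = new byte[buf.remaining()];
                buf.get(bytes);
                return Base64.getEncoder().encodeToString(bytes);
            }
            case ARRAY: {
                List<Object> out = new ArrayList<>();
                for (Object element : (List<?>) value) {
                    out.add(serializeFieldValue(elementKind, null, element));
                }
                return out;
            }
            default:
                throw new IllegalArgumentException("Unsupported kind: " + kind);
        }
    }

    /** JSON-side value back to a Java value; the inverse of serializeFieldValue. */
    static Object deserializeFieldValue(FieldKind kind, FieldKind elementKind, Object stored) {
        switch (kind) {
            case BOOLEAN:
                return (Boolean) stored;
            case BYTES:
                return ByteBuffer.wrap(Base64.getDecoder().decode((String) stored));
            case ARRAY: {
                List<Object> out = new ArrayList<>();
                for (Object element : (List<?>) stored) {
                    out.add(deserializeFieldValue(elementKind, null, element));
                }
                return out;
            }
            default:
                throw new IllegalArgumentException("Unsupported kind: " + kind);
        }
    }

    public static void main(String[] args) {
        Object json = serializeFieldValue(FieldKind.ARRAY, FieldKind.BYTES,
                Arrays.asList(ByteBuffer.wrap(new byte[] {104, 105})));
        System.out.println(json); // [aGk=]
    }
}
```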

Question:
I didn't quite understand what kind of type the FIXED Avro data type
represents. I looked through other modules and found that the FIXED case is
handled in neither serialization nor deserialization, so I did the same for
now.

Regards,
Maria


Re: Week 8 Report

2019-07-23 Thread Sheriffo Ceesay
Hi Kevin,

Thanks again for the input.

I have taken note of your comments and I will factor them in
moving forward.

Thank you.


**Sheriffo Ceesay**


On Tue, Jul 23, 2019 at 10:13 AM Kevin Ratnasekera 
wrote:

> Hi Sheriffo,
>
> Thank you for your findings and hard work on this. I agree with most of the
> points you already mentioned. But I don't think we have consistent client
> implementations across all data stores in Apache Gora. Some clients use the
> async version of their APIs by default, while others use sync. Some clients
> have connection pooling implemented, while others use a single connection
> to do all the data store work. These configurations will generally change
> the behavior of the clients under heavy load. I think we will be fine if we
> keep track of the client setups/configurations under which the benchmarks
> are captured.
>
> Regards
> Kevin


Re: Week 8 Report

2019-07-23 Thread Kevin Ratnasekera
On a related note, please also make a note of the client library versions
we are testing for both the native and Gora implementations.
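Recording the client version can be partly automated when the client jar's manifest carries an Implementation-Version entry. A sketch using the standard `Package.getImplementationVersion()`, which returns null when that entry is missing; `ClientVersionSketch` and `versionOf` are hypothetical names, and in a real run one would pass a class from the HBase or MongoDB client jar.

```java
/** Sketch: look up a client library's version from its jar manifest. */
public class ClientVersionSketch {

    /** Returns the manifest Implementation-Version, or "unknown" if unavailable. */
    static String versionOf(Class<?> clientClass) {
        Package pkg = clientClass.getPackage(); // null for primitives and arrays
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null) ? "unknown" : version;
    }

    public static void main(String[] args) {
        // Classes loaded outside a versioned jar usually report "unknown".
        System.out.println("this class: " + versionOf(ClientVersionSketch.class));
    }
}
```

Not every jar sets Implementation-Version, so the dependency versions declared in the build file remain the more reliable record.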

On Tue, Jul 23, 2019 at 2:43 PM Kevin Ratnasekera 
wrote:

> Hi Sheriffo,
>
> Thank you for your findings and hard work on this. I agree with most of the
> points you already mentioned. But I don't think we have consistent client
> implementations across all data stores in Apache Gora. Some clients use the
> async version of their APIs by default, while others use sync. Some clients
> have connection pooling implemented, while others use a single connection
> to do all the data store work. These configurations will generally change
> the behavior of the clients under heavy load. I think we will be fine if we
> keep track of the client setups/configurations under which the benchmarks
> are captured.
>
> Regards
> Kevin


Re: Week 8 Report

2019-07-23 Thread Kevin Ratnasekera
Hi Sheriffo,

Thank you for your findings and hard work on this. I agree with most of the
points you already mentioned. But I don't think we have consistent client
implementations across all data stores in Apache Gora. Some clients use the
async version of their APIs by default, while others use sync. Some clients
have connection pooling implemented, while others use a single connection
to do all the data store work. These configurations will generally change
the behavior of the clients under heavy load. I think we will be fine if we
keep track of the client setups/configurations under which the benchmarks
are captured.

Regards
Kevin
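One way to act on this is to make the client setup part of every recorded result, so no chart shows numbers without the configuration behind them. A minimal Java sketch; every class and field name is hypothetical (not part of Gora or YCSB), and the values in `main` are placeholders, not measurements.

```java
/** Sketch: each benchmark result carries the client setup it was captured under. */
public class BenchmarkSetupSketch {

    /** Whether the client issues calls synchronously or asynchronously. */
    enum Mode { SYNC, ASYNC }

    /** Client configuration in effect for one benchmark run. */
    static final class ClientSetup {
        final String datastore;
        final String clientVersion;
        final Mode mode;
        final boolean pooled; // connection pool vs. one shared connection

        ClientSetup(String datastore, String clientVersion, Mode mode, boolean pooled) {
            this.datastore = datastore;
            this.clientVersion = clientVersion;
            this.mode = mode;
            this.pooled = pooled;
        }
    }

    /** One measured data point plus the setup that produced it. */
    static final class Result {
        final String workload;
        final ClientSetup setup;
        final double opsPerSecond;

        Result(String workload, ClientSetup setup, double opsPerSecond) {
            this.workload = workload;
            this.setup = setup;
            this.opsPerSecond = opsPerSecond;
        }
    }

    /** Column order used by toCsvRow. */
    static final String CSV_HEADER = "workload,datastore,client_version,mode,pooled,ops_per_sec";

    /** Serializes a result so the setup always travels with the number. */
    static String toCsvRow(Result r) {
        return String.join(",",
                r.workload,
                r.setup.datastore,
                r.setup.clientVersion,
                r.setup.mode.name(),
                Boolean.toString(r.setup.pooled),
                Double.toString(r.opsPerSecond));
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        ClientSetup setup = new ClientSetup("hbase-gora", "x.y.z", Mode.SYNC, false);
        System.out.println(CSV_HEADER);
        System.out.println(toCsvRow(new Result("A", setup, 1234.5)));
    }
}
```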

On Tue, Jul 23, 2019 at 2:14 PM Sheriffo Ceesay 
wrote:

> **Sheriffo Ceesay**
>
>
> On Tue, Jul 23, 2019 at 6:52 AM Kevin Ratnasekera
> wrote:
>
> > Hi Sheriffo,
> >
> > Adding to what Kamaci already mentioned, have you tried HBaseStore with the
> > buffered mutator engaged? [1] It allows HBase operations, e.g. puts, to be
> > batched and asynchronous.
>
>
> Hi Kevin,
> Yes, I am using *gora.hbasestore.hbase.client.autoflush.enabled=true* in the
> gora.properties file.
>
>
>
> > On a related note, also have a look at the points where
> > you flush() the datastore in the native HBase implementation.
> >
>
> The native implementation is through the Yahoo! Cloud Serving Benchmark. I
> haven't gone through their implementation. I think the difference may be due
> to the default configuration that I am using. I will try to dig further to
> see if I can get some improvement.
>
>
> >
> > [1] gora.hbasestore.hbase.client.autoflush.enabled=true
> >
> > Regards
> > Kevin
>


Re: Week 8 Report

2019-07-23 Thread Sheriffo Ceesay
**Sheriffo Ceesay**


On Tue, Jul 23, 2019 at 6:52 AM Kevin Ratnasekera 
wrote:

> Hi Sheriffo,
>
> Adding to what Kamaci already mentioned, have you tried HBaseStore with the
> buffered mutator engaged? [1] It allows HBase operations, e.g. puts, to be
> batched and asynchronous.


Hi Kevin,
Yes, I am using *gora.hbasestore.hbase.client.autoflush.enabled=true* in the
gora.properties file.



> On a related note, also have a look at the points where
> you flush() the datastore in the native HBase implementation.
>

The native implementation is through the Yahoo! Cloud Serving Benchmark. I
haven't gone through their implementation. I think the difference may be due
to the default configuration that I am using. I will try to dig further to
see if I can get some improvement.
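Why flush() placement matters can be shown with a conceptual model of client-side write buffering, the idea behind HBase's BufferedMutator. This is NOT the HBase API: `BufferedWriterSketch`, `mutate` and `sendBatch` are hypothetical, and `sendBatch` stands in for one round trip to the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Conceptual model: writes are queued and sent in batches (not the HBase API). */
public class BufferedWriterSketch<T> implements AutoCloseable {
    private final int capacity;
    private final Consumer<List<T>> sendBatch; // stand-in for one server round trip
    private final List<T> buffer = new ArrayList<>();

    BufferedWriterSketch(int capacity, Consumer<List<T>> sendBatch) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
        this.sendBatch = sendBatch;
    }

    /** Queues one operation; sends a batch automatically once the buffer is full. */
    void mutate(T op) {
        buffer.add(op);
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    /** Sends whatever is buffered as a single batch. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sendBatch.accept(new ArrayList<>(buffer)); // copy, since the buffer is reused
        buffer.clear();
    }

    /** Flushes remaining operations so nothing is lost at shutdown. */
    @Override
    public void close() {
        flush();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        try (BufferedWriterSketch<String> writer =
                 new BufferedWriterSketch<>(100, batch -> batchSizes.add(batch.size()))) {
            for (int i = 0; i < 250; i++) {
                writer.mutate("put-" + i);
            }
        }
        System.out.println(batchSizes); // [100, 100, 50]
    }
}
```

In this model, a capacity of 1 (or a flush() after every put) turns each write into its own round trip, so the same 250 puts would cost 250 batches instead of 3.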


>
> [1] gora.hbasestore.hbase.client.autoflush.enabled=true
>
> Regards
> Kevin


Re: Week 8 Report

2019-07-23 Thread Sheriffo Ceesay
**Sheriffo Ceesay**
Hi Furkan,

Thanks for the reply. I have replied to your comment inline.

On Tue, Jul 23, 2019 at 1:10 AM Furkan KAMACI 
wrote:

> Hi Sheriffo,
>
> I've checked all the GSoC reports, including yours. Thanks for filling in your
> reports with time slot information about tasks.
>
> You have 3 benchmark result charts in your Week 8 Report. HBase-native is
> dramatically slower than the others in 2 out of 3. Do you have any
> comments on it?


Thanks for the feedback. Right now I don't have a concrete explanation, but
I suspect the HBase setup. I am using the default setup for all datastores
without any optimization settings in the configuration files. I will look
into it this week. My only concern is that I don't want to over-optimise the
configuration of a particular datastore, thereby giving it an unfair
advantage over the others.

Thank you.

>
> Kind Regards,
> Furkan KAMACI


Re: Week 8 Report

2019-07-22 Thread Furkan KAMACI
Hi Sheriffo,

I've checked all the GSoC reports, including yours. Thanks for filling in your
reports with time slot information about tasks.

You have 3 benchmark result charts in your Week 8 Report. HBase-native is
dramatically slower than the others in 2 out of 3. Do you have any
comments on it?

Kind Regards,
Furkan KAMACI

On Sun, Jul 21, 2019 at 7:33 PM Sheriffo Ceesay 
wrote:

> The week eight report is available at
>
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>
> I ran some workloads to compare the Gora implementations of MongoDB and HBase
> with the native implementations. Plots are provided in the report, and the
> generated data is available at [1]. This is a work in progress; so far I
> have run Workload A and Workload B [2].
>
> Please let me know if you have any suggestions or questions.
>
> [1] https://github.com/sneceesay77/gora/tree/GORA-532
> [2] https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads
>
>
> **Sheriffo Ceesay**
>


Week 8 Report

2019-07-21 Thread Sheriffo Ceesay
The week eight report is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

I ran some workloads to compare the Gora implementations of MongoDB and HBase
with the native implementations. Plots are provided in the report, and the
generated data is available at [1]. This is a work in progress; so far I
have run Workload A and Workload B [2].
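Per the YCSB Core Workloads page [2], Workload A is an update-heavy 50/50 read/update mix and Workload B a read-mostly 95/5 mix. A sketch of how such a mix picks operations; `WorkloadMixSketch` is a hypothetical name, and key selection (zipfian in both core workloads) is deliberately left out.

```java
import java.util.Random;

/** Sketch: choose READ or UPDATE per operation according to a read proportion. */
public class WorkloadMixSketch {

    enum Op { READ, UPDATE }

    /** Read with probability readProportion, otherwise update. */
    static Op nextOp(Random rng, double readProportion) {
        return rng.nextDouble() < readProportion ? Op.READ : Op.UPDATE;
    }

    /** Returns {reads, updates} for a seeded run of the given length. */
    static int[] countOps(long seed, double readProportion, int operations) {
        Random rng = new Random(seed); // fixed seed keeps runs reproducible
        int reads = 0;
        for (int i = 0; i < operations; i++) {
            if (nextOp(rng, readProportion) == Op.READ) {
                reads++;
            }
        }
        return new int[] {reads, operations - reads};
    }

    public static void main(String[] args) {
        int[] a = countOps(42L, 0.50, 10_000); // Workload A-like mix
        int[] b = countOps(42L, 0.95, 10_000); // Workload B-like mix
        System.out.println("A reads/updates: " + a[0] + "/" + a[1]);
        System.out.println("B reads/updates: " + b[0] + "/" + b[1]);
    }
}
```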

Please let me know if you have any suggestions or questions.

[1] https://github.com/sneceesay77/gora/tree/GORA-532
[2] https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads


**Sheriffo Ceesay**