Re: Add datastore for Elasticsearch. Outreachy Week 8 Report
Hi John,

Thank you for your feedback. I am back from my short holiday and will address your comments this week.

Regards,
Maria

On Thu, 28 Jan 2021 at 03:41, John Mora wrote:

> Hi María.
>
> Thanks for your report.
>
> Some comments.
>
> When a Key is not found in the Datastore the get method should return
> null, but currently a NullPointerException is thrown here:
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L256
>
> According to the avro documentation: "This data type is used to declare a
> fixed-sized field that can be used for storing binary data.", it is similar
> to the BYTES data type, but it has a size attribute.
>
> If Elasticsearch has a fixed-sized binary datatype you can use it. But, I
> think that is not the case, so If other modules do not implement it, it is
> fine to leave it that way.
>
> Best,
> John
Re: Add datastore for Elasticsearch. Outreachy Week 8 Report
Hi María.

Thanks for your report.

Some comments.

When a key is not found in the datastore, the get method should return null, but currently a NullPointerException is thrown here:

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L256

According to the Avro documentation: "This data type is used to declare a fixed-sized field that can be used for storing binary data." It is similar to the BYTES data type, but it has a size attribute.

If Elasticsearch has a fixed-size binary datatype, you can use it. But I think that is not the case, so if other modules do not implement it, it is fine to leave it that way.

Best,
John

On Sun, 24 Jan 2021 at 8:16, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:

> Hi,
>
> Report #8
> Period: January, 17 - January, 23
> Activities:
> - Fix createSchema method [1]
>   1. Added the index mappings while creating the Elasticsearch index
>   2. Added getter and setter to enum Datatype
> - Implement serialization/deserialization for some Avro data types [2]
>   1. Implemented serializeFieldValue and deserializeFieldValue methods
>      for ARRAY BOOLEAN, BYTES and FIXED Avro data types
>   2. Fixed deserialization for STRING Avro data type
>   3. Added javadoc descriptions
> - The following tests are passing now:
>   1. testCreateSchema, testAutoCreateSchema for createSchema method
>   2. testSchemaExists
>   3. testPut
>   4. testGet
> - Wrote a blog post #4
>
> Here are links to the commits:
> [1] https://github.com/podorvanova/gora/commit/6b9c21095fa4e9327328ec881b659c60c58c4941
> [2] https://github.com/podorvanova/gora/commit/e459309b3f750af65a181d4904470eaee9c29a2e
>
> Question:
> I didn't quite understand what kind of type is represented by FIXED Avro
> data type. I looked through other modules and found that FIXED case is not
> being processed neither in serialization nor in deserialization, so I did
> the same for now.
>
> Regards,
> Maria
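To illustrate the first comment, here is a minimal sketch of a get method that returns null for a missing key instead of dereferencing an absent result. It is not the code from ElasticsearchStore: the class NullSafeGetSketch, the helper fetchSource, and the map-based "index" are hypothetical stand-ins chosen so the example runs without an Elasticsearch client.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a datastore; not the actual ElasticsearchStore. */
final class NullSafeGetSketch {

    /** Simulates a backend lookup: the source map is null when the document is absent. */
    static Map<String, Object> fetchSource(Map<String, Map<String, Object>> index, String key) {
        return index.get(key);
    }

    /** Returns the stored "name" field, or null when the key is not found. */
    static String get(Map<String, Map<String, Object>> index, String key) {
        Map<String, Object> source = fetchSource(index, key);
        if (source == null) {
            // Key not found: return null rather than dereferencing the missing source.
            return null;
        }
        Object name = source.get("name");
        return name == null ? null : name.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> index = new HashMap<>();
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "alice");
        index.put("k1", doc);

        System.out.println(get(index, "k1"));      // existing key
        System.out.println(get(index, "missing")); // absent key, no exception is thrown
    }
}
```

The point of the sketch is only the early return: the missing-result check happens before any field of the result is read.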
Add datastore for Elasticsearch. Outreachy Week 8 Report
Hi,

Report #8
Period: January 17 - January 23
Activities:
- Fix createSchema method [1]
  1. Added the index mappings while creating the Elasticsearch index
  2. Added getter and setter to enum Datatype
- Implement serialization/deserialization for some Avro data types [2]
  1. Implemented serializeFieldValue and deserializeFieldValue methods for ARRAY, BOOLEAN, BYTES and FIXED Avro data types
  2. Fixed deserialization for STRING Avro data type
  3. Added javadoc descriptions
- The following tests are passing now:
  1. testCreateSchema, testAutoCreateSchema for createSchema method
  2. testSchemaExists
  3. testPut
  4. testGet
- Wrote a blog post #4

Here are links to the commits:
[1] https://github.com/podorvanova/gora/commit/6b9c21095fa4e9327328ec881b659c60c58c4941
[2] https://github.com/podorvanova/gora/commit/e459309b3f750af65a181d4904470eaee9c29a2e

Question:
I didn't quite understand what kind of type the FIXED Avro data type represents. I looked through other modules and found that the FIXED case is not processed in either serialization or deserialization, so I did the same for now.

Regards,
Maria
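On the FIXED question: a FIXED value is a byte sequence whose length is declared in the schema. The sketch below shows one possible way to treat it like BYTES while still enforcing the declared size, by encoding to Base64 (the form that, as far as I know, Elasticsearch's binary field type accepts). This is an assumption for illustration, not what the Gora module or other modules do; FixedFieldSketch, serializeFixed and deserializeFixed are hypothetical names, and plain byte arrays are used instead of Avro classes to keep the example dependency-free.

```java
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical sketch: FIXED handled like BYTES plus a size check. */
final class FixedFieldSketch {

    /** Encodes a fixed-size value as Base64 after checking the declared size. */
    static String serializeFixed(byte[] value, int declaredSize) {
        if (value.length != declaredSize) {
            throw new IllegalArgumentException(
                    "expected " + declaredSize + " bytes, got " + value.length);
        }
        return Base64.getEncoder().encodeToString(value);
    }

    /** Decodes Base64 and verifies that the result has the declared size. */
    static byte[] deserializeFixed(String encoded, int declaredSize) {
        byte[] decoded = Base64.getDecoder().decode(encoded);
        if (decoded.length != declaredSize) {
            throw new IllegalArgumentException(
                    "expected " + declaredSize + " bytes, got " + decoded.length);
        }
        return decoded;
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, 4};               // a 4-byte fixed value
        String stored = serializeFixed(original, 4);  // what would be written to the store
        byte[] restored = deserializeFixed(stored, 4);
        System.out.println(Arrays.equals(original, restored)); // round trip preserves the bytes
    }
}
```

The only difference from a BYTES round trip is the length check on both sides, which is where the schema's size attribute would come into play.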
Re: Week 8 Report
Hi Kevin,

Thanks again for the input. I have taken note of your comments and will factor them in moving forward.

Thank you.

**Sheriffo Ceesay**

On Tue, Jul 23, 2019 at 10:13 AM Kevin Ratnasekera wrote:

> Hi Sheriffo,
>
> Thank you for your findings and hard work on this. I agree on most of the
> points you already mentioned. But I don't think we have consistent client
> implementations across all data stores in Apache Gora. That means some
> clients use async version of their APIs as default, some uses sync. Some
> clients have connection pooling implemented and some use single connection
> to do all the data store work. These configurations will generally change
> the behavior when these clients are performed under a huge load. I think
> we will be good, if we do track the client setup/configurations in which
> benchmarks are captured.
>
> Regards
> Kevin
Re: Week 8 Report
On a related note: please also make a note of the client versions of the libraries we are testing, for both the native and Gora implementations.

On Tue, Jul 23, 2019 at 2:43 PM Kevin Ratnasekera wrote:

> Hi Sheriffo,
>
> Thank you for your findings and hard work on this. I agree on most of the
> points you already mentioned. But I don't think we have consistent client
> implementations across all data stores in Apache Gora. That means some
> clients use async version of their APIs as default, some uses sync. Some
> clients have connection pooling implemented and some use single connection
> to do all the data store work. These configurations will generally change
> the behavior when these clients are performed under a huge load. I think
> we will be good, if we do track the client setup/configurations in which
> benchmarks are captured.
>
> Regards
> Kevin
Re: Week 8 Report
Hi Sheriffo,

Thank you for your findings and hard work on this. I agree with most of the points you mentioned. But I don't think we have consistent client implementations across all data stores in Apache Gora: some clients use the async version of their APIs by default, while others use the sync version. Some clients implement connection pooling, and some use a single connection for all the data store work. These configurations will generally change the behavior when the clients are run under a heavy load. I think we will be in good shape if we track the client setup and configuration under which the benchmarks are captured.

Regards
Kevin

On Tue, Jul 23, 2019 at 2:14 PM Sheriffo Ceesay wrote:

> **Sheriffo Ceesay**
>
> On Tue, Jul 23, 2019 at 6:52 AM Kevin Ratnasekera wrote:
>
> > Hi Sheriffo,
> >
> > Adding to what Kamaci already mentioned, Have you tried Hbase-Store with
> > buffered mutator engaged? [1] It allows HBase operations Eg:- puts to be
> > batched and asynchronous.
>
> Hi Kevin,
> Yes I am using *gora.hbasestore.hbase.client.autoflush.enabled=true* in the
> gora.properties file.
>
> > On related note also have a look on points where
> > you flush() the datastore with native HBase implementation.
>
> The native implementation is through Yahoo! Cloud Service Benchmark. I
> haven't gone through their implementation. I think it may be due to the
> default configuration that I am using. I will try to dig further to see if
> I can strike some improvement.
>
> > [1] gora.hbasestore.hbase.client.autoflush.enabled=true
> >
> > Regards
> > Kevin
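One way to act on the suggestion to track the client setup under which benchmarks are captured is to store a small snapshot alongside each result set. The sketch below is only an illustration under stated assumptions: BenchmarkSetupSketch, snapshot, and the "benchmark.*" keys are hypothetical names, and "x.y.z" is a placeholder rather than a real version. It records a store label, a client library version, and every "gora."-prefixed entry from the text of a properties file such as gora.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper for recording the setup a benchmark run was captured under. */
final class BenchmarkSetupSketch {

    /**
     * Builds a sorted snapshot of the run's setup: a store label, the client
     * library version, and every "gora."-prefixed property from the given text.
     */
    static Map<String, String> snapshot(String store, String clientVersion, String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        Map<String, String> out = new TreeMap<>();
        out.put("benchmark.store", store);
        out.put("benchmark.client.version", clientVersion);
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith("gora.")) {
                // Properties.load keeps trailing whitespace in values, so trim it.
                out.put(name, props.getProperty(name).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Note the trailing space after "true"; it is trimmed in the snapshot.
        String goraProperties = "gora.hbasestore.hbase.client.autoflush.enabled=true \n";
        // "x.y.z" is a placeholder, not a real version number.
        snapshot("gora-hbase", "x.y.z", goraProperties)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Saving such a snapshot next to the generated data would make it possible to tell later whether two runs differed in client version or in settings such as the autoflush flag.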
Re: Week 8 Report
**Sheriffo Ceesay**

On Tue, Jul 23, 2019 at 6:52 AM Kevin Ratnasekera wrote:

> Hi Sheriffo,
>
> Adding to what Kamaci already mentioned, Have you tried Hbase-Store with
> buffered mutator engaged? [1] It allows HBase operations Eg:- puts to be
> batched and asynchronous.

Hi Kevin,

Yes, I am using *gora.hbasestore.hbase.client.autoflush.enabled=true* in the gora.properties file.

> On related note also have a look on points where
> you flush() the datastore with native HBase implementation.

The native implementation is the one from the Yahoo! Cloud Serving Benchmark (YCSB). I haven't gone through their implementation. I think the difference may be due to the default configuration that I am using. I will dig further to see if I can find some improvement.

> [1] gora.hbasestore.hbase.client.autoflush.enabled=true
>
> Regards
> Kevin
Re: Week 8 Report
**Sheriffo Ceesay**

Hi Furkan,

Thanks for the reply. I have replied to your comment inline.

On Tue, Jul 23, 2019 at 1:10 AM Furkan KAMACI wrote:

> Hi Sheriffo,
>
> I've checked all the GSoC reports including your's. Thanks for filling your
> reports with time slot information about tasks.
>
> You have 3 benchmark result charts at your Week 8 Report. Hbase-native is
> dramatically slow compared to the others at 2 out of 3. Do you have any
> comment about it?

Thanks for the feedback. Right now I don't have a concrete explanation, but I suspect the HBase setup. I am using the default setup for all datastores, without any optimization settings in the configuration files. I will have a look at this during the week. My only concern is that I don't want to over-optimise the configuration of one particular datastore and thereby give it an unfair advantage over the others.

Thank you.

> Kind Regards,
> Furkan KAMACI
Re: Week 8 Report
Hi Sheriffo,

I've checked all the GSoC reports, including yours. Thanks for filling in your reports with time slot information about tasks.

You have 3 benchmark result charts in your Week 8 Report. Hbase-native is dramatically slower than the others in 2 of the 3. Do you have any comment about it?

Kind Regards,
Furkan KAMACI

On Sun, Jul 21, 2019 at 7:33 PM Sheriffo Ceesay wrote:

> Week eight report is available at
>
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>
> I ran some workloads to compare Gora implementation of Mongo and HBase to
> the native implementations. Plots are provided in the reports and the
> generated data is also available at [1]. This is a work in progress, I have
> done Workload A and Workload B [2].
>
> Please let me know if you have any suggestions or questions.
>
> [1] https://github.com/sneceesay77/gora/tree/GORA-532
> [2] https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads
>
> **Sheriffo Ceesay**
Week 8 Report
The week eight report is available at

https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

I ran some workloads to compare the Gora implementations of Mongo and HBase with the native implementations. Plots are provided in the report, and the generated data is also available at [1]. This is a work in progress; so far I have done Workload A and Workload B [2].

Please let me know if you have any suggestions or questions.

[1] https://github.com/sneceesay77/gora/tree/GORA-532
[2] https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads

**Sheriffo Ceesay**
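For readers unfamiliar with the two workloads: per the YCSB core workloads page [2], Workload A is an update-heavy mix (about 50% reads, 50% updates) and Workload B is read-mostly (about 95% reads, 5% updates). The sketch below shows how such an operation mix can be drawn from a random number generator. It is not YCSB's or the Gora benchmark module's generator; WorkloadMixSketch and its methods are hypothetical names used for illustration only.

```java
import java.util.Random;

/** Hypothetical sketch of choosing operations by a read proportion. */
final class WorkloadMixSketch {

    enum Op { READ, UPDATE }

    /** Picks READ with probability readProportion, otherwise UPDATE. */
    static Op nextOp(Random rng, double readProportion) {
        return rng.nextDouble() < readProportion ? Op.READ : Op.UPDATE;
    }

    /** Runs the chooser for a number of operations and reports the share of reads. */
    static double observedReadFraction(double readProportion, int operations, long seed) {
        Random rng = new Random(seed);
        int reads = 0;
        for (int i = 0; i < operations; i++) {
            if (nextOp(rng, readProportion) == Op.READ) {
                reads++;
            }
        }
        return (double) reads / operations;
    }

    public static void main(String[] args) {
        // Read proportions assumed from the YCSB core workload descriptions.
        System.out.printf("Workload A read fraction: %.3f%n",
                observedReadFraction(0.50, 100_000, 42L));
        System.out.printf("Workload B read fraction: %.3f%n",
                observedReadFraction(0.95, 100_000, 42L));
    }
}
```

Because Workload A issues far more updates than Workload B, write-path settings such as buffering and flushing can be expected to matter more for A than for B.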