Re: Add datastore for Elasticsearch. Outreachy Week 7 Report
Hi,

Okay, I will do that then. Thanks.

Regards,
Maria

On Thu, 21 Jan 2021 at 03:33, John Mora wrote:
> [...]
Re: Add datastore for Elasticsearch. Outreachy Week 7 Report
Hi Maria,

Sorry for the late reply. Let's keep it simple. You can throw an exception when you receive a STRING and only process the RECORD case in the UNION.

Example:

https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-lucene/src/main/java/org/apache/gora/lucene/store/LuceneStore.java#L349

Regards,
John

On Tue, 19 Jan 2021 at 04:49, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:
> [...]
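The policy John suggests can be sketched roughly as below. This is only an illustration, not the code in LuceneStore or ElasticsearchStore: the class `UnionFieldSketch`, the enum `BranchType` (a stand-in for Avro's `Schema.Type`), the method names, and the choice of `IllegalStateException` are all assumptions made for the example. It also assumes the value read from the document arrives as `null`, a string, or a nested `Map`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative sketch: reading a [NULL, STRING, RECORD] union field. */
final class UnionFieldSketch {

  /** Hypothetical stand-in for Avro's Schema.Type, limited to the three cases in the thread. */
  enum BranchType { NULL, STRING, RECORD }

  /** Decides which union branch a raw document value belongs to. */
  static BranchType branchOf(Object raw) {
    if (raw == null) {
      return BranchType.NULL;
    }
    if (raw instanceof CharSequence) {
      return BranchType.STRING;
    }
    if (raw instanceof Map) {
      return BranchType.RECORD;
    }
    throw new IllegalStateException("Unexpected value type: " + raw.getClass().getName());
  }

  /**
   * Returns null for NULL, delegates RECORD to recordReader, and rejects STRING.
   * The exception type is an assumption; the thread only says "throw an exception".
   */
  static <T> T readUnion(String fieldName, Object raw,
                         Function<Map<String, Object>, T> recordReader) {
    BranchType branch = branchOf(raw);
    if (branch == BranchType.NULL) {
      return null;
    }
    if (branch == BranchType.RECORD) {
      @SuppressWarnings("unchecked")
      Map<String, Object> nested = (Map<String, Object>) raw;
      return recordReader.apply(nested);
    }
    throw new IllegalStateException(
        "Field '" + fieldName + "': STRING branch of the union is not supported");
  }

  public static void main(String[] args) {
    Map<String, Object> boss = new LinkedHashMap<>();
    boss.put("name", "Alice");
    // RECORD case: the nested map is handed to the reader.
    String name = readUnion("boss", boss, m -> (String) m.get("name"));
    System.out.println(name);
  }
}
```

With this shape, a document that stores `boss` as a key such as '123' fails fast instead of triggering an extra GET request.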
Re: Add datastore for Elasticsearch. Outreachy Week 7 Report
Hi,

Thank you for your comments.

I will take a look at your links, but my question was a bit different. The problem is that the foreign key "boss" is represented in Avro as a UNION of three types: STRING, NULL and RECORD. Your answer addresses how to handle the last case (RECORD), but I was asking how to handle the STRING case. AFAIU, STRING refers to the Employee's primary key type, so that you could write "boss: '123'" instead of specifying the whole object. Should I be making an additional GET request for this case?

Regards,
Maria

On Tue, 19 Jan 2021 at 08:53, John Mora wrote:
> [...]
Re: Add datastore for Elasticsearch. Outreachy Week 7 Report
Hi Maria,

Thanks for the update.

Some comments:

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192

Please add the index mappings when you create the Elasticsearch index.

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings

You can use the field mappings parsed from the XML file.

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28

Regarding your question, Elasticsearch supports complex datatypes:

https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

You can use the RethinkDB datastore as an example and recursively store the fields of the embedded objects.

https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448

Give it a try first and let me know if you get stuck.

Alternatively, if the first option is not feasible, you can serialize the embedded objects as a byte array, for example:

https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html

Best regards,
John.

On Sat, 16 Jan 2021 at 08:02, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:
> [...]
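As a rough illustration of the first comment, the mappings body for an index can be assembled from a field-name-to-type table before the index is created. The sketch below uses only the JDK; the class `IndexMappingSketch`, the method `buildMappings`, and the sample fields are hypothetical, and the input map merely stands in for whatever ElasticsearchMapping actually exposes after parsing the XML file. The resulting structure follows the usual `properties` / `type` layout of an Elasticsearch mapping, which the linked create-index documentation describes supplying as a `Map`, among other forms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch: turning parsed field types into an index mappings body. */
final class IndexMappingSketch {

  /**
   * Builds {"properties": {fieldName: {"type": fieldType}, ...}}.
   * fieldTypes is a hypothetical stand-in for the field mappings parsed from the XML file.
   */
  static Map<String, Object> buildMappings(Map<String, String> fieldTypes) {
    Map<String, Object> properties = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : fieldTypes.entrySet()) {
      Map<String, Object> fieldDefinition = new LinkedHashMap<>();
      fieldDefinition.put("type", entry.getValue());
      properties.put(entry.getKey(), fieldDefinition);
    }
    Map<String, Object> mappings = new LinkedHashMap<>();
    mappings.put("properties", properties);
    return mappings;
  }

  public static void main(String[] args) {
    // Sample fields chosen for illustration only.
    Map<String, String> fieldTypes = new LinkedHashMap<>();
    fieldTypes.put("name", "text");
    fieldTypes.put("salary", "long");
    fieldTypes.put("boss", "object");
    System.out.println(buildMappings(fieldTypes));
  }
}
```

Keeping the mappings as a plain nested map leaves the store free to pass it to the client in whichever form the create-index request accepts.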
Add datastore for Elasticsearch. Outreachy Week 7 Report
Hi,

Report #7
Period: January 10 - January 16

Activities:
- Fixed authentication [1]:
   1. Set up the password for the Elasticsearch container properly
   2. Set the default username for the Elasticsearch container server in gora.properties
   3. Added exceptions for missing arguments in authentication
- Added a parameter for the XSD validation [2]:
   1. Defined a parameter for the XSD validation
   2. Added a test case for the parameter
   3. Made ElasticsearchStore read the mapping file from properties, not configuration
- Implemented some basic input/output operations for schema management [3]:
   1. Implemented delete, get and put methods
   2. Implemented newInstance and getUnionSchema utility methods
   3. Implemented basic serialization/deserialization for primitive Avro types

Here are links to the commits:
[1] https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
[2] https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
[3] https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0

This week I started work on serialization/deserialization. While testing the get method, I found that a UNION can be a combination of NULL, STRING, or another RECORD for external table references (e.g. boss for Employee). Could you explain what I should do in this case? I see two possible options:
1) Deserialize recursively if the field value is a RECORD.
2) Make another request for the STRING case, where I have only the key of the external object.

Regards,
Maria
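Option 1 from the report can be sketched as follows, assuming the stored document is available as a nested `Map` in which an embedded record appears as another `Map`. The class `RecursiveRecordSketch`, the simplified `Employee` type, and the field names are hypothetical and only illustrate the recursion; they are not the generated Gora beans or the actual store code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch: recursively rebuilding an embedded record from a nested document. */
final class RecursiveRecordSketch {

  /** Simplified, hypothetical stand-in for the Employee bean from the thread. */
  static final class Employee {
    String name;
    Employee boss;
  }

  /** Converts a document into an Employee, descending into "boss" when it is a nested map. */
  static Employee fromDocument(Map<String, Object> document) {
    Employee employee = new Employee();
    employee.name = (String) document.get("name");
    Object boss = document.get("boss");
    if (boss instanceof Map) {
      @SuppressWarnings("unchecked")
      Map<String, Object> nested = (Map<String, Object>) boss;
      // RECORD case: same conversion, one level deeper.
      employee.boss = fromDocument(nested);
    }
    // Other branches (NULL, STRING) are left unresolved in this sketch.
    return employee;
  }

  public static void main(String[] args) {
    Map<String, Object> bossDocument = new LinkedHashMap<>();
    bossDocument.put("name", "Alice");
    Map<String, Object> employeeDocument = new LinkedHashMap<>();
    employeeDocument.put("name", "Bob");
    employeeDocument.put("boss", bossDocument);
    Employee employee = fromDocument(employeeDocument);
    System.out.println(employee.name + " reports to " + employee.boss.name);
  }
}
```

The recursion ends naturally when a document has no nested `boss` map, so no explicit depth limit is needed for acyclic data.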
Re: Week 7 Report
Hey Sheriffo,

Very nice work! I am sorry for the silence in the past weeks, but I have been swamped with things. I hope I can be of more help now.

Anyway, regarding your progress reports, I have some questions:

- Regarding the Google Cloud credits, did you see this? [1] Maybe that would also be something we could try, although I am not sure how compatible the required versions are. It could be an alternative key-value store, in addition to the ones you have picked so far.
- Regarding the exception when creating very large objects to be serialized, what about using arrays of records from Avro? Or maybe just arrays of primitive types? With that we could increase the size of the value and have an extra knob to try in the benchmarks.
- Regarding the last report, where you have some plots, could you explain what you are plotting? E.g., what is on the x-axis? The aggregated number of inserted keys? Or the number of keys inserted at a particular point in time?

Overall, very nice work Sheriffo! Thanks for all the good work!

Best,
Renato M.

[1] https://cloud.google.com/bigtable/docs/hbase-bigtable

On Sun, 14 Jul 2019 at 16:33, Sheriffo Ceesay wrote:
> [...]
Week 7 Report
The week seven report is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

Basically, I am currently running workloads on HBase. I will continue to do this next week and probably the week after. More details are given in the report.

Please let me know if you have any questions.

Sheriffo Ceesay