Re: Add datastore for Elasticsearch. Outreachy Week 7 Report
Hi,

Okay, I will do that then. Thanks.

Regards,
Maria

On Thu, 21 Jan 2021 at 03:33, John Mora wrote:

> Let's keep it simple. You can throw an exception when you receive a
> STRING and only process RECORD cases in UNION.
Re: Add datastore for Elasticsearch. Outreachy Week 7 Report
Hi Maria,

Sorry for the late reply. Let's keep it simple. You can throw an exception when you receive a STRING and only process RECORD cases in UNION.

Example:

https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-lucene/src/main/java/org/apache/gora/lucene/store/LuceneStore.java#L349

Regards,
John

On Tue, 19 Jan 2021 at 4:49, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:

> AFAIU STRING refers to the Employee's primary key type, so that you could
> write "boss: '123'" instead of specifying the whole object. Should I be
> making an additional GET request for this case?
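The suggestion above (handle NULL and RECORD in the UNION, throw on STRING) could look roughly like the sketch below. It is only an illustration under stated assumptions: plain `java.util.Map` values stand in for documents read from Elasticsearch, the class and method names (`UnionFieldReader`, `readUnion`, `readRecord`) are hypothetical, and `UnsupportedOperationException` is an arbitrary choice of exception, not necessarily what the linked LuceneStore code uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; plain maps stand in for stored documents. */
final class UnionFieldReader {

  /** Reads a value stored for a [NULL, STRING, RECORD] union such as "boss". */
  static Object readUnion(Object stored) {
    if (stored == null) {
      // NULL branch: the reference is simply absent.
      return null;
    }
    if (stored instanceof Map) {
      // RECORD branch: the embedded object is read recursively.
      return readRecord((Map<?, ?>) stored);
    }
    if (stored instanceof CharSequence) {
      // STRING branch: only a key is available; rejected to keep things simple.
      throw new UnsupportedOperationException(
          "STRING references in a UNION are not supported: " + stored);
    }
    throw new IllegalArgumentException(
        "Unexpected union value type: " + stored.getClass().getName());
  }

  /** Copies a record, descending into nested records. */
  static Map<String, Object> readRecord(Map<?, ?> source) {
    Map<String, Object> record = new LinkedHashMap<>();
    for (Map.Entry<?, ?> entry : source.entrySet()) {
      Object value = entry.getValue();
      if (value instanceof Map) {
        value = readRecord((Map<?, ?>) value);
      }
      record.put(String.valueOf(entry.getKey()), value);
    }
    return record;
  }
}
```

With this shape, a stored value of "123" for "boss" fails fast instead of triggering an extra GET request, while an embedded Employee document is returned as a nested record.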
Re: Add datastore for Elasticsearch. Outreachy Week 7 Report
Hi,

Thank you for your comments.

I will take a look at your links, but my question was a bit different. The problem is that the foreign key "boss" is represented in Avro as a UNION of three types: STRING, NULL and RECORD. Your answer covers how to handle the last case (RECORD), but I was asking about how to handle the STRING case. AFAIU STRING refers to the Employee's primary key type, so that you could write "boss: '123'" instead of specifying the whole object. Should I be making an additional GET request for this case?

Regards,
Maria

On Tue, 19 Jan 2021 at 08:53, John Mora wrote:

> Regarding your question, Elasticsearch supports complex datatypes:
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
> https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
>
> You can use the RethinkDB datastore as an example and store recursively
> the fields of the embedded objects.
Re: Add datastore for Elasticsearch. Outreachy Week 7 Report
Hi Maria,

Thanks for the update.

Some comments:

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192

Please add the index mappings when you create the Elasticsearch index.

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings

You can use the field mappings parsed from the XML file.

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28

Regarding your question, Elasticsearch supports complex datatypes:

https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

You can use the RethinkDB datastore as an example and recursively store the fields of the embedded objects.

https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448

Give it a try first and let me know if you get stuck.

Alternatively, if the first option is not feasible, you can serialize the embedded objects as a byte array. Example:

https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html

Best regards,
John.

On Sat, 16 Jan 2021 at 8:02, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:

> This week I have started work on serialization/deserialization. While
> testing get method I found that UNION case could be a combination of NULL,
> STRING or another RECORD for external table references (e.g. boss for
> Employee). Could you explain to me what I should do in this case?
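The advice in this message to add index mappings built from the parsed field mappings could be sketched as below. This is a dependency-free illustration, not the Gora implementation: `IndexMappingBuilder` and its `build` method are hypothetical, and the input is assumed to be a simple field-name-to-Elasticsearch-type map rather than the real `ElasticsearchMapping` class. The result has the `properties` layout that Elasticsearch mappings use, which the linked create-index documentation shows being passed to the request as a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that turns parsed field mappings into an index mapping. */
final class IndexMappingBuilder {

  /**
   * @param fieldTypes field name to Elasticsearch data type (e.g. "keyword"),
   *                   assumed to come from the parsed XML mapping file
   * @return a map of the form {"properties": {field: {"type": dataType}}}
   */
  static Map<String, Object> build(Map<String, String> fieldTypes) {
    Map<String, Object> properties = new LinkedHashMap<>();
    for (Map.Entry<String, String> field : fieldTypes.entrySet()) {
      // Each field gets its own definition object holding the data type.
      Map<String, Object> definition = new LinkedHashMap<>();
      definition.put("type", field.getValue());
      properties.put(field.getKey(), definition);
    }
    Map<String, Object> mapping = new LinkedHashMap<>();
    mapping.put("properties", properties);
    return mapping;
  }
}
```

For example, an input of `name -> keyword` and `salary -> integer` (both field names are made up) yields a mapping with two entries under `properties`, each carrying its `type`.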
Add datastore for Elasticsearch. Outreachy Week 7 Report
Hi,

Report #7
Period: January 10 - January 16

Activities:

- Fixed authentication [1]:
   1. Set up the password for the Elasticsearch container properly
   2. Set the default Elasticsearch container server’s username in gora.properties
   3. Added exceptions for missing arguments in authentication

- Added a parameter for the XSD validation [2]:
   1. Defined a parameter for the XSD validation
   2. Added a test case for the parameter
   3. Made ElasticsearchStore read the mapping file from properties, not configuration

- Implemented some basic input/output operations for schema management [3]:
   1. Implemented delete, get and put methods
   2. Implemented newInstance and getUnionSchema utility methods
   3. Implemented basic serialization/deserialization for primitive Avro types

Here are links to the commits:
[1] https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
[2] https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
[3] https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0

This week I have started work on serialization/deserialization. While testing the get method, I found that a UNION field can be a combination of NULL, STRING or another RECORD for external table references (e.g. boss for Employee). Could you explain to me what I should do in this case? I see two possible approaches:

1) Deserialize recursively if the field value is a RECORD
2) Make another request for the STRING case, where I have only the key for the external object.

Regards,
Maria