Re: Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-21 Thread Maria Podorvanova
Hi,

Okay, I will do that then. Thanks.

Regards,
Maria

On Thu, 21 Jan 2021 at 03:33, John Mora wrote:

> Hi Maria,
>
> Sorry for the late reply. Let's keep it simple. You can throw an exception
> when you receive a STRING and only process RECORD cases in UNION.
>
> Example:
>
> https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-lucene/src/main/java/org/apache/gora/lucene/store/LuceneStore.java#L349
>
> Regards,
> John
>
> On Tue, 19 Jan 2021 at 4:49, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) wrote:
>
>> Hi
>>
>> Thank you for your comments.
>>
>> I will take a look into your links, but my question was a bit different.
>> The problem is that foreign key "boss" is represented in Avro as UNION of
>> three types: STRING, NULL and RECORD. Your answer is in regards to how to
>> handle the last case (RECORD), but I was asking about how to handle
>> the STRING case. AFAIU STRING refers to the Employee's primary key type, so
>> that you could write "boss: '123'" instead of specifying the whole object.
>> Should I be making an additional GET request for this case?
>>
>> Regards,
>> Maria
>>
>> On Tue, 19 Jan 2021 at 08:53, John Mora wrote:
>>
>>> Hi Maria,
>>>
>>> Thanks for the update.
>>>
>>> Some comments:
>>>
>>>
>>> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192
>>>
>>> Please add the index mappings when you create the elasticsearch index.
>>>
>>>
>>> https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings
>>>
>>> You can use the Field mappings parsed from the XML file.
>>>
>>>
>>> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28
>>>
>>> Regarding your question, Elasticsearch supports complex datatypes:
>>>
>>>
>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
>>>
>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
>>>
>>> You can use the RethinkDB datastore as an example and store recursively
>>> the fields of the embedded objects.
>>>
>>>
>>> https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448
>>>
>>> Give it a try first and let me know if you get stuck.
>>>
>>> Alternatively, if the first option is not feasible, you can serialize
>>> the embedded objects as byte array, example:
>>>
>>>
>>> https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
>>>
>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html
>>>
>>> Best regards,
>>> John.
>>>
>>> On Sat, 16 Jan 2021 at 8:02, Maria Podorvanova (<
>>> podorvanova.ma...@gmail.com>) wrote:
>>>
>>>> Hi,
>>>>
>>>> Report #7
>>>> Period: January 10 - January 16
>>>> Activities:
>>>> - Fixed authentication [1]:
>>>>
>>>>    1. Set up password to Elasticsearch container properly
>>>>    2. Set default Elasticsearch container server’s username in
>>>>    gora.properties
>>>>    3. Added exceptions for missing arguments in authentication
>>>>
>>>> - Added a parameter for the XSD validation [2]:
>>>>
>>>>    1. Defined a parameter for the XSD validation
>>>>    2. Added a test case for the parameter
>>>>    3. Made ElasticsearchStore read mapping file from properties, not
>>>>    configuration
>>>>
>>>> - Implemented some basic Input-Output operations for schema management
>>>> [3]:
>>>>
>>>>    1. Implemented delete, get and put methods
>>>>    2. Implemented newInstance and getUnionSchema utility methods
>>>>    3. Implemented basic serialization/deserialization for primitive
>>>>    AVRO types
>>>>
>>>>
>>>> Here are links to the commits:
>>>> [1]
>>>> https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
>>>> [2]
>>>> https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
>>>> [3]
>>>> https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0
>>>>
>>>> This week I have started work on serialization/deserialization. While
>>>> testing get method I found that UNION case could be a combination of NULL,
>>>> STRING or another RECORD for external table references (e.g. boss for
>>>> Employee). Could you explain to me what I should do in this case? I see two
>>>> possible cases here: 1) Do deserialize recursively if the field value is a
>>>> RECORD 2) Make another request for STRING case, where I have only key for
>>>> the external object.
>>>>
>>>> Regards,
>>>> Maria
>>>>
>>>


Re: Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-20 Thread John Mora
Hi Maria,

Sorry for the late reply. Let's keep it simple. You can throw an exception
when you receive a STRING and only process RECORD cases in UNION.

Example:
https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-lucene/src/main/java/org/apache/gora/lucene/store/LuceneStore.java#L349
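
A minimal sketch of this approach, not the code behind the link above. It
assumes the stored value has already been read into plain Java objects: a Map
stands in for an embedded RECORD and a String for a bare key. The class and
method names (UnionFieldSketch, readEmbeddedRecord) are hypothetical; a real
datastore would inspect the Avro union schema instead of raw Java types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: handles a value read for a UNION of NULL, STRING and RECORD.
final class UnionFieldSketch {

    /** Returns the embedded record, or null; rejects the STRING (key-only) case. */
    static Map<String, Object> readEmbeddedRecord(String fieldName, Object stored) {
        if (stored == null) {
            return null; // NULL branch: nothing to deserialize
        }
        if (stored instanceof Map) {
            // RECORD branch: copy the embedded object's fields
            Map<String, Object> record = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) stored).entrySet()) {
                record.put(String.valueOf(entry.getKey()), entry.getValue());
            }
            return record;
        }
        // STRING branch (and anything else): not supported, so fail loudly
        throw new IllegalStateException(
            "Unsupported UNION value for field '" + fieldName + "': "
                + stored.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        Map<String, Object> boss = new LinkedHashMap<>();
        boss.put("name", "Alice");
        System.out.println(readEmbeddedRecord("boss", boss));
        try {
            readEmbeddedRecord("boss", "123");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast on the key-only case avoids an extra GET request per reference,
at the cost of not supporting references by key.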

Regards,
John

On Tue, 19 Jan 2021 at 4:49, Maria Podorvanova (<
podorvanova.ma...@gmail.com>) wrote:

> Hi
>
> Thank you for your comments.
>
> I will take a look into your links, but my question was a bit different.
> The problem is that foreign key "boss" is represented in Avro as UNION of
> three types: STRING, NULL and RECORD. Your answer is in regards to how to
> handle the last case (RECORD), but I was asking about how to handle
> the STRING case. AFAIU STRING refers to the Employee's primary key type, so
> that you could write "boss: '123'" instead of specifying the whole object.
> Should I be making an additional GET request for this case?
>
> Regards,
> Maria
>
> On Tue, 19 Jan 2021 at 08:53, John Mora wrote:
>
>> Hi Maria,
>>
>> Thanks for the update.
>>
>> Some comments:
>>
>>
>> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192
>>
>> Please add the index mappings when you create the elasticsearch index.
>>
>>
>> https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings
>>
>> You can use the Field mappings parsed from the XML file.
>>
>>
>> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28
>>
>> Regarding your question, Elasticsearch supports complex datatypes:
>>
>>
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
>>
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
>>
>> You can use the RethinkDB datastore as an example and store recursively
>> the fields of the embedded objects.
>>
>>
>> https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448
>>
>> Give it a try first and let me know if you get stuck.
>>
>> Alternatively, if the first option is not feasible, you can serialize the
>> embedded objects as byte array, example:
>>
>>
>> https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
>>
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html
>>
>> Best regards,
>> John.
>>
>> On Sat, 16 Jan 2021 at 8:02, Maria Podorvanova (<
>> podorvanova.ma...@gmail.com>) wrote:
>>
>>> Hi,
>>>
>>> Report #7
>>> Period: January 10 - January 16
>>> Activities:
>>> - Fixed authentication [1]:
>>>
>>>    1. Set up password to Elasticsearch container properly
>>>    2. Set default Elasticsearch container server’s username in
>>>    gora.properties
>>>    3. Added exceptions for missing arguments in authentication
>>>
>>> - Added a parameter for the XSD validation [2]:
>>>
>>>    1. Defined a parameter for the XSD validation
>>>    2. Added a test case for the parameter
>>>    3. Made ElasticsearchStore read mapping file from properties, not
>>>    configuration
>>>
>>> - Implemented some basic Input-Output operations for schema management
>>> [3]:
>>>
>>>    1. Implemented delete, get and put methods
>>>    2. Implemented newInstance and getUnionSchema utility methods
>>>    3. Implemented basic serialization/deserialization for primitive
>>>    AVRO types
>>>
>>>
>>> Here are links to the commits:
>>> [1]
>>> https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
>>> [2]
>>> https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
>>> [3]
>>> https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0
>>>
>>> This week I have started work on serialization/deserialization. While
>>> testing get method I found that UNION case could be a combination of NULL,
>>> STRING or another RECORD for external table references (e.g. boss for
>>> Employee). Could you explain to me what I should do in this case? I see two
>>> possible cases here: 1) Do deserialize recursively if the field value is a
>>> RECORD 2) Make another request for STRING case, where I have only key for
>>> the external object.
>>>
>>> Regards,
>>> Maria
>>>
>>


Re: Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-19 Thread Maria Podorvanova
Hi

Thank you for your comments.

I will take a look at your links, but my question was a bit different.
The problem is that the foreign key "boss" is represented in Avro as a UNION
of three types: STRING, NULL and RECORD. Your answer covers how to handle
the last case (RECORD), but I was asking how to handle the STRING case.
AFAIU, STRING refers to the Employee's primary key type, so that you could
write "boss: '123'" instead of specifying the whole object. Should I make
an additional GET request in this case?

Regards,
Maria

On Tue, 19 Jan 2021 at 08:53, John Mora wrote:

> Hi Maria,
>
> Thanks for the update.
>
> Some comments:
>
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192
>
> Please add the index mappings when you create the elasticsearch index.
>
>
> https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings
>
> You can use the Field mappings parsed from the XML file.
>
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28
>
> Regarding your question, Elasticsearch supports complex datatypes:
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
> https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
>
> You can use the RethinkDB datastore as an example and store recursively
> the fields of the embedded objects.
>
>
> https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448
>
> Give it a try first and let me know if you get stuck.
>
> Alternatively, if the first option is not feasible, you can serialize the
> embedded objects as byte array, example:
>
>
> https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
> https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html
>
> Best regards,
> John.
>
> On Sat, 16 Jan 2021 at 8:02, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) wrote:
>
>> Hi,
>>
>> Report #7
>> Period: January 10 - January 16
>> Activities:
>> - Fixed authentication [1]:
>>
>>    1. Set up password to Elasticsearch container properly
>>    2. Set default Elasticsearch container server’s username in
>>    gora.properties
>>    3. Added exceptions for missing arguments in authentication
>>
>> - Added a parameter for the XSD validation [2]:
>>
>>    1. Defined a parameter for the XSD validation
>>    2. Added a test case for the parameter
>>    3. Made ElasticsearchStore read mapping file from properties, not
>>    configuration
>>
>> - Implemented some basic Input-Output operations for schema management
>> [3]:
>>
>>    1. Implemented delete, get and put methods
>>    2. Implemented newInstance and getUnionSchema utility methods
>>    3. Implemented basic serialization/deserialization for primitive AVRO
>>    types
>>
>>
>> Here are links to the commits:
>> [1]
>> https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
>> [2]
>> https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
>> [3]
>> https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0
>>
>> This week I have started work on serialization/deserialization. While
>> testing get method I found that UNION case could be a combination of NULL,
>> STRING or another RECORD for external table references (e.g. boss for
>> Employee). Could you explain to me what I should do in this case? I see two
>> possible cases here: 1) Do deserialize recursively if the field value is a
>> RECORD 2) Make another request for STRING case, where I have only key for
>> the external object.
>>
>> Regards,
>> Maria
>>
>


Re: Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-18 Thread John Mora
Hi Maria,

Thanks for the update.

Some comments:

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192

Please add the index mappings when you create the Elasticsearch index.

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings

You can use the field mappings parsed from the XML file.

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28
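
As a rough illustration of the suggestion above, the parsed field definitions
can be turned into a mapping of the form {"properties": {field: {"type": ...}}}.
This is only a sketch: the input is assumed to be a simple field-name to
Elasticsearch-type map (the real ElasticsearchMapping class may expose
something different), and IndexMappingSketch and buildMapping are hypothetical
names. The resulting Map is in the shape that the linked create-index
documentation describes passing as the mapping source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds an index mapping body from field name -> type.
final class IndexMappingSketch {

    /** Builds {"properties": {fieldName: {"type": fieldType}, ...}}. */
    static Map<String, Object> buildMapping(Map<String, String> fieldTypes) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fieldTypes.entrySet()) {
            Map<String, Object> definition = new LinkedHashMap<>();
            definition.put("type", field.getValue());
            properties.put(field.getKey(), definition);
        }
        Map<String, Object> mapping = new LinkedHashMap<>();
        mapping.put("properties", properties);
        return mapping;
    }

    public static void main(String[] args) {
        // Assumed example fields; real ones would come from the XML mapping file.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "text");
        fields.put("salary", "long");
        System.out.println(buildMapping(fields));
    }
}
```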

Regarding your question, Elasticsearch supports complex datatypes:

https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

You can use the RethinkDB datastore as an example and recursively store the
fields of the embedded objects.

https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448
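
The recursive idea can be sketched as follows. This is not the RethinkDB
store's code: the Employee class here is a hypothetical two-field stand-in for
the generated Avro record, and toDocument is an assumed helper name. Each
embedded object becomes a nested map, which corresponds to an Elasticsearch
object field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: turn a record with an embedded record into a nested document.
final class RecursiveSerializeSketch {

    /** Simplified stand-in for the Avro-generated Employee record. */
    static final class Employee {
        final String name;
        final Employee boss; // embedded record, may be null

        Employee(String name, Employee boss) {
            this.name = name;
            this.boss = boss;
        }
    }

    /** Serializes the employee and, recursively, the embedded boss. */
    static Map<String, Object> toDocument(Employee employee) {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("name", employee.name);
        // Recurse into the embedded record; a missing boss is stored as null
        document.put("boss", employee.boss == null ? null : toDocument(employee.boss));
        return document;
    }

    public static void main(String[] args) {
        Employee alice = new Employee("Alice", null);
        Employee bob = new Employee("Bob", alice);
        System.out.println(toDocument(bob));
    }
}
```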

Give it a try first and let me know if you get stuck.

Alternatively, if the first option is not feasible, you can serialize the
embedded objects as a byte array, for example:

https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html
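
For the byte-array alternative, the Elasticsearch binary field type takes its
value as a Base64-encoded string, so the serialized bytes need to be encoded
before indexing and decoded after reading. A small sketch, assuming the
embedded object has already been serialized to a byte[] by some other means;
BinaryFieldSketch and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: converts serialized bytes to and from a binary field value.
final class BinaryFieldSketch {

    /** Encodes already-serialized bytes as the Base64 string stored in the field. */
    static String toFieldValue(byte[] serialized) {
        return Base64.getEncoder().encodeToString(serialized);
    }

    /** Decodes the stored Base64 string back into the serialized bytes. */
    static byte[] fromFieldValue(String stored) {
        return Base64.getDecoder().decode(stored);
    }

    public static void main(String[] args) {
        // Placeholder payload; a real store would pass the Avro-serialized record.
        byte[] payload = "boss".getBytes(StandardCharsets.UTF_8);
        String fieldValue = toFieldValue(payload);
        System.out.println(fieldValue);
        System.out.println(new String(fromFieldValue(fieldValue), StandardCharsets.UTF_8));
    }
}
```

The trade-off is that the embedded object's fields are opaque to
Elasticsearch and cannot be searched individually.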

Best regards,
John.

On Sat, 16 Jan 2021 at 8:02, Maria Podorvanova (<
podorvanova.ma...@gmail.com>) wrote:

> Hi,
>
> Report #7
> Period: January 10 - January 16
> Activities:
> - Fixed authentication [1]:
>
>    1. Set up password to Elasticsearch container properly
>    2. Set default Elasticsearch container server’s username in
>    gora.properties
>    3. Added exceptions for missing arguments in authentication
>
> - Added a parameter for the XSD validation [2]:
>
>    1. Defined a parameter for the XSD validation
>    2. Added a test case for the parameter
>    3. Made ElasticsearchStore read mapping file from properties, not
>    configuration
>
> - Implemented some basic Input-Output operations for schema management [3]:
>
>    1. Implemented delete, get and put methods
>    2. Implemented newInstance and getUnionSchema utility methods
>    3. Implemented basic serialization/deserialization for primitive AVRO
>    types
>
>
> Here are links to the commits:
> [1]
> https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
> [2]
> https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
> [3]
> https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0
>
> This week I have started work on serialization/deserialization. While
> testing get method I found that UNION case could be a combination of NULL,
> STRING or another RECORD for external table references (e.g. boss for
> Employee). Could you explain to me what I should do in this case? I see two
> possible cases here: 1) Do deserialize recursively if the field value is a
> RECORD 2) Make another request for STRING case, where I have only key for
> the external object.
>
> Regards,
> Maria
>


Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-16 Thread Maria Podorvanova
Hi,

Report #7
Period: January 10 - January 16
Activities:
- Fixed authentication [1]:

   1. Set up the password for the Elasticsearch container properly
   2. Set the default username for the Elasticsearch container server in
   gora.properties
   3. Added exceptions for missing authentication arguments

- Added a parameter for the XSD validation [2]:

   1. Defined a parameter for the XSD validation
   2. Added a test case for the parameter
   3. Made ElasticsearchStore read the mapping file from properties rather
   than from the configuration

- Implemented some basic Input-Output operations for schema management [3]:

   1. Implemented delete, get and put methods
   2. Implemented newInstance and getUnionSchema utility methods
   3. Implemented basic serialization/deserialization for primitive Avro
   types


Here are links to the commits:
[1]
https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
[2]
https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
[3]
https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0

This week I started working on serialization/deserialization. While
testing the get method, I found that a UNION field can be a combination of
NULL, STRING, or another RECORD for external table references (e.g. boss
for Employee). Could you explain what I should do in this case? I see two
possible approaches: 1) deserialize recursively if the field value is a
RECORD; 2) make another request in the STRING case, where I have only the
key of the external object.

Regards,
Maria