Re: Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-21 Thread Maria Podorvanova
Hi,

Okay, I will do that then. Thanks.

Regards,
Maria

On Thu, 21 Jan 2021 at 03:33, John Mora wrote:

> Hi Maria,
>
> Sorry for the late reply. Let's keep it simple. You can throw an exception
> when you receive a STRING and process only the RECORD case in the UNION.
>
> Example:
>
> https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-lucene/src/main/java/org/apache/gora/lucene/store/LuceneStore.java#L349
>
> Regards,
> John


Re: Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-20 Thread John Mora
Hi Maria,

Sorry for the late reply. Let's keep it simple. You can throw an exception
when you receive a STRING and process only the RECORD case in the UNION.

Example:
https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-lucene/src/main/java/org/apache/gora/lucene/store/LuceneStore.java#L349
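
A minimal sketch of the suggestion above, assuming the union branch has already been resolved to one of three kinds. The class `UnionFieldSketch`, the enum `BranchType` (a stand-in for the Avro schema types named in this thread), and the method `deserializeUnion` are hypothetical names for illustration; this is not the code at the linked LuceneStore line and not Gora's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these names come from Gora or Avro. */
final class UnionFieldSketch {

  /** Stand-in for the three schema types that make up the "boss" union. */
  enum BranchType { NULL, STRING, RECORD }

  /**
   * Converts a stored union value back into an in-memory value.
   * NULL yields null, RECORD yields a copy of the nested field map,
   * and STRING (a bare key reference) is rejected with an exception.
   */
  static Object deserializeUnion(BranchType branch, Object value) {
    switch (branch) {
      case NULL:
        return null;
      case RECORD:
        if (!(value instanceof Map)) {
          throw new IllegalStateException("Expected a nested object for RECORD, got: " + value);
        }
        Map<?, ?> nested = (Map<?, ?>) value;
        return new LinkedHashMap<Object, Object>(nested);
      case STRING:
      default:
        // Keep it simple: a key-only reference is not resolved with an extra GET.
        throw new UnsupportedOperationException("Unsupported union branch: " + branch);
    }
  }
}
```

With this shape, a value such as "boss: '123'" fails fast instead of triggering a second request, while an embedded boss object is returned as a nested map.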

Regards,
John

On Tue, 19 Jan 2021 at 4:49, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:

> Hi
>
> Thank you for your comments.
>
> I will take a look at your links, but my question was a bit different.
> The problem is that the foreign key "boss" is represented in Avro as a UNION of
> three types: STRING, NULL and RECORD. Your answer addresses how to
> handle the last case (RECORD), but I was asking how to handle
> the STRING case. AFAIU, STRING refers to the Employee's primary key type, so
> that you could write "boss: '123'" instead of specifying the whole object.
> Should I make an additional GET request for this case?
>
> Regards,
> Maria


Re: Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-19 Thread Maria Podorvanova
Hi

Thank you for your comments.

I will take a look at your links, but my question was a bit different.
The problem is that the foreign key "boss" is represented in Avro as a UNION of
three types: STRING, NULL and RECORD. Your answer addresses how to
handle the last case (RECORD), but I was asking how to handle
the STRING case. AFAIU, STRING refers to the Employee's primary key type, so
that you could write "boss: '123'" instead of specifying the whole object.
Should I make an additional GET request for this case?

Regards,
Maria

On Tue, 19 Jan 2021 at 08:53, John Mora wrote:

> Hi Maria,
>
> Thanks for the update.
>
> Some comments:
>
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192
>
> Please add the index mappings when you create the Elasticsearch index.
>
>
> https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings
>
> You can use the field mappings parsed from the XML file.
>
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28
>
> Regarding your question, Elasticsearch supports complex datatypes:
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
> https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
>
> You can use the RethinkDB datastore as an example and recursively store
> the fields of the embedded objects.
>
>
> https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448
>
> Give it a try first and let me know if you get stuck.
>
> Alternatively, if the first option is not feasible, you can serialize the
> embedded objects as a byte array, for example:
>
>
> https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
> https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html
>
> Best regards,
> John.


Re: Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-18 Thread John Mora
Hi Maria,

Thanks for the update.

Some comments:

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192

Please add the index mappings when you create the Elasticsearch index.

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings

You can use the field mappings parsed from the XML file.

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28
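
As a rough sketch of what "add the index mappings" could look like, the snippet below builds the `properties` structure that Elasticsearch mappings use (one entry per field, each with a `type`) from a plain field-name-to-type map. The class `IndexMappingSketch`, the method `buildMappings`, and the sample fields in the usage note are hypothetical; how the parsed XML mapping exposes its fields in ElasticsearchMapping is not shown in this thread, so a `Map<String, String>` is assumed as input. Handing the result to the create-index request is left out because it needs the Elasticsearch client library; see the linked client documentation for that step.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of the gora-elasticsearch module. */
final class IndexMappingSketch {

  /**
   * Builds a structure of the form
   * {"properties": {"fieldName": {"type": "esType"}, ...}}
   * from a map of field names to Elasticsearch type names.
   */
  static Map<String, Object> buildMappings(Map<String, String> fieldTypes) {
    Map<String, Object> properties = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : fieldTypes.entrySet()) {
      Map<String, Object> fieldDefinition = new LinkedHashMap<>();
      fieldDefinition.put("type", entry.getValue());
      properties.put(entry.getKey(), fieldDefinition);
    }
    Map<String, Object> mappings = new LinkedHashMap<>();
    mappings.put("properties", properties);
    return mappings;
  }
}
```

For example, an input of `name -> text` and `salary -> integer` (made-up fields) produces a `properties` object with two typed entries.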

Regarding your question, Elasticsearch supports complex datatypes:

https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

You can use the RethinkDB datastore as an example and recursively store the
fields of the embedded objects.

https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448
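
The recursive approach can be sketched as follows, assuming a record is simply an ordered set of named field values. The class `EmbeddedRecordSketch`, the stand-in type `Rec`, and the method `toDocumentValue` are hypothetical and do not reproduce the RethinkDBStore code at the link above; the sketch only shows the recursion that turns embedded records into nested maps, the shape an Elasticsearch object field expects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; Rec is a stand-in for a persistent record. */
final class EmbeddedRecordSketch {

  /** Minimal record model: field names mapped to values, in insertion order. */
  static final class Rec {
    final Map<String, Object> fields = new LinkedHashMap<>();

    Rec with(String name, Object value) {
      fields.put(name, value);
      return this;
    }
  }

  /**
   * Converts a value into something that can be placed in a document:
   * records become nested maps, lists are converted element by element,
   * and every other value is passed through unchanged.
   */
  static Object toDocumentValue(Object value) {
    if (value instanceof Rec) {
      Map<String, Object> document = new LinkedHashMap<>();
      for (Map.Entry<String, Object> entry : ((Rec) value).fields.entrySet()) {
        document.put(entry.getKey(), toDocumentValue(entry.getValue()));
      }
      return document;
    }
    if (value instanceof List) {
      List<Object> converted = new ArrayList<>();
      for (Object item : (List<?>) value) {
        converted.add(toDocumentValue(item));
      }
      return converted;
    }
    return value;
  }
}
```

Note that this sketch does not guard against cyclic references (an employee who is their own boss would recurse without end), so a real implementation would need a depth limit or similar check.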

Give it a try first and let me know if you get stuck.

Alternatively, if the first option is not feasible, you can serialize the
embedded objects as a byte array, for example:

https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html
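
For the byte-array fallback, an Elasticsearch binary field takes its value as a Base64-encoded string, so the serialized bytes need one extra encoding step. The sketch below covers only that step; the class `BinaryFieldSketch` and its method names are hypothetical, and how the embedded object is turned into bytes (as in the SolrStore example linked above) is assumed to happen elsewhere.

```java
import java.util.Base64;

/** Hypothetical illustration only; covers just the Base64 step for a binary field. */
final class BinaryFieldSketch {

  /** Encodes already-serialized bytes as the Base64 string stored in the document. */
  static String encodeForBinaryField(byte[] serialized) {
    return Base64.getEncoder().encodeToString(serialized);
  }

  /** Decodes the stored Base64 string back into the serialized bytes. */
  static byte[] decodeFromBinaryField(String stored) {
    return Base64.getDecoder().decode(stored);
  }
}
```

The trade-off is that a value stored this way is opaque to Elasticsearch, so the embedded object's fields cannot be queried individually.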

Best regards,
John.

On Sat, 16 Jan 2021 at 8:02, Maria Podorvanova (<podorvanova.ma...@gmail.com>) wrote:

> Hi,
>
> Report #7
> Period: January 10 - January 16
> Activities:
> - Fixed authentication [1]:
>
>    1. Set up the password for the Elasticsearch container properly
>    2. Set the default username for the Elasticsearch container server in
>    gora.properties
>    3. Added exceptions for missing arguments in authentication
>
> - Added a parameter for the XSD validation [2]:
>
>    1. Defined a parameter for the XSD validation
>    2. Added a test case for the parameter
>    3. Made ElasticsearchStore read the mapping file from the properties, not
>    the configuration
>
> - Implemented some basic Input-Output operations for schema management [3]:
>
>    1. Implemented delete, get and put methods
>    2. Implemented newInstance and getUnionSchema utility methods
>    3. Implemented basic serialization/deserialization for primitive Avro
>    types
>
>
> Here are links to the commits:
> [1]
> https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
> [2]
> https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
> [3]
> https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0
>
> This week I started work on serialization/deserialization. While
> testing the get method, I found that a UNION can be a combination of NULL,
> STRING, or another RECORD for external table references (e.g. boss for
> Employee). Could you explain what I should do in this case? I see two
> options: 1) deserialize recursively if the field value is a
> RECORD, or 2) make another request in the STRING case, where I have only the
> key of the external object.
>
> Regards,
> Maria
>


Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-16 Thread Maria Podorvanova
Hi,

Report #7
Period: January 10 - January 16
Activities:
- Fixed authentication [1]:

   1. Set up the password for the Elasticsearch container properly
   2. Set the default username for the Elasticsearch container server in
   gora.properties
   3. Added exceptions for missing arguments in authentication

- Added a parameter for the XSD validation [2]:

   1. Defined a parameter for the XSD validation
   2. Added a test case for the parameter
   3. Made ElasticsearchStore read the mapping file from the properties, not
   the configuration

- Implemented some basic Input-Output operations for schema management [3]:

   1. Implemented delete, get and put methods
   2. Implemented newInstance and getUnionSchema utility methods
   3. Implemented basic serialization/deserialization for primitive Avro
   types


Here are links to the commits:
[1]
https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
[2]
https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
[3]
https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0

This week I started work on serialization/deserialization. While
testing the get method, I found that a UNION can be a combination of NULL,
STRING, or another RECORD for external table references (e.g. boss for
Employee). Could you explain what I should do in this case? I see two
options: 1) deserialize recursively if the field value is a
RECORD, or 2) make another request in the STRING case, where I have only the
key of the external object.

Regards,
Maria


Re: Week 7 Report

2019-07-22 Thread Renato Marroquín Mogrovejo
Hey Sheriffo,

Very nice work! I am sorry for the silence in the past weeks, but I
have been swamped with things. I hope I can be of more help now.
Anyway, regarding your progress reports, I have some questions:
- Regarding the Google Cloud credits, did you see this? [1] Maybe
that would also be something we could try, although I am not sure how
compatible the required versions are. Maybe that could be
an alternative key-value store, besides the ones
you have picked so far.
- Regarding the exception when creating very large objects to be
serialized, what about using arrays of records from Avro, or maybe
just arrays of primitive types? With that, we could increase the size
of the value and have an extra knob to try in the benchmarks.
- Regarding the last report, where you have some plots, could you
explain what you are plotting? E.g., what is on the x-axis: the aggregated
number of inserted keys, or the number of keys inserted at a particular
point in time?
Overall, very nice work Sheriffo! Thanks for all the good work!


Best,

Renato M.


[1] https://cloud.google.com/bigtable/docs/hbase-bigtable

On Sun, 14 Jul 2019 at 16:33, Sheriffo Ceesay wrote:
>
> Week seven report is available at 
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>
> Basically, I am currently running workloads on HBase. I will continue doing
> this next week and probably the week after. More details are given in
> the report.
>
> Please let me know if you have any questions.
>
>
> *Sheriffo Ceesay*
>


Week 7 Report

2019-07-14 Thread Sheriffo Ceesay
Week seven report is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

Basically, I am currently running workloads on HBase. I will continue doing
this next week and probably the week after. More details are given
in the report.

Please let me know if you have any questions.



**Sheriffo Ceesay**