Add datastore for Elasticsearch. Outreachy Week 13 Report

2021-03-01 Thread Maria Podorvanova
Hi,

Report #13
Week 13: February, 28 - March, 2
Activities:
- Submitted final feedback
- Posted last blog post
- Created a separate ticket[1] for Elasticsearch documentation for Apache
Gora website and attached a patch with my documentation
- Made a PR[2] with my code

Question:
The CI build failed with errors I do not recognize, and I am not sure what
caused them. Should I do something about it?

[1] https://issues.apache.org/jira/browse/GORA-670
[2] https://github.com/apache/gora/pull/234

Regards,
Maria


Re: Add datastore for Elasticsearch. Outreachy Week 12 Report

2021-03-01 Thread Maria Podorvanova
Hi John,

Thank you for your response.

1) I have tried to execute a refresh call on the flush method and it is
working now. Thank you very much!

3) I see. I will leave it out for now then.

I will send a PR by the end of today.

Regards,
Maria

On Tue, 2 Mar 2021 at 09:33, John Mora  wrote:

> Hi Maria.
>
>
> Thanks for your update.
>
> 1) I made some experiments and I think you have to execute a refresh call
> on the flush() method.
> "An elasticsearch refresh
> <http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html>
> makes your documents available for search"
>
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
>
> Also, if you have problems with the order of the results check out the
> preference parameter
>
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#search-preference
>
> 3) Since the end of the internship is close and the Gora Explorer is an
> independent project (I am not sure whether Alfonso has free time), I think
> we can skip that task, but it would be a nice post-Outreachy contribution
> if you want.
>
> Please send a PR with your code for review.
>
> Thanks,
> John
>
> On Mon, 1 Mar 2021 at 7:23, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) wrote:
>
>> Hi Madhawa,
>>
>> Thank you for your response. I will do that.
>>
>> Regards,
>> Maria
>>
>> On Mon, 1 Mar 2021 at 22:51, Madhawa Gunasekara 
>> wrote:
>>
>>> Hi Maria,
>>>
>>> 2) Documentation looks fine to me. Please refer to these documentation
>>> Jira tickets as well. Let's stick to the same format.
>>> [1] https://issues.apache.org/jira/browse/GORA-625
>>> [2] https://issues.apache.org/jira/browse/GORA-338
>>>
>>> Please create a separate ticket for this documentation.
>>>
>>> Thanks,
>>> Madhawa
>>>
>>>
>>> On Sat, Feb 27, 2021 at 9:10 AM Maria Podorvanova <
>>> podorvanova.ma...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Report #12
>>>> Week 12: February, 21 - February, 27
>>>> Activities:
>>>> - Fixed execute method by adding a special "gora_id" field [1]
>>>> - Implemented deleting specific fields of the records in deleteByQuery
>>>> method [2]
>>>> - Implemented MapReduce test [3]
>>>> - Added Thread.sleep in order to synchronize Elasticsearch replicas [4]
>>>> - All tests in TestElasticsearchStore are passing now
>>>> - I also had an informal chat with 2 people this week
>>>>
>>>> Questions:
>>>>
>>>>    1. The last commit [4] gives Elasticsearch some time to synchronize
>>>>    all its replicas. Without Thread.sleep, 10 tests (testQuery,
>>>>    testQueryStartKey, testDeleteByQuery, etc.) fail and return a different
>>>>    number of hits every time I run them. I did not find a better solution,
>>>>    but committed it anyway. Do you have any suggestions?
>>>>    2. I did not get feedback about the Elasticsearch documentation for the
>>>>    Apache Gora website that I sent last week. Do I need to fix something
>>>>    in it?
>>>>    3. One of the last goals of my internship is to add the new
>>>>    datastore to the GoraExplorer project. Could you tell me if there is any
>>>>    guide on how to do it?
>>>>
>>>>
>>>> [1]
>>>> https://github.com/apache/gora/commit/f100b317a6dd3c98875f92de776e9b1e476e5425
>>>> [2]
>>>> https://github.com/apache/gora/commit/91fb2f83f7b4b682898b1cffe73eb8bebeb8ed83
>>>> [3]
>>>> https://github.com/apache/gora/commit/28b2dee779fa428f51f54585dbfb88638f9bc1de
>>>> [4]
>>>> https://github.com/apache/gora/commit/d7955f74821fad063da3dd9f1988f59aadbf7cca
>>>>
>>>> Regards,
>>>> Maria
>>>>
>>>


Re: Add datastore for Elasticsearch. Outreachy Week 12 Report

2021-03-01 Thread Maria Podorvanova
Hi Madhawa,

Thank you for your response. I will do that.

Regards,
Maria

On Mon, 1 Mar 2021 at 22:51, Madhawa Gunasekara  wrote:

> Hi Maria,
>
> 2) Documentation looks fine to me. Please refer to these documentation
> Jira tickets as well. Let's stick to the same format.
> [1] https://issues.apache.org/jira/browse/GORA-625
> [2] https://issues.apache.org/jira/browse/GORA-338
>
> Please create a separate ticket for this documentation.
>
> Thanks,
> Madhawa
>
>
> On Sat, Feb 27, 2021 at 9:10 AM Maria Podorvanova <
> podorvanova.ma...@gmail.com> wrote:
>
>> Hi,
>>
>> Report #12
>> Week 12: February, 21 - February, 27
>> Activities:
>> - Fixed execute method by adding a special "gora_id" field [1]
>> - Implemented deleting specific fields of the records in deleteByQuery
>> method [2]
>> - Implemented MapReduce test [3]
>> - Added Thread.sleep in order to synchronize Elasticsearch replicas [4]
>> - All tests in TestElasticsearchStore are passing now
>> - I also had an informal chat with 2 people this week
>>
>> Questions:
>>
>>    1. The last commit [4] gives Elasticsearch some time to synchronize
>>    all its replicas. Without Thread.sleep, 10 tests (testQuery,
>>    testQueryStartKey, testDeleteByQuery, etc.) fail and return a different
>>    number of hits every time I run them. I did not find a better solution,
>>    but committed it anyway. Do you have any suggestions?
>>    2. I did not get feedback about the Elasticsearch documentation for the
>>    Apache Gora website that I sent last week. Do I need to fix something
>>    in it?
>>    3. One of the last goals of my internship is to add the new datastore
>>    to the GoraExplorer project. Could you tell me if there is any guide on
>>    how to do it?
>>
>>
>> [1]
>> https://github.com/apache/gora/commit/f100b317a6dd3c98875f92de776e9b1e476e5425
>> [2]
>> https://github.com/apache/gora/commit/91fb2f83f7b4b682898b1cffe73eb8bebeb8ed83
>> [3]
>> https://github.com/apache/gora/commit/28b2dee779fa428f51f54585dbfb88638f9bc1de
>> [4]
>> https://github.com/apache/gora/commit/d7955f74821fad063da3dd9f1988f59aadbf7cca
>>
>> Regards,
>> Maria
>>
>


Add datastore for Elasticsearch. Outreachy Week 12 Report

2021-02-27 Thread Maria Podorvanova
Hi,

Report #12
Week 12: February, 21 - February, 27
Activities:
- Fixed execute method by adding a special "gora_id" field [1]
- Implemented deleting specific fields of the records in deleteByQuery
method [2]
- Implemented MapReduce test [3]
- Added Thread.sleep in order to synchronize Elasticsearch replicas [4]
- All tests in TestElasticsearchStore are passing now
- I also had an informal chat with 2 people this week

Questions:

   1. The last commit [4] gives Elasticsearch some time to synchronize all
   its replicas. Without Thread.sleep, 10 tests (testQuery, testQueryStartKey,
   testDeleteByQuery, etc.) fail and return a different number of hits every
   time I run them. I did not find a better solution, but committed it anyway.
   Do you have any suggestions?
   2. I did not get feedback about Elasticsearch documentation for Apache
   Gora website I sent last week. Do I need to fix something in it?
   3. One of the last goals of my internship is to add the new datastore to
   the GoraExplorer project. Could you tell me if there is any guide on how to
   do it?


[1]
https://github.com/apache/gora/commit/f100b317a6dd3c98875f92de776e9b1e476e5425
[2]
https://github.com/apache/gora/commit/91fb2f83f7b4b682898b1cffe73eb8bebeb8ed83
[3]
https://github.com/apache/gora/commit/28b2dee779fa428f51f54585dbfb88638f9bc1de
[4]
https://github.com/apache/gora/commit/d7955f74821fad063da3dd9f1988f59aadbf7cca

Regards,
Maria
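
Question 1 above describes searches that return a different number of hits on each run. Elasticsearch search is near real-time: a newly indexed document becomes visible to search only after the index has been refreshed, and the replies in this thread report that calling refresh from flush() resolved the problem. The sketch below is a toy in-memory model of that behaviour, not Gora or Elasticsearch client code; ToyIndex and ToyStore are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an index whose writes become searchable only after a refresh. */
final class ToyIndex {
  private final List<String> pending = new ArrayList<>();    // indexed, not yet searchable
  private final List<String> searchable = new ArrayList<>(); // visible to search

  void index(String doc) {
    pending.add(doc);
  }

  /** Makes every pending document visible to search. */
  void refresh() {
    searchable.addAll(pending);
    pending.clear();
  }

  int searchHitCount() {
    return searchable.size();
  }
}

/** Hypothetical store wrapper: flush() refreshes the index instead of sleeping. */
final class ToyStore {
  private final ToyIndex index = new ToyIndex();

  void put(String doc) {
    index.index(doc);
  }

  void flush() {
    index.refresh();
  }

  int query() {
    return index.searchHitCount();
  }
}
```

In this model a query issued before flush() sees no documents and a query issued after it sees all of them, which is the deterministic behaviour the tests need. A fixed Thread.sleep only waits for a periodic refresh to happen on its own. With the Java High Level REST Client, the corresponding call is expected to be client.indices().refresh(new RefreshRequest(indexName), RequestOptions.DEFAULT), assuming that client is the one used by the store.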


Re: Add datastore for Elasticsearch. Outreachy Week 11 Report

2021-02-25 Thread Maria Podorvanova
Hi Kevin,

Yes, I will make a PR, once I fix some issues.

Regards,
Maria

On Thu, 25 Feb 2021 at 15:49, Kevin Ratnasekera 
wrote:

> Hi Maria,
>
> Thank you for your hard work, Maria. Can you raise a PR once you are
> comfortable with the changes?
>
> Regards
> Kevin
>
> On Thu, Feb 25, 2021 at 10:06 AM Maria Podorvanova <
> podorvanova.ma...@gmail.com> wrote:
>
>> Hi John,
>>
>> Thanks for your comment. I am working on it.
>>
>> Regards,
>> Maria
>>
>> On Wed, 24 Feb 2021 at 17:50, John Mora  wrote:
>>
>>> Hi Maria.
>>>
>>> Thanks for the update.
>>>
>>> Unfortunately, looping through all possible values in the range is not a
>>> practical solution.
>>>
>>> You should use the range query feature for this:
>>>
>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
>>>
>>> I think you should manually add a special field in the elasticsearch
>>> record that you can range query (you can add it to the mapping file as a
>>> 'mock' primary key field). It will be basically a copy of the '_id' field.
>>>
>>> Here, you can find a similar workaround in the Redis DataStore, where
>>> Sorted Sets were used as secondary indexes for range queries.
>>>
>>>
>>> https://github.com/apache/gora/blob/master/gora-redis/src/main/java/org/apache/gora/redis/store/RedisStore.java#L299
>>>
>>> Best,
>>> John
>>>
>>> On Sat, 20 Feb 2021 at 3:01, Maria Podorvanova (<
>>> podorvanova.ma...@gmail.com>) wrote:
>>>
>>>> Hi,
>>>>
>>>> Report #11
>>>> Week 11: February, 14 - February, 20
>>>> Activities:
>>>> - Added scaling_factor support [1]
>>>> - Removed unsupported Elasticsearch data types [2]
>>>> - Implemented Metadata Analyzer for Elasticsearch Store [3]
>>>> - Tried to fix range query by “_id” field [4]
>>>> - Wrote documentation for Apache Gora website [5]
>>>> - Polished and sent my CV for reviewing
>>>>
>>>> Question:
>>>>
>>>>    1. I tried to fix the issue where the Elasticsearch "_id" field does
>>>>    not support range queries. I tried treating "_id" as a number, but one
>>>>    of the test "_id" field values is "http://foo.com/", so my approach
>>>>    did not work. I decided to commit [4] my work on this issue anyway
>>>>    in order to show you what I tried to do.
>>>>
>>>>
>>>> [1]
>>>> https://github.com/apache/gora/commit/670a04c51f4a6d169df319ed5fd3d1d0abd81870
>>>> [2]
>>>> https://github.com/apache/gora/commit/55020d722f9424021fefe8d94b6bf3ece213226d
>>>> [3]
>>>> https://github.com/apache/gora/commit/c491a6447d197b0509473294ee844834b1623a63
>>>> [4]
>>>> https://github.com/apache/gora/commit/a870ca8a2075af7cbab75b9341a94de4966fbf7a
>>>> [5]
>>>> https://docs.google.com/document/d/1AF6MG3pqe6A5Z0KtLooEKQlipuyeObbYlAa4O7rFXqM/edit?usp=sharing
>>>>
>>>> Regards,
>>>> Maria
>>>>
>>>


Re: Add datastore for Elasticsearch. Outreachy Week 11 Report

2021-02-24 Thread Maria Podorvanova
Hi John,

Thanks for your comment. I am working on it.

Regards,
Maria

On Wed, 24 Feb 2021 at 17:50, John Mora  wrote:

> Hi Maria.
>
> Thanks for the update.
>
> Unfortunately, looping through all possible values in the range is not a
> practical solution.
>
> You should use the range query feature for this:
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
>
> I think you should manually add a special field in the elasticsearch
> record that you can range query (you can add it to the mapping file as a
> 'mock' primary key field). It will be basically a copy of the '_id' field.
>
> Here, you can find a similar workaround in the Redis DataStore, where
> Sorted Sets were used as secondary indexes for range queries.
>
>
> https://github.com/apache/gora/blob/master/gora-redis/src/main/java/org/apache/gora/redis/store/RedisStore.java#L299
>
> Best,
> John
>
> On Sat, 20 Feb 2021 at 3:01, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) wrote:
>
>> Hi,
>>
>> Report #11
>> Week 11: February, 14 - February, 20
>> Activities:
>> - Added scaling_factor support [1]
>> - Removed unsupported Elasticsearch data types [2]
>> - Implemented Metadata Analyzer for Elasticsearch Store [3]
>> - Tried to fix range query by “_id” field [4]
>> - Wrote documentation for Apache Gora website [5]
>> - Polished and sent my CV for reviewing
>>
>> Question:
>>
>>    1. I tried to fix the issue where the Elasticsearch "_id" field does not
>>    support range queries. I tried treating "_id" as a number, but one of
>>    the test "_id" field values is "http://foo.com/", so my approach did
>>    not work. I decided to commit [4] my work on this issue anyway in order
>>    to show you what I tried to do.
>>
>>
>> [1]
>> https://github.com/apache/gora/commit/670a04c51f4a6d169df319ed5fd3d1d0abd81870
>> [2]
>> https://github.com/apache/gora/commit/55020d722f9424021fefe8d94b6bf3ece213226d
>> [3]
>> https://github.com/apache/gora/commit/c491a6447d197b0509473294ee844834b1623a63
>> [4]
>> https://github.com/apache/gora/commit/a870ca8a2075af7cbab75b9341a94de4966fbf7a
>> [5]
>> https://docs.google.com/document/d/1AF6MG3pqe6A5Z0KtLooEKQlipuyeObbYlAa4O7rFXqM/edit?usp=sharing
>>
>> Regards,
>> Maria
>>
>


Add datastore for Elasticsearch. Outreachy Week 11 Report

2021-02-20 Thread Maria Podorvanova
Hi,

Report #11
Week 11: February, 14 - February, 20
Activities:
- Added scaling_factor support [1]
- Removed unsupported Elasticsearch data types [2]
- Implemented Metadata Analyzer for Elasticsearch Store [3]
- Tried to fix range query by “_id” field [4]
- Wrote documentation for Apache Gora website [5]
- Polished and sent my CV for reviewing

Question:

   1. I tried to fix the issue where the Elasticsearch "_id" field does not
   support range queries. I tried treating "_id" as a number, but one of
   the test "_id" field values is "http://foo.com/", so my approach did not
   work. I decided to commit [4] my work on this issue anyway in order to
   show you what I tried to do.


[1]
https://github.com/apache/gora/commit/670a04c51f4a6d169df319ed5fd3d1d0abd81870
[2]
https://github.com/apache/gora/commit/55020d722f9424021fefe8d94b6bf3ece213226d
[3]
https://github.com/apache/gora/commit/c491a6447d197b0509473294ee844834b1623a63
[4]
https://github.com/apache/gora/commit/a870ca8a2075af7cbab75b9341a94de4966fbf7a
[5]
https://docs.google.com/document/d/1AF6MG3pqe6A5Z0KtLooEKQlipuyeObbYlAa4O7rFXqM/edit?usp=sharing

Regards,
Maria
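
The question above is about range queries over record keys. The replies in this thread suggest copying the key into a separate field (named gora_id there) and running a range query on that field. The sketch below only builds the JSON body of such a query with the standard library. It assumes gora_id is mapped as a keyword field, so bounds are compared lexicographically as strings; RangeQuerySketch and its methods are hypothetical helpers, not part of Gora or of the Elasticsearch client.

```java
/** Builds the JSON body of an Elasticsearch range query over a string field. */
final class RangeQuerySketch {
  private RangeQuerySketch() {}

  /** Minimal escaping of backslashes and double quotes; control characters are not handled. */
  static String escape(String value) {
    return value.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  /**
   * Returns a range query on {@code field}.
   * A null bound is omitted, which leaves that side of the range open.
   */
  static String rangeQuery(String field, String startKey, String endKey) {
    StringBuilder bounds = new StringBuilder();
    if (startKey != null) {
      bounds.append("\"gte\":\"").append(escape(startKey)).append('"');
    }
    if (endKey != null) {
      if (bounds.length() > 0) {
        bounds.append(',');
      }
      bounds.append("\"lte\":\"").append(escape(endKey)).append('"');
    }
    return "{\"query\":{\"range\":{\"" + escape(field) + "\":{" + bounds + "}}}}";
  }
}
```

Because the bounds are plain strings, a key such as "http://foo.com/" needs no numeric interpretation, which is where the attempt to treat "_id" as a number broke down.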


Re: Add datastore for Elasticsearch. Outreachy Week 10 Report

2021-02-15 Thread Maria Podorvanova
Hi John,

Thank you for your answers.

1) The type of the Elasticsearch "_id" field is string. I am not sure that
just copying the "_id" field contents will fix the problem, as "_id" can
still be an arbitrary string value (i.e. not necessarily an integer).

2) Elasticsearch does not support partitioning, so I will leave the single
partition implementation.

Regards,
Maria
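
Point 2 concerns getPartitions. As the reply quoted below explains, a store backed by range partitions splits a query along the partition boundaries, while a store without partitioning returns the whole query as a single partition. The following toy sketch contrasts the two cases. KeyRange and PartitionSketch are hypothetical types built on plain strings, not Gora's query classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Half-open key range [start, end); a null bound means unbounded on that side. */
final class KeyRange {
  final String start;
  final String end;

  KeyRange(String start, String end) {
    this.start = start;
    this.end = end;
  }
}

final class PartitionSketch {
  private PartitionSketch() {}

  /** Store without partitioning: the whole query is the only partition. */
  static List<KeyRange> singlePartition(KeyRange query) {
    return Collections.singletonList(query);
  }

  /** Store with range partitions: one sub-query per partition that overlaps the query. */
  static List<KeyRange> splitByPartitions(KeyRange query, List<KeyRange> partitions) {
    List<KeyRange> result = new ArrayList<>();
    for (KeyRange p : partitions) {
      String start = maxStart(query.start, p.start);
      String end = minEnd(query.end, p.end);
      // Keep the intersection only if it is non-empty.
      if (start == null || end == null || start.compareTo(end) < 0) {
        result.add(new KeyRange(start, end));
      }
    }
    return result;
  }

  // Larger of two lower bounds; null means "no lower bound".
  private static String maxStart(String a, String b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) >= 0 ? a : b;
  }

  // Smaller of two upper bounds; null means "no upper bound".
  private static String minEnd(String a, String b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) <= 0 ? a : b;
  }
}
```

The single-partition case corresponds to the implementation kept for the Elasticsearch store in this thread.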

On Tue, 16 Feb 2021 at 09:14, John Mora  wrote:

> Hi Maria,
>
> Thanks for the update.
>
> 1) I think you can copy the content from _id to a manually created field
> let's say 'gora_id' using copy_to.
>
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
>
> But I have not tried it yet, so I am not sure whether this will work.
>
> Alternatively, you can manually copy the value of the key to a field that
> can be range queried in the put method of the datastore.
>
> 2) In some databases you can split your data into partitions, generally
> defining ranges for the primary key.
>
> Kudu is an example of this:
> https://kudu.apache.org/docs/schema_design.html#range-partitioning
>
> In this case, the getPartitions should split a query using the existing
> partition ranges:
> Kudu example:
>
> https://github.com/apache/gora/blob/master/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L383
>
> If the database does not support partitioning, this method returns only a
> single partition (the whole table/collection).
> This is probably the implementation that you saw.
>
> I think Elasticsearch does not support partitioning. In that case your
> implementation is fine, but I am not an expert in Elasticsearch.
>
> Best,
> John
>
> On Sat, 13 Feb 2021 at 0:15, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) wrote:
>
>> Hi,
>>
>> Report #10
>> Week 10: February, 7 - February, 13
>> Activities:
>> - Implemented newQuery method
>> - Implemented deleteByQuery method
>> - Used an Enum instead of literal strings for the Authentication Type
>> parameter
>> - Used parameterized logging instead of string concatenation
>> - Implemented execute method
>> - Implemented getPartitions method
>> - The following tests are passing now:
>>
>>    1. testTruncateSchema
>>    2. testDeleteSchema
>>    3. testQueryWebPageQueryEmptyResults
>>    4. testResultSize
>>    5. testResultSizeStartKey
>>    6. testResultSizeEndKey
>>    7. testResultSizeWithLimit
>>    8. testResultSizeStartKeyWithLimit
>>    9. testResultSizeEndKeyWithLimit
>>    10. testResultSizeKeyRangeWithLimit
>>
>> - Filled out and sent Outreachy internship feedback to Apache
>>
>> Here is the link to my code:
>> https://github.com/apache/gora/compare/master...podorvanova:gora-664.
>> Relevant commits are from February 10.
>>
>> Questions:
>>
>>    1. This week I worked on implementing query functionality. While
>>    testing, I found that the Elasticsearch "_id" field does not support range
>>    queries, which are required for the deleteByQuery method, so I am a little
>>    confused about what I should do in this case.
>>    2. I roughly understand that the getPartitions method is needed to
>>    implement Hadoop support. I looked through other modules and found that
>>    the method is implemented the same way everywhere, so I did the same for
>>    now. Could you tell me more about this method or maybe provide some
>>    resources?
>>
>>
>> Regards,
>> Maria
>>
>


Add datastore for Elasticsearch. Outreachy Week 10 Report

2021-02-12 Thread Maria Podorvanova
Hi,

Report #10
Week 10: February, 7 - February, 13
Activities:
- Implemented newQuery method
- Implemented deleteByQuery method
- Used an Enum instead of literal strings for the Authentication Type
parameter
- Used parameterized logging instead of string concatenation
- Implemented execute method
- Implemented getPartitions method
- The following tests are passing now:

   1. testTruncateSchema
   2. testDeleteSchema
   3. testQueryWebPageQueryEmptyResults
   4. testResultSize
   5. testResultSizeStartKey
   6. testResultSizeEndKey
   7. testResultSizeWithLimit
   8. testResultSizeStartKeyWithLimit
   9. testResultSizeEndKeyWithLimit
   10. testResultSizeKeyRangeWithLimit

- Filled out and sent Outreachy internship feedback to Apache

Here is the link to my code:
https://github.com/apache/gora/compare/master...podorvanova:gora-664.
Relevant commits are from February 10.

Questions:

   1. This week I worked on implementing query functionality. While
   testing, I found that the Elasticsearch "_id" field does not support range
   queries, which are required for the deleteByQuery method, so I am a little
   confused about what I should do in this case.
   2. I roughly understand that the getPartitions method is needed to implement
   Hadoop support. I looked through other modules and found that the
   method is implemented the same way everywhere, so I did the same for now.
   Could you tell me more about this method or maybe provide some resources?


Regards,
Maria
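
One of the activities above replaces literal strings with an Enum for the Authentication Type parameter. A minimal sketch of that pattern follows. The constant names and the fromProperty helper are assumptions made for illustration, and the constants actually defined in the module may differ.

```java
import java.util.Locale;

/** Hypothetical authentication types; the real module's constants may differ. */
enum AuthenticationType {
  BASIC, TOKEN, APIKEY;

  /** Parses a property value case-insensitively and fails with a clear message. */
  static AuthenticationType fromProperty(String value) {
    if (value == null) {
      throw new IllegalArgumentException("Authentication type is not set");
    }
    try {
      return valueOf(value.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Unknown authentication type: " + value, e);
    }
  }
}
```

Parsing the property once at start-up means a misspelled value fails immediately, and later code can switch on the enum instead of comparing strings.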


Re: Add datastore for Elasticsearch. Outreachy Week 9 Report

2021-02-11 Thread Maria Podorvanova
Hi John,

Thank you for the feedback. I will work on your comments.

Regards,
Maria

On Thu, 11 Feb 2021 at 03:10, John Mora  wrote:

> Hi Maria
>
> Thanks for the update.
>
> Some comments:
>
> Please use an Enum instead of literal strings for the Authentication
> Method parameter.
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L126
>
> Use Parameterized logging instead of string concatenation
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMappingBuilder.java#L186
>
>
> Use the 'scalingFactor' attribute in the XML parsing, schema creation,
> XSD, etc.
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMappingBuilder.java#L200
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/Field.java#L133
>
> regards,
> John
>
> On Fri, 5 Feb 2021 at 22:45, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) wrote:
>
>> Hi,
>>
>> Report #9
>> Period: January, 31 - February, 6
>> Activities:
>> - Fixed NPE when getting a non-existing Elasticsearch document
>> - Implemented serialization/deserialization for MAP Avro data type
>> - Refactored serialization/deserialization to have better javadocs and
>> arguments
>> - Implemented serialization/deserialization for RECORD Avro data type
>> - Implemented serialization/deserialization for UNION Avro data type
>> - Fixed passed Schema argument for ARRAY deserialization
>> - Fixed BYTES deserialization for Base64 encoded String
>> - Ignored testGet3UnionField test
>> - Added javadoc descriptions to serialization and deserialization methods
>> - The following tests are passing now:
>>
>>    1. testPutNested
>>    2. testPutArray
>>    3. testPutBytes
>>    4. testPutMap
>>    5. testPutMixedMaps
>>    6. testUpdate
>>    7. testGetRecursive
>>    8. testGetDoubleRecursive
>>    9. testGetNested
>>    10. testGetWithFields
>>    11. testGetWebPage
>>    12. testGetWebPageDefaultFields
>>    13. testGetNonExisting
>>
>> - Wrote a blog post #5
>>
>> Here is the link to my code:
>> https://github.com/apache/gora/compare/master...podorvanova:gora-664.
>> Relevant commits are from February 4.
>>
>> Regards,
>> Maria
>>
>>


Add datastore for Elasticsearch. Outreachy Week 9 Report

2021-02-05 Thread Maria Podorvanova
Hi,

Report #9
Period: January, 31 - February, 6
Activities:
- Fixed NPE when getting a non-existing Elasticsearch document
- Implemented serialization/deserialization for MAP Avro data type
- Refactored serialization/deserialization to have better javadocs and
arguments
- Implemented serialization/deserialization for RECORD Avro data type
- Implemented serialization/deserialization for UNION Avro data type
- Fixed passed Schema argument for ARRAY deserialization
- Fixed BYTES deserialization for Base64 encoded String
- Ignored testGet3UnionField test
- Added javadoc descriptions to serialization and deserialization methods
- The following tests are passing now:

   1. testPutNested
   2. testPutArray
   3. testPutBytes
   4. testPutMap
   5. testPutMixedMaps
   6. testUpdate
   7. testGetRecursive
   8. testGetDoubleRecursive
   9. testGetNested
   10. testGetWithFields
   11. testGetWebPage
   12. testGetWebPageDefaultFields
   13. testGetNonExisting

- Wrote a blog post #5

Here is the link to my code:
https://github.com/apache/gora/compare/master...podorvanova:gora-664.
Relevant commits are from February 4.

Regards,
Maria
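
The BYTES fix listed above deals with binary values that are stored as Base64 strings, which is how an Elasticsearch binary field represents its content. Avro's Java API represents BYTES values as ByteBuffer instances. A minimal round-trip sketch using only the standard library follows. BytesCodecSketch is a hypothetical helper, not the module's actual serialization code.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

final class BytesCodecSketch {
  private BytesCodecSketch() {}

  /** Encodes the remaining bytes of a buffer as Base64 without consuming the buffer. */
  static String serialize(ByteBuffer value) {
    ByteBuffer copy = value.duplicate(); // leaves the caller's position untouched
    byte[] bytes = new byte[copy.remaining()];
    copy.get(bytes);
    return Base64.getEncoder().encodeToString(bytes);
  }

  /** Decodes a Base64 string, as read back from a document, into a ByteBuffer. */
  static ByteBuffer deserialize(String stored) {
    return ByteBuffer.wrap(Base64.getDecoder().decode(stored));
  }
}
```

Working on a duplicate of the buffer matters because reading directly from the original would advance its position, so a later read of the same field would see it as empty.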


Re: Add datastore for Elasticsearch. Outreachy Week 8 Report

2021-01-31 Thread Maria Podorvanova
Hi John,

Thank you for your feedback.

I am back from my small holiday and will address your comment this week.

Regards,
Maria

On Thu, 28 Jan 2021 at 03:41, John Mora  wrote:

> Hi María.
>
> Thanks for your report.
>
> Some comments.
>
> When a Key is not found in the Datastore the get method should return
>  null, but currently a NullPointerException is thrown here:
>
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L256
>
> According to the avro documentation: "This data type is used to declare a
> fixed-sized field that can be used for storing binary data.", it is similar
> to the BYTES data type, but it has a size attribute.
>
> If Elasticsearch has a fixed-sized binary datatype you can use it. But I
> think that is not the case, so if other modules do not implement it, it is
> fine to leave it that way.
>
> Best,
> John
>
> On Sun, 24 Jan 2021 at 8:16, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) wrote:
>
>> Hi,
>>
>> Report #8
>> Period: January, 17 - January, 23
>> Activities:
>> - Fix createSchema method [1]
>>
>>    1. Added the index mappings while creating the Elasticsearch index
>>    2. Added getter and setter to enum Datatype
>>
>> - Implement serialization/deserialization for some Avro data types [2]
>>
>>    1. Implemented serializeFieldValue and deserializeFieldValue methods
>>    for ARRAY, BOOLEAN, BYTES and FIXED Avro data types
>>    2. Fixed deserialization for STRING Avro data type
>>    3. Added javadoc descriptions
>>
>> - The following tests are passing now:
>>
>>    1. testCreateSchema, testAutoCreateSchema for createSchema method
>>    2. testSchemaExists
>>    3. testPut
>>    4. testGet
>>
>> - Wrote a blog post #4
>>
>> Here are links to the commits:
>> [1]
>> https://github.com/podorvanova/gora/commit/6b9c21095fa4e9327328ec881b659c60c58c4941
>> [2]
>> https://github.com/podorvanova/gora/commit/e459309b3f750af65a181d4904470eaee9c29a2e
>>
>> Question:
>> I didn't quite understand what kind of type is represented by the FIXED Avro
>> data type. I looked through other modules and found that the FIXED case is
>> processed neither in serialization nor in deserialization, so I did
>> the same for now.
>>
>> Regards,
>> Maria
>>
>


Add datastore for Elasticsearch. Outreachy Week 8 Report

2021-01-24 Thread Maria Podorvanova
Hi,

Report #8
Period: January, 17 - January, 23
Activities:
- Fix createSchema method [1]

   1. Added the index mappings while creating the Elasticsearch index
   2. Added getter and setter to enum Datatype

- Implement serialization/deserialization for some Avro data types [2]

   1. Implemented serializeFieldValue and deserializeFieldValue methods for
   ARRAY, BOOLEAN, BYTES and FIXED Avro data types
   2. Fixed deserialization for STRING Avro data type
   3. Added javadoc descriptions

- The following tests are passing now:

   1. testCreateSchema, testAutoCreateSchema for createSchema method
   2. testSchemaExists
   3. testPut
   4. testGet

- Wrote a blog post #4

Here are links to the commits:
[1]
https://github.com/podorvanova/gora/commit/6b9c21095fa4e9327328ec881b659c60c58c4941
[2]
https://github.com/podorvanova/gora/commit/e459309b3f750af65a181d4904470eaee9c29a2e

Question:
I didn't quite understand what kind of type is represented by the FIXED Avro
data type. I looked through other modules and found that the FIXED case is
processed neither in serialization nor in deserialization, so I did
the same for now.

Regards,
Maria
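
The createSchema fix above adds the index mappings when the index is created. As a rough illustration of what such a mappings body looks like, the sketch below turns a map of field names and types into the "properties" JSON that Elasticsearch expects, including the scaling_factor parameter that the scaled_float type requires. MappingSketch and FieldType are hypothetical names, and field names are assumed to need no JSON escaping.

```java
import java.util.Map;

final class MappingSketch {
  private MappingSketch() {}

  /** Hypothetical field description: an Elasticsearch type plus an optional scaling factor. */
  static final class FieldType {
    final String type;
    final Integer scalingFactor; // only meaningful for "scaled_float"

    FieldType(String type, Integer scalingFactor) {
      this.type = type;
      this.scalingFactor = scalingFactor;
    }
  }

  /** Renders the fields as a mappings body, preserving the iteration order of the map. */
  static String toMappingsJson(Map<String, FieldType> fields) {
    StringBuilder sb = new StringBuilder("{\"properties\":{");
    boolean first = true;
    for (Map.Entry<String, FieldType> e : fields.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      first = false;
      sb.append('"').append(e.getKey()).append("\":{\"type\":\"")
          .append(e.getValue().type).append('"');
      if (e.getValue().scalingFactor != null) {
        sb.append(",\"scaling_factor\":").append(e.getValue().scalingFactor);
      }
      sb.append('}');
    }
    return sb.append("}}").toString();
  }
}
```

In the actual store the field list would come from the parsed XML mapping file rather than from a hand-built map.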


Re: Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-21 Thread Maria Podorvanova
Hi,

Okay, I will do that then. Thanks.

Regards,
Maria
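
The approach agreed on here, following the advice quoted below, is to handle only the RECORD branch of the UNION and to throw an exception for STRING. A toy sketch of that decision follows, with a simplified Kind enum standing in for the Avro schema types and a plain map standing in for a record. None of these names come from Gora or Avro.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UnionSketch {
  private UnionSketch() {}

  /** Simplified stand-in for the schema types that can appear in the union. */
  enum Kind { NULL, STRING, RECORD }

  /**
   * Deserializes a stored value according to the union branch it resolved to.
   * Only NULL and RECORD are supported; any other branch is rejected.
   */
  static Object deserializeUnion(Kind kind, Object raw) {
    switch (kind) {
      case NULL:
        return null;
      case RECORD:
        // A real store would rebuild the persistent object field by field;
        // copying the map is enough to show the branch being taken.
        return new LinkedHashMap<Object, Object>((Map<?, ?>) raw);
      case STRING:
      default:
        throw new UnsupportedOperationException("Unsupported union branch: " + kind);
    }
  }
}
```

Rejecting the STRING branch explicitly avoids an extra lookup by key and makes the limitation visible to callers instead of silently returning partial data.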

On Thu, 21 Jan 2021 at 03:33, John Mora  wrote:

> Hi Maria,
>
> Sorry for the late reply. Let's keep it simple. You can throw an exception
> when you receive a STRING and only process RECORD cases in UNION.
>
> Example:
>
> https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-lucene/src/main/java/org/apache/gora/lucene/store/LuceneStore.java#L349
>
> Regards,
> John
>
> On Tue, 19 Jan 2021 at 4:49, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) wrote:
>
>> Hi
>>
>> Thank you for your comments.
>>
>> I will take a look into your links, but my question was a bit different.
>> The problem is that foreign key "boss" is represented in Avro as UNION of
>> three types: STRING, NULL and RECORD. Your answer is in regards to how to
>> handle the last case (RECORD), but I was asking about how to handle
>> the STRING case. AFAIU STRING refers to the Employee's primary key type, so
>> that you could write "boss: '123'" instead of specifying the whole object.
>> Should I be making an additional GET request for this case?
>>
>> Regards,
>> Maria
>>
>> On Tue, 19 Jan 2021 at 08:53, John Mora  wrote:
>>
>>> Hi Maria,
>>>
>>> Thanks for the update.
>>>
>>> Some comments:
>>>
>>>
>>> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192
>>>
>>> Please add the index mappings when you create the elasticsearch index.
>>>
>>>
>>> https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings
>>>
>>> You can use the Field mappings parsed from the XML file.
>>>
>>>
>>> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28
>>>
>>> Regarding your question, Elasticsearch supports complex datatypes:
>>>
>>>
>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
>>>
>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
>>>
>>> You can use the RethinkDB datastore as an example and store recursively
>>> the fields of the embedded objects.
>>>
>>>
>>> https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448
>>>
>>> Give it a try first and let me know if you get stuck.
>>>
>>> Alternatively, if the first option is not feasible, you can serialize
>>> the embedded objects as byte array, example:
>>>
>>>
>>> https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
>>>
>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html
>>>
>>> Best regards,
>>> John.
>>>
>>> On Sat, 16 Jan 2021 at 8:02, Maria Podorvanova (<
>>> podorvanova.ma...@gmail.com>) wrote:
>>>
>>>> Hi,
>>>>
>>>> Report #7
>>>> Period: January 10 - January 16
>>>> Activities:
>>>> - Fixed authentication [1]:
>>>>
>>>>    1. Set up password to Elasticsearch container properly
>>>>    2. Set default Elasticsearch container server’s username in
>>>>    gora.properties
>>>>    3. Added exceptions for missing arguments in authentication
>>>>
>>>> - Added a parameter for the XSD validation [2]:
>>>>
>>>>    1. Defined a parameter for the XSD validation
>>>>    2. Added a test case for the parameter
>>>>    3. Made ElasticsearchStore read mapping file from properties, not
>>>>    configuration
>>>>
>>>> - Implemented some basic Input-Output operations for schema management
>>>> [3]:
>>>>
>>>>    1. Implemented delete, get and put methods
>>>>    2. Implemented newInstance and getUnionSchema utility methods
>>>>    3. Implemented basic serialization/deserialization for primitive
>>>>    AVRO types
>>>>
>>>>
>>>> Here are links to the commits:
>>>> [1]
>>>> https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
>>>> [2]
>>>> https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
>>>> [3]
>>>> https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0
>>>>
>>>> This week I started work on serialization/deserialization. While
>>>> testing the get method, I found that a UNION can be a combination of NULL,
>>>> STRING or another RECORD for external table references (e.g. boss for
>>>> Employee). Could you explain what I should do in this case? I see two
>>>> possible approaches: 1) deserialize recursively if the field value is a
>>>> RECORD; 2) make another request in the STRING case, where I have only the
>>>> key of the external object.
>>>>
>>>> Regards,
>>>> Maria
>>>>
>>>


Re: Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-19 Thread Maria Podorvanova
Hi

Thank you for your comments.

I will look at the links you sent, but my question was a bit different.
The problem is that the foreign key "boss" is represented in Avro as a UNION
of three types: STRING, NULL and RECORD. Your answer covers the last case
(RECORD), but I was asking how to handle the STRING case. As far as I
understand, STRING refers to the Employee's primary key type, so that you
could write "boss: '123'" instead of specifying the whole object. Should I
make an additional GET request in this case?
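
To make the question concrete, here is a minimal sketch of the three
branches in plain Java. It is only an illustration under assumptions: the
class UnionFieldResolver and the lookup function are hypothetical and not
part of Gora, and it assumes that the STRING branch really does hold the
primary key of the referenced record, which is exactly the point still to
be confirmed. The lookup function stands in for the additional GET request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper (not part of Gora) for a field declared as UNION of NULL, STRING and RECORD. */
final class UnionFieldResolver {

    private UnionFieldResolver() {
    }

    /**
     * Resolves the raw value of a union field.
     *
     * @param raw    null, a String key, or a Map holding an embedded record
     * @param lookup fetches a record by its key; stands in for an extra GET request
     * @return the resolved record, or null for the NULL branch
     */
    static Map<String, Object> resolve(Object raw, Function<String, Map<String, Object>> lookup) {
        if (raw == null) {
            // NULL branch: the reference is not set.
            return null;
        }
        if (raw instanceof String) {
            // STRING branch (assumption): the value is the key of another record.
            return lookup.apply((String) raw);
        }
        if (raw instanceof Map) {
            // RECORD branch: copy the fields, descending into embedded records.
            Map<String, Object> result = new HashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) raw).entrySet()) {
                Object value = entry.getValue();
                result.put(String.valueOf(entry.getKey()),
                        value instanceof Map ? resolve(value, lookup) : value);
            }
            return result;
        }
        throw new IllegalArgumentException("Unexpected union branch: " + raw.getClass());
    }
}
```

Note that in this sketch only the top-level value is treated as a possible
key. String fields inside an embedded record are copied as plain strings,
because the value alone does not say whether it is a key or ordinary text.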

Regards,
Maria

On Tue, 19 Jan 2021 at 08:53, John Mora  wrote:

> Hi Maria,
>
> Thanks for the update.
>
> Some comments:
>
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192
>
> Please add the index mappings when you create the elasticsearch index.
>
>
> https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings
>
> You can use the field mappings parsed from the XML file.
>
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28
>
> Regarding your question, Elasticsearch supports complex datatypes:
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
> https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
>
> You can use the RethinkDB datastore as an example and store the fields of
> the embedded objects recursively.
>
>
> https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448
>
> Give it a try first and let me know if you get stuck.
>
> Alternatively, if the first option is not feasible, you can serialize the
> embedded objects as a byte array, for example:
>
>
> https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
> https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html
>
> Best regards,
> John.
>
> El sáb, 16 ene 2021 a las 8:02, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) escribió:
>
>> Hi,
>>
>> Report #7
>> Period: January 10 - January 16
>> Activities:
>> - Fixed authentication [1]:
>>
>>    1. Set up the Elasticsearch container password properly
>>    2. Set the default Elasticsearch container server’s username in
>>    gora.properties
>>    3. Added exceptions for missing arguments in authentication
>>
>> - Added a parameter for the XSD validation [2]:
>>
>>    1. Defined a parameter for the XSD validation
>>    2. Added a test case for the parameter
>>    3. Made ElasticsearchStore read the mapping file from properties, not
>>    configuration
>>
>> - Implemented some basic Input-Output operations for schema management
>> [3]:
>>
>>    1. Implemented delete, get and put methods
>>    2. Implemented newInstance and getUnionSchema utility methods
>>    3. Implemented basic serialization/deserialization for primitive AVRO
>>    types
>>
>>
>> Here are links to the commits:
>> [1]
>> https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
>> [2]
>> https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
>> [3]
>> https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0
>>
>> This week I started work on serialization/deserialization. While
>> testing the get method I found that a UNION field can be a combination of
>> NULL, STRING and RECORD for references to external tables (e.g. boss for
>> Employee). Could you explain what I should do in this case? I see two
>> options: 1) deserialize recursively if the field value is a RECORD;
>> 2) make another request in the STRING case, where I have only the key of
>> the external object.
>>
>> Regards,
>> Maria
>>
>


Add datastore for Elasticsearch. Outreachy Week 7 Report

2021-01-16 Thread Maria Podorvanova
Hi,

Report #7
Period: January 10 - January 16
Activities:
- Fixed authentication [1]:

   1. Set up the Elasticsearch container password properly
   2. Set the default Elasticsearch container server’s username in
   gora.properties
   3. Added exceptions for missing arguments in authentication

- Added a parameter for the XSD validation [2]:

   1. Defined a parameter for the XSD validation
   2. Added a test case for the parameter
   3. Made ElasticsearchStore read the mapping file from properties, not
   configuration

- Implemented some basic Input-Output operations for schema management [3]:

   1. Implemented delete, get and put methods
   2. Implemented newInstance and getUnionSchema utility methods
   3. Implemented basic serialization/deserialization for primitive AVRO
   types
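
As an illustration of the "exceptions for missing arguments" item in the
authentication fix, here is a minimal sketch in plain Java. It is not the
code from commit [1]: the class AuthParameters and the choice of
IllegalArgumentException are assumptions made for the example. Only the two
property names are taken from this thread.

```java
import java.util.Properties;

/** Hypothetical holder for the credentials; not the class used in Gora. */
final class AuthParameters {

    // Property names as they appear in this thread.
    static final String USERNAME = "gora.datastore.elasticsearch.username";
    static final String PASSWORD = "gora.datastore.elasticsearch.password";

    final String username;
    final String password;

    private AuthParameters(String username, String password) {
        this.username = username;
        this.password = password;
    }

    /** Reads both credentials and fails fast if one of them is missing. */
    static AuthParameters load(Properties properties) {
        return new AuthParameters(require(properties, USERNAME), require(properties, PASSWORD));
    }

    private static String require(Properties properties, String key) {
        String value = properties.getProperty(key);
        if (value == null || value.isEmpty()) {
            // Exception type chosen for the sketch only.
            throw new IllegalArgumentException("Missing required property: " + key);
        }
        return value;
    }
}
```

Failing at load time keeps a misconfigured gora.properties from surfacing
later as an unrelated connection error.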


Here are links to the commits:
[1]
https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
[2]
https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
[3]
https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0

This week I started work on serialization/deserialization. While
testing the get method I found that a UNION field can be a combination of
NULL, STRING and RECORD for references to external tables (e.g. boss for
Employee). Could you explain what I should do in this case? I see two
options: 1) deserialize recursively if the field value is a RECORD;
2) make another request in the STRING case, where I have only the key of
the external object.

Regards,
Maria


Re: Add datastore for Elasticsearch. Outreachy Week 6 Report

2021-01-10 Thread Maria Podorvanova
Hi,

Thank you for your feedback.

I will define the parameter for the XSD validation and fix authentication.
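
A minimal sketch of how such a parameter could be read, using the key
gora.xsd_validation and the default value false suggested in the quoted
message below. The class name XsdValidationFlag is hypothetical and the
sketch does not claim to match how other Gora datastores read the flag.

```java
import java.util.Properties;

/** Hypothetical helper that reads the XSD validation switch from the properties. */
final class XsdValidationFlag {

    // Key and default value as suggested in this thread.
    static final String XSD_VALIDATION = "gora.xsd_validation";

    private XsdValidationFlag() {
    }

    /** Returns true only if the property is present and set to "true" (case-insensitive). */
    static boolean isEnabled(Properties properties) {
        return Boolean.parseBoolean(properties.getProperty(XSD_VALIDATION, "false"));
    }
}
```

With this approach validation stays off unless gora.properties contains
the line gora.xsd_validation=true.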

I submitted my midpoint feedback last week.

Regards,
Maria

On Mon, 11 Jan 2021 at 11:38, John Mora  wrote:

> Hi.
>
> Thanks for the update.
>
> Some comments:
>
> Please define a parameter for the XSD validation (gora.xsd_validation,
> default: false).
> Example:
> https://github.com/apache/gora/blob/master/gora-lucene/src/main/java/org/apache/gora/lucene/store/LuceneStore.java#L131
>
> I do not think the Elasticsearch authentication is working. Here, you set
> the username and password in the container to the literal values
> "gora.datastore.elasticsearch.username" and
> "gora.datastore.elasticsearch.password":
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/test/java/org/apache/gora/elasticsearch/GoraElasticsearchTestDriver.java#L41
>
> But in the properties file the values are "username" and "password":
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/test/resources/gora.properties#L24
>
> Please do not forget to submit your midpoint feedback. It is due Jan. 12
> 4pm UTC.
>
> Cheers,
> John
>
> El sáb, 9 ene 2021 a las 5:21, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) escribió:
>
>> Hi,
>>
>> Report #6
>> Period: January 3 - January 9
>> Activities:
>> - Added XSD validation file for the XML mapping [1]
>> - Fixed XSD validation [2]
>>
>>    1. Relocated gora-elasticsearch.xsd file to main resources
>>    2. Covered XSD validation with a test
>>    3. Added gora-elasticsearch-mapping-invalid.xml file for the test
>>
>> - Set up Elasticsearch container's authentication parameters [3]
>> - Implemented exists method [4]
>> - Added comments for the connection parameters [5]
>>
>> Here are links to the commits:
>> [1]
>> https://github.com/apache/gora/commit/5020efc9aaf80c7e585b446bec3be392969093ba
>>
>> [2]
>> https://github.com/apache/gora/commit/05e8ed5ebddfbbddb175a683d0242a76896e6bb5
>>
>> [3]
>> https://github.com/apache/gora/commit/66c9d80ee255df5c3b397aa5b0e4e1256598ebb5
>>
>> [4]
>> https://github.com/apache/gora/commit/2f7d7218d7c120ad4694916186debdab3beac4f7
>>
>> [5]
>> https://github.com/apache/gora/commit/e2d0aeba8db9fd65e6e127d08196e2facac52c05
>>
>>
>> Regards,
>> Maria
>>
>


Add datastore for Elasticsearch. Outreachy Week 6 Report

2021-01-09 Thread Maria Podorvanova
Hi,

Report #6
Period: January 3 - January 9
Activities:
- Added XSD validation file for the XML mapping [1]
- Fixed XSD validation [2]

   1. Relocated gora-elasticsearch.xsd file to main resources
   2. Covered XSD validation with a test
   3. Added gora-elasticsearch-mapping-invalid.xml file for the test

- Set up Elasticsearch container's authentication parameters [3]
- Implemented exists method [4]
- Added comments for the connection parameters [5]

Here are links to the commits:
[1]
https://github.com/apache/gora/commit/5020efc9aaf80c7e585b446bec3be392969093ba

[2]
https://github.com/apache/gora/commit/05e8ed5ebddfbbddb175a683d0242a76896e6bb5

[3]
https://github.com/apache/gora/commit/66c9d80ee255df5c3b397aa5b0e4e1256598ebb5

[4]
https://github.com/apache/gora/commit/2f7d7218d7c120ad4694916186debdab3beac4f7

[5]
https://github.com/apache/gora/commit/e2d0aeba8db9fd65e6e127d08196e2facac52c05


Regards,
Maria


Re: Add datastore for Elasticsearch. Outreachy Week 5 Report

2021-01-06 Thread Maria Podorvanova
Hi,

Thank you for your feedback.

You are right, I did not check that the XSD validation was working. I will
fix it and also add a test for it.
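
A minimal sketch of what such a check could look like, using only the
standard javax.xml.validation API. The class XsdCheck is hypothetical and
takes the schema and the mapping as strings to stay self-contained. The
real code would presumably load gora-elasticsearch.xsd as a resource
instead of from a path under /tmp.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical helper that validates an XML document against an XSD schema. */
final class XsdCheck {

    private XsdCheck() {
    }

    /**
     * Returns true if the XML document conforms to the schema.
     * A SAXException is raised both for an invalid document and for a
     * malformed schema; both cases are reported as "not valid" here.
     */
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            // Not expected for in-memory sources; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
    }
}
```

A test can then call isValid once with a correct mapping and once with a
deliberately broken one and compare the two results.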

I will add comments for the connection parameters.

Yes, it seems you are right that the Docker container has no authentication
parameters, because I did not set them up. I will look into that too.

Regards,
Maria



On Thu, 7 Jan 2021 at 05:15, John Mora  wrote:

> Hi,
>
> Thanks for your hard work.
>
> Some comments:
>
> The XSD validation is not working:
>
> org.apache.gora.util.GoraException: java.lang.RuntimeException:
> org.xml.sax.SAXParseException; schema_reference.4: Failed to read schema
> document 'file:/tmp/gora/gora-elasticsearch/gora-elasticsearch.xsd',
> because 1) could not find the document; 2)...
>
> The XSD file should be located in the main source code instead of the
> tests. You can also add a test for the XSD validation.
>
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/test/resources/gora-elasticsearch.xsd
>
> Please add some comments with information about the connection parameters:
>
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/utils/ElasticsearchParameters.java#L29
>
> I think the security is not set up in the Docker container and the
> parameters gora.datastore.elasticsearch.username and
> gora.datastore.elasticsearch.password are simply ignored by the
> Elasticsearch server. Is that right?
>
>
> https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/test/java/org/apache/gora/elasticsearch/GoraElasticsearchTestDriver.java
>
> Cheers,
> John
>
> El sáb, 2 ene 2021 a las 4:52, Maria Podorvanova (<
> podorvanova.ma...@gmail.com>) escribió:
>
>> Hi,
>>
>> Report #5
>> Period: December 27 - January 2
>> Activities:
>> - Added a property for choosing the authentication method [1]
>> - Implemented testing with Elasticsearch container [2]
>>
>>    1. Researched the testing side
>>    2. Added test dependencies
>>    3. Added GoraElasticsearchTestDriver with Elasticsearch container
>>    4. Added javadoc descriptions to GoraElasticsearchTestDriver class
>>    5. Fixed two existing tests in accordance with Elasticsearch container
>>
>> - Implemented some methods for schema management [3]
>>
>>    1. schemaExists
>>    2. createSchema
>>    3. deleteSchema
>>    4. flush
>>
>>
>> Here are links to the commits:
>> [1]
>> https://github.com/apache/gora/commit/457f57a2391856e6d6a5d67c2668c6f28348d40d
>>
>> [2]
>> https://github.com/apache/gora/commit/3d0784721fc8bf158522a6b5dc6e309aae27a2de
>>
>> [3]
>> https://github.com/apache/gora/commit/57da5033ac26a2b31046c83bdfe8729b1aeb6889
>>
>>
>> Regards,
>> Maria
>>
>


Add datastore for Elasticsearch. Outreachy Week 5 Report

2021-01-02 Thread Maria Podorvanova
Hi,

Report #5
Period: December 27 - January 2
Activities:
- Added a property for choosing the authentication method [1]
- Implemented testing with Elasticsearch container [2]

   1. Researched the testing side
   2. Added test dependencies
   3. Added GoraElasticsearchTestDriver with Elasticsearch container
   4. Added javadoc descriptions to GoraElasticsearchTestDriver class
   5. Fixed two existing tests in accordance with Elasticsearch container

- Implemented some methods for schema management [3]

   1. schemaExists
   2. createSchema
   3. deleteSchema
   4. flush


Here are links to the commits:
[1]
https://github.com/apache/gora/commit/457f57a2391856e6d6a5d67c2668c6f28348d40d

[2]
https://github.com/apache/gora/commit/3d0784721fc8bf158522a6b5dc6e309aae27a2de

[3]
https://github.com/apache/gora/commit/57da5033ac26a2b31046c83bdfe8729b1aeb6889


Regards,
Maria


Project ideas for the Elasticsearch datastore

2020-12-20 Thread Maria Podorvanova
Hi all,

I created a Google Doc describing how I plan to implement my project ideas
for the Elasticsearch datastore; you can find the link here.
I would be happy to receive any feedback and support on this project.

Regards,
Maria