[GitHub] metron issue #1218: METRON-1801 Allow Customization of Elasticsearch Documen...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1218 This change was reverted [here](https://github.com/apache/metron/commit/0e037edad913955d3b6754ca9cf42b329cd84160). A new pull request will be opened with the functionality. See also [this mailing list thread](https://github.com/apache/metron/commit/0e037edad913955d3b6754ca9cf42b329cd84160). ---
Github user mraliagha commented on the issue: https://github.com/apache/metron/pull/1218

> @mraliagha, that's a good suggestion. I believe we can functionally achieve that by creating a custom id field in the format you suggest (with a Stellar field transform) and setting that field to be the ES id with the Ambari property exposed in this PR. Do you feel it's worth documenting as an optimization?

Yes, I think it is worth documenting, as people can easily create serious issues with Lucene-based indexers by messing with the ID. It would give users an understanding of where it is safe to customize the ID and what the recommendations are. I'll see if I can find any articles to share as part of the manual. ---
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/1218 Looks good to me. +1 ---
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/1218 @mraliagha, that's a good suggestion. I believe we can functionally achieve that by creating a custom id field in the format you suggest (with a Stellar field transform) and setting that field to be the ES id with the Ambari property exposed in this PR. Do you feel it's worth documenting as an optimization? I spun this up in full dev and ran through all the testing instructions. Everything worked as advertised. I think there are just a couple of open questions, but this is pretty close in my opinion. ---
Github user mraliagha commented on the issue: https://github.com/apache/metron/pull/1218 @nickwallen for event logs, where retrieval is mostly segmented by timestamp, it is recommended to use the timestamp as a prefix of the ID, for example something like `timestamp+hash(original_string)`. ---
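The `timestamp+hash(original_string)` scheme suggested above can be sketched in Python. This is a minimal illustration, not Metron code: the function name `make_doc_id`, the use of epoch-millisecond timestamps, SHA-256 as the hash, and 13-digit zero padding are all assumptions, since the comment does not specify them. Zero-padding keeps the string order of the IDs consistent with the order of their timestamps.

```python
import hashlib


def make_doc_id(timestamp_ms: int, original_string: str) -> str:
    """Build a hypothetical document ID of the form <timestamp><hash>.

    Assumptions (not specified in the discussion): the timestamp is epoch
    milliseconds, SHA-256 is the hash, and the timestamp is zero-padded to
    13 digits so that IDs sort in the same order as their timestamps.
    """
    if timestamp_ms < 0:
        raise ValueError("timestamp must be non-negative")
    digest = hashlib.sha256(original_string.encode("utf-8")).hexdigest()
    # Time-ordered prefix first, content-derived suffix second.
    return f"{timestamp_ms:013d}{digest}"


if __name__ == "__main__":
    msg = {"timestamp": 1538000000000, "original_string": "GET /index.html 200"}
    print(make_doc_id(msg["timestamp"], msg["original_string"]))
```

Because the ID is derived from the message itself, re-indexing the same message yields the same ID and overwrites the existing document instead of creating a duplicate; by the same token, two distinct events with identical timestamps and original strings would collide on one ID.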
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1218 @mraliagha I updated the README to (hopefully) better explain your options in using `es.document.id`. I sensed from your question that what I had originally written was not very clear. ---
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1218

> @mraliagha: So if es.document.id is not provided, as the default, doc id won't be sent to ES indexing, right?

Yes, exactly.

> @mraliagha: I guess it would also be nice to provide some guidance on how document ID should be defined (in the case of custom ID). Otherwise, users may create some serious issues with the indexing and search throughput.

I am just providing the **capability** for advanced users to define their own doc ID, primarily based on your feedback in METRON-1677. (It also provides a nice way to support backwards compatibility, which is the main reason that I took this approach.) If you have any advice, feel free to offer it and we can include it in the docs. Other than that, I am not sure what I can do besides adding a big, bold warning to the docs that says to create your own doc ID at your own risk. ---
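To illustrate the default behavior confirmed above, here is a minimal Python sketch of how a writer might render a document as an Elasticsearch bulk API `index` action with or without an explicit `_id`. It is a sketch, not Metron's writer code: `bulk_index_lines` and its `id_field` parameter are hypothetical stand-ins for the `es.document.id` setting, and the `guid` field and `bro_index` index name in the example are illustrative. When no ID field is configured, the action carries no `_id` and Elasticsearch generates one itself.

```python
import json
from typing import Optional


def bulk_index_lines(doc: dict, index: str, id_field: Optional[str] = None) -> str:
    """Render one document as an Elasticsearch bulk API "index" action.

    id_field is a hypothetical stand-in for the es.document.id setting.
    When it is None, no _id is sent and Elasticsearch generates one.
    """
    meta = {"_index": index}
    if id_field is not None:
        # Use the value of the configured message field as the document ID.
        meta["_id"] = str(doc[id_field])
    # Bulk bodies are newline-delimited JSON: an action line, then the source.
    return json.dumps({"index": meta}) + "\n" + json.dumps(doc) + "\n"


if __name__ == "__main__":
    msg = {"guid": "abc-123", "ip_src_addr": "10.0.0.1"}
    # Default: no _id in the action metadata.
    print(bulk_index_lines(msg, "bro_index"), end="")
    # Custom: the value of the "guid" field becomes the _id.
    print(bulk_index_lines(msg, "bro_index", id_field="guid"), end="")
```

Elasticsearch releases that still use mapping types also expect a document `_type` in the action metadata (or in the bulk URL); it is omitted here to keep the sketch focused on `_id`.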
Github user mraliagha commented on the issue: https://github.com/apache/metron/pull/1218 Thanks, Nick. So if es.document.id is not provided, as the default, doc id won't be sent to ES indexing, right? I guess it would also be nice to provide some guidance on how document ID should be defined (in the case of custom ID). Otherwise, users may create some serious issues with the indexing and search throughput. ---