[GitHub] metron issue #1218: METRON-1801 Allow Customization of Elasticsearch Documen...

2018-10-24 Thread nickwallen
Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/1218
  
This change was reverted 
[here](https://github.com/apache/metron/commit/0e037edad913955d3b6754ca9cf42b329cd84160).
  A new pull request will be opened with the functionality. See also [this 
mailing list 
thread](https://github.com/apache/metron/commit/0e037edad913955d3b6754ca9cf42b329cd84160).


---


[GitHub] metron issue #1218: METRON-1801 Allow Customization of Elasticsearch Documen...

2018-10-11 Thread mraliagha
Github user mraliagha commented on the issue:

https://github.com/apache/metron/pull/1218
  
> @mraliagha, that's a good suggestion. I believe we can functionally 
achieve that by creating a custom id field in the format you suggest (with a 
Stellar field transform) and set that field to be the ES id with the Ambari 
property exposed in this PR. Do you feel it's worth documenting as an 
optimization?

Yes, I think it is worth documenting, as people can easily create serious 
issues with Lucene-based indexers by messing with the ID. It can give users an 
understanding of where it is safe to play with the ID and what the 
recommendations are. I will see if I can find any articles to share as part of 
the manual.


---


[GitHub] metron issue #1218: METRON-1801 Allow Customization of Elasticsearch Documen...

2018-10-11 Thread merrimanr
Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/1218
  
Looks good to me.  +1


---


[GitHub] metron issue #1218: METRON-1801 Allow Customization of Elasticsearch Documen...

2018-10-11 Thread merrimanr
Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/1218
  
@mraliagha, that's a good suggestion.  I believe we can functionally 
achieve that by creating a custom id field in the format you suggest (with a 
Stellar field transform) and set that field to be the ES id with the Ambari 
property exposed in this PR.  Do you feel it's worth documenting as an 
optimization?

I spun this up in full dev and ran through all the testing instructions.  
Everything worked as advertised.  I think there are just a couple open 
questions but this is pretty close in my opinion.


---


[GitHub] metron issue #1218: METRON-1801 Allow Customization of Elasticsearch Documen...

2018-10-08 Thread mraliagha
Github user mraliagha commented on the issue:

https://github.com/apache/metron/pull/1218
  
@nickwallen for event logs, where retrieval is mostly segmented by 
timestamp, it is recommended to use the timestamp as a prefix of the id. For 
example, something like timestamp+hash(original_string).
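
A minimal sketch of the timestamp+hash(original_string) idea, in Python for illustration only. The function name `make_document_id`, the choice of SHA-256, the 16-character truncation, and the zero-padded millisecond timestamp are all assumptions made for this example; none of them come from the PR.

```python
import hashlib


def make_document_id(timestamp_ms: int, original_string: str) -> str:
    """Build an ID of the form <timestamp>-<hash(original_string)>.

    Hypothetical helper: the format details below are assumptions,
    not Metron's implementation.
    """
    # Hash the raw message so the same message always maps to the same suffix.
    digest = hashlib.sha256(original_string.encode("utf-8")).hexdigest()
    # Zero-pad the epoch-millisecond timestamp to a fixed width so that
    # sorting IDs as strings matches sorting them by time.
    return f"{timestamp_ms:013d}-{digest[:16]}"


if __name__ == "__main__":
    print(make_document_id(1539000000000, "example raw log line"))
```

Because the timestamp leads, IDs of events that are close in time share a common prefix, and the hash suffix keeps the ID deterministic for a given message.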


---


[GitHub] metron issue #1218: METRON-1801 Allow Customization of Elasticsearch Documen...

2018-10-08 Thread nickwallen
Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/1218
  
@mraliagha I updated the README to (hopefully) better explain your options 
in using `es.document.id`.  I sensed from your question that what I had 
originally was not very clear.


---


[GitHub] metron issue #1218: METRON-1801 Allow Customization of Elasticsearch Documen...

2018-10-08 Thread nickwallen
Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/1218
  
> @mraliagha: So if es.document.id is not provided, as the default, doc id 
won't be sent to ES indexing, right?

Yes, exactly.  
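
The default behavior confirmed above can be sketched roughly as follows. This is illustrative Python, not Metron's code: `build_index_request`, the shape of the request dictionary, and the `guid` field in the usage lines are hypothetical, and treating `es.document.id` as the name of a message field is an assumption based on the discussion in this thread.

```python
from typing import Any, Dict


def build_index_request(
    message: Dict[str, Any],
    global_config: Dict[str, Any],
    index: str,
) -> Dict[str, Any]:
    """Attach a document ID only when es.document.id is configured.

    Hypothetical helper; the request layout is an assumption for this sketch.
    """
    request: Dict[str, Any] = {"index": index, "source": message}
    # Assumption: es.document.id names the message field that holds the ID.
    id_field = global_config.get("es.document.id")
    if id_field and message.get(id_field) is not None:
        request["id"] = str(message[id_field])
    # Otherwise no ID is sent and Elasticsearch generates one itself.
    return request


if __name__ == "__main__":
    msg = {"guid": "abc-123", "source.type": "bro"}
    print(build_index_request(msg, {}, "bro_index"))
    print(build_index_request(msg, {"es.document.id": "guid"}, "bro_index"))
```

With an empty configuration the request carries no `id` key, which corresponds to the default case where the doc id is not sent.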

> @mraliagha: I guess it would also be nice to provide some guidance on how 
document ID should be defined (in the case of custom ID). Otherwise, users may 
create some serious issues with the indexing and search throughput.

I am just providing the **capability** for advanced users to define their 
own doc ID, primarily based on your feedback in METRON-1677.  (It also provides 
a nice way to support backwards compatibility, which is the main reason that I 
took this approach.) 

If you have any advice to offer, feel free to offer it and we can include 
it in the docs.  Other than that, I am not sure what I can do besides add a 
big, bold warning to the docs that says create your own doc ID at your own risk.



---


[GitHub] metron issue #1218: METRON-1801 Allow Customization of Elasticsearch Documen...

2018-10-07 Thread mraliagha
Github user mraliagha commented on the issue:

https://github.com/apache/metron/pull/1218
  
Thanks, Nick. So if es.document.id is not provided, as the default, doc id 
won't be sent to ES indexing, right? I guess it would also be nice to provide 
some guidance on how document ID should be defined (in the case of custom ID). 
Otherwise, users may create some serious issues with the indexing and search 
throughput. 


---