[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-11-16 Thread Nick Allen (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690028#comment-16690028
 ] 

Nick Allen commented on METRON-1801:


This approach was abandoned.  See 
https://issues.apache.org/jira/browse/METRON-1879.

> Allow Customization of Elasticsearch Document ID
> 
>
> Key: METRON-1801
> URL: https://issues.apache.org/jira/browse/METRON-1801
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Nick Allen
>Assignee: Nick Allen
>Priority: Major
>
> The user should be able to customize the document ID that is set by the 
> client when indexing documents into Elasticsearch.  The user should be able 
> to use the Metron GUID, define their own custom document ID, or choose to not 
> have the document ID set by the client.
>  
> This task covers Elasticsearch only.  An additional task should be created to 
> cover Solr.
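As a rough illustration of the three options above, the sketch below resolves the client-side document ID for a message. It follows the `ElasticsearchWriterConfig` Javadoc quoted in this thread: `es.document.id` names a message field whose value becomes the doc ID, and an undefined or blank setting means no ID is sent. The class and method names are hypothetical. Two further points are assumptions, not behavior confirmed by the PR: that the GUID lives in a field named `guid`, and that a message lacking the named field is treated as "no ID".

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper illustrating how a client might pick the Elasticsearch doc ID.
 * Assumes "es.document.id" names the message field to use, per the PR discussed here.
 */
public class DocIdResolver {

  /** Setting that names the message field whose value becomes the document ID. */
  public static final String DOC_ID_SOURCE_FIELD = "es.document.id";

  /**
   * Returns the document ID to send, or empty if the client should not set one,
   * in which case Elasticsearch generates its own ID.
   *
   * Naming the GUID field (assumed here to be "guid") reuses the Metron GUID; naming any
   * other field uses a custom ID; leaving the setting undefined or blank sends no ID at all.
   */
  public static Optional<String> resolve(Map<String, Object> writerConfig,
                                         Map<String, Object> message) {
    Object field = writerConfig.get(DOC_ID_SOURCE_FIELD);
    if (field == null || field.toString().trim().isEmpty()) {
      return Optional.empty();
    }
    // Assumption: a message missing the named field is indexed without a client-side ID.
    Object value = message.get(field.toString());
    return value == null ? Optional.empty() : Optional.of(value.toString());
  }
}
```

For example, with `es.document.id` set to `guid`, `resolve(config, message)` returns the message's GUID; with the setting removed, it returns `Optional.empty()` and the ID is left to Elasticsearch.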



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662230#comment-16662230
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/1218
  
This change was reverted 
[here](https://github.com/apache/metron/commit/0e037edad913955d3b6754ca9cf42b329cd84160).
  A new pull request will be opened with the functionality. See also [this 
mailing list 
thread](https://github.com/apache/metron/commit/0e037edad913955d3b6754ca9cf42b329cd84160).




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647194#comment-16647194
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user mraliagha commented on the issue:

https://github.com/apache/metron/pull/1218
  
> @mraliagha, that's a good suggestion. I believe we can functionally
> achieve that by creating a custom id field in the format you suggest (with a
> Stellar field transform) and setting that field to be the ES id with the
> Ambari property exposed in this PR. Do you feel it's worth documenting as an
> optimization?

Yes, I think it is worth documenting, as people can easily create serious
issues with Lucene-based indexers by messing with the ID. It can give users an
understanding of where it is safe to play with the ID and what the
recommendations are. I'll see if I can find any articles to share as part of
the manual.




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647002#comment-16647002
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/1218




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646840#comment-16646840
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/1218
  
Looks good to me.  +1




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646839#comment-16646839
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user merrimanr commented on a diff in the pull request:

https://github.com/apache/metron/pull/1218#discussion_r224548764
  
--- Diff: metron-platform/metron-elasticsearch/pom.xml ---
@@ -206,24 +206,7 @@
 test-jar
 test
 
-
--- End diff --

I have also noticed this and I end up commenting these out every time I run 
a test in my IDE.  Thanks for investigating and removing them.




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646828#comment-16646828
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1218#discussion_r224544919
  
--- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/writer/ElasticsearchWriterConfig.java ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.metron.elasticsearch.writer;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.stellar.common.utils.ConversionUtils;
+
+import java.util.Map;
+
+/**
+ * Configuration settings that customize the behavior of the {@link ElasticsearchWriter}.
+ */
+public enum ElasticsearchWriterConfig {
+
+  /**
+   * Defines which message field, the document identifier is set to.
+   *
+   * If defined, the value of the specified message field is set as the Elasticsearch doc ID. If
+   * this field is undefined or blank, then the document identifier is not set.
+   */
+  DOC_ID_SOURCE_FIELD("es.document.id", "", String.class);
--- End diff --

The last commit removes `ElasticsearchWriterConfig`.




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646825#comment-16646825
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1218#discussion_r224544049
  
--- Diff: metron-platform/metron-elasticsearch/pom.xml ---
@@ -206,24 +206,7 @@
 test-jar
 test
 
-
--- End diff --

They cause the tests to fail when executed in an IDE like IntelliJ.  I 
don't understand exactly why, but @justinleet pointed me in this direction.

Also, everything runs just fine without them, so they are unnecessary.  The 
fewer dependencies, the better.




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646816#comment-16646816
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user merrimanr commented on a diff in the pull request:

https://github.com/apache/metron/pull/1218#discussion_r224541191
  
--- Diff: metron-platform/metron-elasticsearch/pom.xml ---
@@ -206,24 +206,7 @@
 test-jar
 test
 
-
--- End diff --

Why were these dependencies removed?  Just curious.




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646810#comment-16646810
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1218#discussion_r224539451
  
--- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/writer/ElasticsearchWriterConfig.java ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.metron.elasticsearch.writer;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.stellar.common.utils.ConversionUtils;
+
+import java.util.Map;
+
+/**
+ * Configuration settings that customize the behavior of the {@link ElasticsearchWriter}.
+ */
+public enum ElasticsearchWriterConfig {
+
+  /**
+   * Defines which message field, the document identifier is set to.
+   *
+   * If defined, the value of the specified message field is set as the Elasticsearch doc ID. If
+   * this field is undefined or blank, then the document identifier is not set.
+   */
+  DOC_ID_SOURCE_FIELD("es.document.id", "", String.class);
--- End diff --

 I didn't know we had `ConfigOption`.  I will remove 
`ElasticsearchWriterConfig` for now.  Then as a follow-on, I will do something 
similar, but reuse `ConfigOption`.  Thanks for the pointer!




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646812#comment-16646812
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/1218
  
@mraliagha, that's a good suggestion.  I believe we can functionally
achieve that by creating a custom id field in the format you suggest (with a
Stellar field transform) and setting that field to be the ES id with the Ambari
property exposed in this PR.  Do you feel it's worth documenting as an
optimization?

I spun this up in full dev and ran through all the testing instructions.
Everything worked as advertised.  I think there are just a couple of open
questions, but this is pretty close in my opinion.




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646808#comment-16646808
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user merrimanr commented on a diff in the pull request:

https://github.com/apache/metron/pull/1218#discussion_r224538137
  
--- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/writer/ElasticsearchWriterConfig.java ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.metron.elasticsearch.writer;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.stellar.common.utils.ConversionUtils;
+
+import java.util.Map;
+
+/**
+ * Configuration settings that customize the behavior of the {@link ElasticsearchWriter}.
+ */
+public enum ElasticsearchWriterConfig {
+
+  /**
+   * Defines which message field, the document identifier is set to.
+   *
+   * If defined, the value of the specified message field is set as the Elasticsearch doc ID. If
+   * this field is undefined or blank, then the document identifier is not set.
+   */
+  DOC_ID_SOURCE_FIELD("es.document.id", "", String.class);
--- End diff --

This does leave things in an inconsistent state and will make things
confusing for anyone working with these classes.  However, I think a follow-up
PR would be fine.

Are you aware of the 
[ConfigOption](https://github.com/apache/metron/blob/master/metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/ConfigOption.java)
 interface?  It looks like there may be some duplication with this class.




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642753#comment-16642753
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user mraliagha commented on the issue:

https://github.com/apache/metron/pull/1218
  
@nickwallen For event logs, where retrieval is mostly segmented by
timestamp, it is recommended to use the timestamp as a prefix of the ID; for
example, something like timestamp+hash(original_string).
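A minimal sketch of the time-prefixed ID suggested above, assuming the timestamp is in epoch milliseconds and using SHA-256 as the hash. The class name, the zero-padding, and the `-` separator are illustrative choices, not part of Metron. In practice the value would be computed into a message field (for example with a Stellar field transform, as discussed in this thread) and that field named in `es.document.id`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical sketch of a document ID of the form timestamp + hash(original_string).
 * SHA-256, the 13-digit zero-padding, and the "-" separator are illustrative choices.
 */
public class TimePrefixedId {

  public static String build(long timestampMillis, String originalString) {
    // Zero-pad so IDs sort lexicographically in timestamp order
    // (13 digits cover epoch millis until roughly the year 2286).
    String prefix = String.format("%013d", timestampMillis);
    return prefix + "-" + sha256Hex(originalString);
  }

  static String sha256Hex(String s) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(s.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        // %02x formats a negative byte as its unsigned value, e.g. -1 -> "ff"
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to support SHA-256, so this should not happen.
      throw new IllegalStateException(e);
    }
  }
}
```

Because the hash is deterministic, re-indexing the same record produces the same ID, so it overwrites rather than duplicates. Conversely, two distinct events with the same timestamp and an identical `original_string` would collide.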




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642561#comment-16642561
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1218#discussion_r223514656
  
--- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/writer/ElasticsearchWriterConfig.java ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.metron.elasticsearch.writer;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.stellar.common.utils.ConversionUtils;
+
+import java.util.Map;
+
+/**
+ * Configuration settings that customize the behavior of the {@link ElasticsearchWriter}.
+ */
+public enum ElasticsearchWriterConfig {
+
+  /**
+   * Defines which message field, the document identifier is set to.
+   *
+   * If defined, the value of the specified message field is set as the Elasticsearch doc ID. If
+   * this field is undefined or blank, then the document identifier is not set.
+   */
+  DOC_ID_SOURCE_FIELD("es.document.id", "", String.class);
--- End diff --

I put this value here because I didn't see any other place for it at the
time.  I later found the other properties, like `es.ip`, defined in
`ElasticsearchUtils`.

Many of the property definitions there are hard-coded and could use a
clean-up. I'd like to do a follow-on to add `es.ip`, `es.date.format`,
`es.client.class`, etc. to this configuration class so that each is clearly
defined.

As part of this PR, we could keep this property here or I could move it to 
`ElasticsearchUtils` to match those other properties.  Open to reviewer 
feedback on this.
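One way the typed configuration class described above might look, sketched under assumptions. Only the `es.document.id` key and its blank default come from the diff. `es.ip` is named in the comment, but leaving it without a default here is a placeholder choice, and the `EsWriterSetting` name and its `get` method are hypothetical. Per the later review, the class was removed from this PR in favor of reusing Metron's existing `ConfigOption` interface in a follow-on.

```java
import java.util.Map;

/**
 * Hypothetical typed settings enum in the spirit of ElasticsearchWriterConfig.
 * Only "es.document.id" and its blank default come from the diff; the rest is illustrative.
 */
public enum EsWriterSetting {

  /** Message field whose value becomes the doc ID; blank means the client sets no ID. */
  DOC_ID_SOURCE_FIELD("es.document.id", "", String.class),

  /** Elasticsearch host list; no default is assumed in this sketch. */
  IP("es.ip", null, String.class);

  private final String key;
  private final Object defaultValue;
  private final Class<?> type;

  EsWriterSetting(String key, Object defaultValue, Class<?> type) {
    this.key = key;
    this.defaultValue = defaultValue;
    this.type = type;
  }

  public String getKey() {
    return key;
  }

  /**
   * Reads this setting from a config map, using the default when the key is absent,
   * and rejects values of the wrong type up front rather than failing later.
   */
  public <T> T get(Map<String, Object> config, Class<T> clazz) {
    Object value = config.getOrDefault(key, defaultValue);
    if (value != null && !type.isInstance(value)) {
      throw new IllegalArgumentException(
          key + " expects " + type.getSimpleName() + " but got " + value.getClass().getSimpleName());
    }
    return clazz.cast(value);
  }
}
```

With this shape, `EsWriterSetting.DOC_ID_SOURCE_FIELD.get(globalConfig, String.class)` returns `""` when the key is missing, and each property's key, default, and type are declared in one place.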




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642559#comment-16642559
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1218#discussion_r223513924
  
--- Diff: metron-platform/metron-solr/src/test/java/org/apache/metron/solr/integration/SolrSearchIntegrationTest.java ---
@@ -223,6 +225,23 @@ public void returns_column_data_for_multiple_indices() throws Exception {
     Assert.assertEquals(null, fieldTypes.get("fake.field"));
   }
 
+  @Test
+  public void queries_fields() throws Exception {
+    SearchRequest request = JSONUtils.INSTANCE.load(fieldsQuery, SearchRequest.class);
+    SearchResponse response = getIndexDao().search(request);
+    Assert.assertEquals(10, response.getTotal());
+
+    List<SearchResult> results = response.getResults();
+    Assert.assertEquals(10, response.getResults().size());
+
+    // validate the source fields contained in the search response
+    for (int i = 0; i < 10; ++i) {
+      Map<String, Object> source = results.get(i).getSource();
+      Assert.assertNotNull(source);
+      Assert.assertNotNull(source.get(Constants.Fields.SRC_ADDR.getName()));
--- End diff --

Unlike Elasticsearch, Solr does not return the Metron GUID.




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642556#comment-16642556
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1218#discussion_r223513848
  
--- Diff: metron-platform/metron-elasticsearch/src/test/java/org/apache/metron/elasticsearch/integration/ElasticsearchSearchIntegrationTest.java ---
@@ -352,6 +350,24 @@ public void different_type_filter_query() throws Exception {
     Assert.assertEquals("data 1", results.get(0).getSource().get("ttl"));
   }
 
+  @Test
+  public void queries_fields() throws Exception {
+    SearchRequest request = JSONUtils.INSTANCE.load(fieldsQuery, SearchRequest.class);
+    SearchResponse response = getIndexDao().search(request);
+    Assert.assertEquals(10, response.getTotal());
+
+    List<SearchResult> results = response.getResults();
+    Assert.assertEquals(10, response.getResults().size());
+
+    // validate the source fields contained in the search response
+    for (int i = 0; i < 10; ++i) {
+      Map<String, Object> source = results.get(i).getSource();
+      Assert.assertNotNull(source);
+      Assert.assertNotNull(source.get(Constants.Fields.SRC_ADDR.getName()));
+      Assert.assertNotNull(source.get(Constants.GUID));
--- End diff --

Elasticsearch must now always return the GUID to populate the UI.  We 
cannot rely on the document ID being the same as the Metron GUID.
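A minimal sketch of the consequence described above: the GUID shown in the UI is read from the hit's source fields rather than inferred from the document ID. The field name `guid` is assumed to match Metron's `Constants.GUID`, and `GuidLookup`/`guidOf` are hypothetical names.

```java
import java.util.Map;

/**
 * Hypothetical helper: read the Metron GUID from a search hit's source fields
 * rather than assuming it equals the Elasticsearch document ID.
 */
public class GuidLookup {

  /** Assumed to match Metron's Constants.GUID. */
  static final String GUID_FIELD = "guid";

  public static String guidOf(String documentId, Map<String, Object> source) {
    Object guid = source.get(GUID_FIELD);
    if (guid == null) {
      // The doc ID may be custom or generated by Elasticsearch, so it cannot stand in for the GUID.
      throw new IllegalStateException(
          "Search hit " + documentId + " has no '" + GUID_FIELD + "' field in its source");
    }
    return guid.toString();
  }
}
```

Failing loudly when the field is missing, instead of falling back to the document ID, avoids silently showing an ID in the UI that does not correspond to any Metron GUID.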




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642554#comment-16642554
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/1218
  
@mraliagha I updated the README to (hopefully) better explain your options 
in using `es.document.id`.  I sensed by your question that what I had 
originally was not very clear.




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641863#comment-16641863
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/1218
  
> @mraliagha: So if es.document.id is not provided, as the default, doc id 
won't be sent to ES for indexing, right?

Yes, exactly.  

> @mraliagha: I guess it would also be nice to provide some guidance on how 
the document ID should be defined (in the case of a custom ID). Otherwise, users may 
create some serious issues with indexing and search throughput.

I am just providing the **capability** for advanced users to define their 
own doc ID, primarily based on your feedback in METRON-1677.  (It also provides 
a nice way to support backwards compatibility, which is the main reason that I 
took this approach.) 

If you have any advice, feel free to share it and we can include 
it in the docs.  Other than that, I am not sure what I can do besides adding a 
big, bold warning to the docs that says: create your own doc ID at your own risk.





[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641381#comment-16641381
 ] 

ASF GitHub Bot commented on METRON-1801:


Github user mraliagha commented on the issue:

https://github.com/apache/metron/pull/1218
  
Thanks, Nick. So if es.document.id is not provided, as the default, doc id 
won't be sent to ES for indexing, right? I guess it would also be nice to provide 
some guidance on how the document ID should be defined (in the case of a custom ID). 
Otherwise, users may create some serious issues with indexing and search 
throughput. 




[jira] [Commented] (METRON-1801) Allow Customization of Elasticsearch Document ID

2018-10-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636167#comment-16636167
 ] 

ASF GitHub Bot commented on METRON-1801:


GitHub user nickwallen opened a pull request:

https://github.com/apache/metron/pull/1218

METRON-1801 Allow Customization of Elasticsearch Document ID

Currently, the Metron GUID is always used as the Elasticsearch document ID. 
As documented in 
[METRON-1677](https://issues.apache.org/jira/browse/METRON-1677), using a 
randomized UUID like Java's `UUID.randomUUID()` can negatively impact 
Elasticsearch performance.  This change allows a user to customize the 
identifier that is used by Elasticsearch when indexing documents.

We do this by allowing a user to specify the name of the message field 
whose value is set as the document ID.  The user can customize this by defining 
a global variable called `es.document.id`.  There are three usage scenarios 
that I see.

  * By default, Metron's GUID field will be used as the source of the 
document ID.  This ensures backwards-compatible behavior. This is also the 
behavior when the value is set as shown below or when the global variable is not set.
```
es.document.id = guid
```

  * If a user wants Elasticsearch to define its own document ID, then 
`es.document.id` should be set to a blank value or an empty string.  In this case, 
the client will not set the document ID and Elasticsearch will generate its 
own.
```
es.document.id = 
```

  * If a user wants to set their own custom document ID, they should create 
an enrichment that defines a new message field like `my_document_id`.  They 
should then use this new field to set the Elasticsearch document ID.
```
es.document.id = my_document_id
```
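The three scenarios above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual `ElasticsearchWriter` code: the class `DocumentIdSelector` and the method `selectDocumentId` are invented names, the global settings and the message are modeled as plain `Map`s, and leaving the ID unset when the named field is missing from a message is an assumption. An empty `Optional` means the client sends no document ID, so Elasticsearch generates one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: choose the Elasticsearch document ID from the `es.document.id` setting.
class DocumentIdSelector {

    static final String DOC_ID_SETTING = "es.document.id";
    static final String DEFAULT_SOURCE_FIELD = "guid";

    /**
     * Returns the document ID the client should send, or Optional.empty() when the
     * client should leave it unset so that Elasticsearch generates its own.
     */
    static Optional<String> selectDocumentId(Map<String, Object> globals, Map<String, Object> message) {
        // Scenario 1: an absent setting falls back to the Metron GUID (backwards compatible).
        Object setting = globals.getOrDefault(DOC_ID_SETTING, DEFAULT_SOURCE_FIELD);
        String sourceField = setting == null ? "" : setting.toString().trim();

        // Scenario 2: an empty or blank setting means the client does not set an ID.
        if (sourceField.isEmpty()) {
            return Optional.empty();
        }

        // Scenarios 1 and 3: use the value of the named field ("guid" or a custom field).
        // Assumption: if the message lacks that field, the ID is simply left unset.
        Object value = message.get(sourceField);
        return value == null ? Optional.empty() : Optional.of(value.toString());
    }

    public static void main(String[] args) {
        Map<String, Object> message = new HashMap<>();
        message.put("guid", "a1b2c3");
        message.put("my_document_id", "custom-42");

        Map<String, Object> globals = new HashMap<>();
        System.out.println(selectDocumentId(globals, message)); // Optional[a1b2c3]

        globals.put(DOC_ID_SETTING, " ");
        System.out.println(selectDocumentId(globals, message)); // Optional.empty

        globals.put(DOC_ID_SETTING, "my_document_id");
        System.out.println(selectDocumentId(globals, message)); // Optional[custom-42]
    }
}
```

Trimming the setting lets a single space (as in the global settings used in the testing steps) behave the same as an empty string.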

## TODO

I have a few more loose ends to tie up, but wanted to get a start on the 
test plan and description in case the community has early feedback to offer.

- [ ] Allow user to set the `es.document.id` value in the Mpack.
- [ ] Document this global setting and its usage scenarios in a README.
- [ ] More unit/integration tests might be needed.  Trying to determine 
where those need to go.
- [ ] Run the UI e2e tests to ensure they remain happy.
- [ ] Fix issue with Solr integration tests.

## Changes

  * The `ElasticsearchWriter` was updated to allow the document ID to be 
configurable.

  * A 'search by GUID' in the REST layer was implicitly using the document 
ID, whereas it should be using the Metron GUID.

  * Search results should use the Metron GUID as the ID returned to the UI.  
All IDs visible to the user should always be the Metron GUID, not the document 
ID.
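A minimal, hypothetical sketch of the last two changes: results report the Metron GUID taken from the document source rather than the Elasticsearch `_id`, and a "search by GUID" filters on the `guid` field instead of looking the document up by ID. The `Hit` class, the method names, and the assumption that `guid` is indexed as an exact-match (keyword) field are illustrative, not Metron's REST API.

```java
import java.util.HashMap;
import java.util.Map;

class SearchResultIds {

    /** Hypothetical stand-in for one Elasticsearch search hit: its `_id` and `_source`. */
    static final class Hit {
        final String documentId;
        final Map<String, Object> source;

        Hit(String documentId, Map<String, Object> source) {
            this.documentId = documentId;
            this.source = source;
        }
    }

    /** The ID shown to the user is always the Metron GUID, never the Elasticsearch `_id`. */
    static String resultId(Hit hit) {
        Object guid = hit.source.get("guid");
        if (guid == null) {
            throw new IllegalStateException("Document " + hit.documentId + " has no guid field");
        }
        return guid.toString();
    }

    /**
     * Builds an Elasticsearch term query on the `guid` field, because the `_id` may be
     * generated by Elasticsearch or custom and can no longer be assumed to equal the GUID.
     */
    static String guidTermQuery(String guid) {
        // Minimal JSON escaping of backslashes and double quotes.
        String escaped = guid.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\":{\"term\":{\"guid\":\"" + escaped + "\"}}}";
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("guid", "3f2c-example-guid");
        // A made-up `_id` as Elasticsearch might generate it; it differs from the GUID.
        Hit hit = new Hit("AWZ3kF1xYzExample", source);

        System.out.println(resultId(hit)); // prints 3f2c-example-guid
        System.out.println(guidTermQuery(resultId(hit)));
        // prints {"query":{"term":{"guid":"3f2c-example-guid"}}}
    }
}
```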

## Testing

1. Spin up a development environment.  You may need to stop the PCAP and/or 
Profiler topologies to free up slots so that indexing can occur.

```
cd metron-deployment/development/centos6
vagrant up
```

1. Ensure that alerts are visible in the Alerts UI.

1. Stop the indexing topologies using Ambari.

1. Log in to the VM.

```
vagrant ssh
sudo su -
```

1. Delete the existing indices in Elasticsearch.

```
curl -XDELETE http://node1:9200/bro*
curl -XDELETE http://node1:9200/snort*
```

1. Launch the REPL.

```
source /etc/default/metron
cd $METRON_HOME
bin/stellar -z $ZOOKEEPER
```

1. Change the configuration so that Elasticsearch generates its own unique 
document ID. Define `es.document.id` as an empty or blank value in the 
global settings.

```
[Stellar]>>> g := CONFIG_GET("GLOBAL")
...
[Stellar]>>> g := SHELL_EDIT(g)
{
  "es.clustername" : "metron",
  "es.ip" : "node1:9300",
  "es.date.format" : ".MM.dd.HH",
  "es.document.id": " ",
  "parser.error.topic" : "indexing",
  "update.hbase.table" : "metron_update",
  "update.hbase.cf" : "t",
  "es.client.settings" : {
    "client.transport.ping_timeout" : "500s"
  },
  "profiler.client.period.duration" : "15",
  "profiler.client.period.duration.units" : "MINUTES",
  "user.settings.hbase.table" : "user_settings",
  "user.settings.hbase.cf" : "cf",
  "bootstrap.servers" : "node1:6667",
  "source.type.field" : "source:type",
  "threat.triage.score.field" : "threat:triage:score",
  "enrichment.writer.batchSize" : "15",
  "enrichment.writer.batchTimeout" : "0",
  "profiler.writer.batchSize" : "15",
  "profiler.writer.batchTimeout" : "0",
  "geo.hdfs.file" : "/apps/metron/geo/defau