[jira] [Updated] (METRON-1506) When using maas_deploy.sh to list the deployed models, the ApplicationMaster(MaaS) receives 'null request'
[ https://issues.apache.org/jira/browse/METRON-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tantian updated METRON-1506:
Description:
...
18/03/26 17:16:23 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e05_1521078534073_0005_01_08
18/03/26 17:16:23 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node1:45454
18/03/26 17:16:23 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_e05_1521078534073_0005_01_08
18/03/26 17:16:23 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node1:45454
{color:#ff}18/03/26 17:16:32 ERROR service.ApplicationMaster: Received a null request...{color}
18/03/26 17:17:19 INFO service.ApplicationMaster: [ADD]: Received request for model :1.0x1 containers of size 512M at path /user/root/maas/sample
18/03/26 17:17:19 INFO service.ApplicationMaster: Found container id of 5497558138889
18/03/26 17:17:19 INFO callback.LaunchContainer: Setting up container launch container for containerid=container_e05_1521078534073_0005_01_09
18/03/26 17:17:19 INFO callback.LaunchContainer: Local Directory Contents
...
So I read the code in ModelSubmission.java and found that the client (maas_deploy.sh) only communicates with ZooKeeper; it does not talk to the ApplicationMaster directly. However, when the client lists the deployed models, it still enqueues a null request, which the ApplicationMaster then receives. I believe this is unnecessary.
The relevant code is listed here (the red lines were added by me):

{color:#33}ModelRequest request = null;{color}
CuratorFramework client = null;
try {
  RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
  client = CuratorFrameworkFactory.newClient(ModelSubmissionOptions.ZK_QUORUM.get(cli), retryPolicy);
  client.start();
  MaaSConfig config = ConfigUtil.INSTANCE.read(client, ModelSubmissionOptions.ZK_ROOT.get(cli, "/metron/maas/config"), new MaaSConfig(), MaaSConfig.class);
  String mode = ModelSubmissionOptions.MODE.get(cli);
  if (mode.equalsIgnoreCase("ADD")) {
    request = new ModelRequest() {{
      setName(ModelSubmissionOptions.NAME.get(cli));
      setAction(Action.ADD);
      setVersion(ModelSubmissionOptions.VERSION.get(cli));
      setNumInstances(Integer.parseInt(ModelSubmissionOptions.NUM_INSTANCES.get(cli)));
      setMemory(Integer.parseInt(ModelSubmissionOptions.MEMORY.get(cli)));
      setPath(ModelSubmissionOptions.HDFS_MODEL_PATH.get(cli));
    }};
  } else if (mode.equalsIgnoreCase("REMOVE")) {
    request = new ModelRequest() {{
      setName(ModelSubmissionOptions.NAME.get(cli));
      setAction(Action.REMOVE);
      setNumInstances(Integer.parseInt(ModelSubmissionOptions.NUM_INSTANCES.get(cli)));
      setVersion(ModelSubmissionOptions.VERSION.get(cli));
    }};
  } else if (mode.equalsIgnoreCase("LIST")) {
    String name = ModelSubmissionOptions.NAME.get(cli, null);
    String version = ModelSubmissionOptions.VERSION.get(cli, null);
    ServiceDiscoverer serviceDiscoverer = new ServiceDiscoverer(client, config.getServiceRoot());
    Model model = new Model(name, version);
    Map<Model, List<ModelEndpoint>> endpoints = serviceDiscoverer.listEndpoints(model);
    for (Map.Entry<Model, List<ModelEndpoint>> kv : endpoints.entrySet()) {
      String modelTitle = "Model " + kv.getKey().getName() + " @ " + kv.getKey().getVersion();
      System.out.println(modelTitle);
      for (ModelEndpoint endpoint : kv.getValue()) {
        System.out.println(endpoint);
      }
    }
  }
  if (ModelSubmissionOptions.LOCAL_MODEL_PATH.has(cli)) {
    File localDir = new File(ModelSubmissionOptions.LOCAL_MODEL_PATH.get(cli));
    Path hdfsPath = new Path(ModelSubmissionOptions.HDFS_MODEL_PATH.get(cli));
    updateHDFS(fs, localDir, hdfsPath);
  }
  {color:#ff}if (request != null) {{color}
    Queue queue = config.createQueue(ImmutableMap.of(ZKQueue.ZK_CLIENT, client));
    queue.enqueue(request);
  {color:#ff}}{color}
} finally {
  if (client != null) {
    client.close();
  }
}
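The reporter's guard can be reduced to a minimal, self-contained sketch. All class and method names below are hypothetical stand-ins, not Metron's actual API: only ADD/REMOVE build a request, LIST leaves it null, and the enqueue step is skipped when the request is null, so the ApplicationMaster never dequeues a null request.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified sketch of the proposed fix: LIST only queries service
// discovery and never creates a ModelRequest, so nothing is enqueued.
public class ModelSubmissionSketch {

  // Stand-in for the real ModelRequest; only the action is kept.
  static class ModelRequest {
    final String action;
    ModelRequest(String action) { this.action = action; }
  }

  // Dispatches on the mode and returns the queue size afterwards.
  static int submit(String mode, Queue<ModelRequest> queue) {
    ModelRequest request = null;
    if (mode.equalsIgnoreCase("ADD")) {
      request = new ModelRequest("ADD");
    } else if (mode.equalsIgnoreCase("REMOVE")) {
      request = new ModelRequest("REMOVE");
    } else if (mode.equalsIgnoreCase("LIST")) {
      // LIST reads endpoints from ZooKeeper-backed discovery;
      // no request is built here.
    }
    // The guard from the issue: enqueue only when a request exists.
    if (request != null) {
      queue.add(request);
    }
    return queue.size();
  }
}
```

With this guard, running LIST leaves the queue untouched, while ADD and REMOVE each enqueue exactly one request.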
[jira] [Created] (METRON-1506) When using maas_deploy.sh to list the deployed models, the ApplicationMaster(MaaS) receives 'null request'
tantian created METRON-1506:
---
Summary: When using maas_deploy.sh to list the deployed models, the ApplicationMaster(MaaS) receives 'null request'
Key: METRON-1506
URL: https://issues.apache.org/jira/browse/METRON-1506
Project: Metron
Issue Type: Bug
Affects Versions: 0.4.1
Reporter: tantian
...
18/03/26 17:16:23 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e05_1521078534073_0005_01_08
18/03/26 17:16:23 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node1:45454
18/03/26 17:16:23 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_e05_1521078534073_0005_01_08
18/03/26 17:16:23 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node1:45454
{color:#FF}18/03/26 17:16:32 ERROR service.ApplicationMaster: Received a null request...{color}
18/03/26 17:17:19 INFO service.ApplicationMaster: [ADD]: Received request for model :1.0x1 containers of size 512M at path /user/root/maas/sample
18/03/26 17:17:19 INFO service.ApplicationMaster: Found container id of 5497558138889
18/03/26 17:17:19 INFO callback.LaunchContainer: Setting up container launch container for containerid=container_e05_1521078534073_0005_01_09
18/03/26 17:17:19 INFO callback.LaunchContainer: Local Directory Contents
...
So I read the code in ModelSubmission.java and found that the client (maas_deploy.sh) only communicates with ZooKeeper; it does not talk to the ApplicationMaster directly. However, when the client lists the deployed models, it still enqueues a null request, which the ApplicationMaster then receives. I believe this is unnecessary.
The relevant code is listed here (the red lines were added by me):

{color:#33}ModelRequest request = null;{color}
CuratorFramework client = null;
try {
  RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
  client = CuratorFrameworkFactory.newClient(ModelSubmissionOptions.ZK_QUORUM.get(cli), retryPolicy);
  client.start();
  MaaSConfig config = ConfigUtil.INSTANCE.read(client, ModelSubmissionOptions.ZK_ROOT.get(cli, "/metron/maas/config"), new MaaSConfig(), MaaSConfig.class);
  String mode = ModelSubmissionOptions.MODE.get(cli);
  if (mode.equalsIgnoreCase("ADD")) {
    request = new ModelRequest() {{
      setName(ModelSubmissionOptions.NAME.get(cli));
      setAction(Action.ADD);
      setVersion(ModelSubmissionOptions.VERSION.get(cli));
      setNumInstances(Integer.parseInt(ModelSubmissionOptions.NUM_INSTANCES.get(cli)));
      setMemory(Integer.parseInt(ModelSubmissionOptions.MEMORY.get(cli)));
      setPath(ModelSubmissionOptions.HDFS_MODEL_PATH.get(cli));
    }};
  } else if (mode.equalsIgnoreCase("REMOVE")) {
    request = new ModelRequest() {{
      setName(ModelSubmissionOptions.NAME.get(cli));
      setAction(Action.REMOVE);
      setNumInstances(Integer.parseInt(ModelSubmissionOptions.NUM_INSTANCES.get(cli)));
      setVersion(ModelSubmissionOptions.VERSION.get(cli));
    }};
  } else if (mode.equalsIgnoreCase("LIST")) {
    String name = ModelSubmissionOptions.NAME.get(cli, null);
    String version = ModelSubmissionOptions.VERSION.get(cli, null);
    ServiceDiscoverer serviceDiscoverer = new ServiceDiscoverer(client, config.getServiceRoot());
    Model model = new Model(name, version);
    Map<Model, List<ModelEndpoint>> endpoints = serviceDiscoverer.listEndpoints(model);
    for (Map.Entry<Model, List<ModelEndpoint>> kv : endpoints.entrySet()) {
      String modelTitle = "Model " + kv.getKey().getName() + " @ " + kv.getKey().getVersion();
      System.out.println(modelTitle);
      for (ModelEndpoint endpoint : kv.getValue()) {
        System.out.println(endpoint);
      }
    }
  }
  if (ModelSubmissionOptions.LOCAL_MODEL_PATH.has(cli)) {
    File localDir = new File(ModelSubmissionOptions.LOCAL_MODEL_PATH.get(cli));
    Path hdfsPath = new Path(ModelSubmissionOptions.HDFS_MODEL_PATH.get(cli));
    updateHDFS(fs, localDir, hdfsPath);
  }
  {color:#FF}if (request != null) {{color}
    Queue queue = config.createQueue(ImmutableMap.of(ZKQueue.ZK_CLIENT, client));
    queue.enqueue(request);
  {color:#FF}}{color}
} finally {
  if (client != null) {
    client.close();
  }
}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (METRON-1494) Profiler Emits Messages to Kafka When Not Needed
[ https://issues.apache.org/jira/browse/METRON-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422904#comment-16422904 ] ASF GitHub Bot commented on METRON-1494:

Github user nickwallen commented on the issue:
https://github.com/apache/metron/pull/967

This should go in after #977

> Profiler Emits Messages to Kafka When Not Needed
> ------------------------------------------------
> Key: METRON-1494
> URL: https://issues.apache.org/jira/browse/METRON-1494
> Project: Metron
> Issue Type: Bug
> Affects Versions: 0.4.2
> Reporter: Nick Allen
> Assignee: Nick Allen
> Priority: Major
> Fix For: Next + 1
>
> Using the 'result/triage' expression allows you to send profile data to Kafka. This allows you to leverage the Threat Triage functionality against data coming out of the Profiler.
> If there is no 'result/triage' expression, then nothing should be sent to Kafka. Currently, a message containing some data, but no actual profile value, is sent to Kafka.
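The behavior METRON-1494 asks for can be sketched in a few lines. The names here (TriageEmitSketch, emit, the "triage" config key) are illustrative assumptions, not Metron's actual Profiler classes: a message is built for the output topic only when a triage expression is configured, and an empty result stands in for "nothing sent to Kafka".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: forward a profile result downstream only when a
// 'triage' expression is present in the profile's configuration.
public class TriageEmitSketch {

  // Returns the message that would be written to Kafka; an empty map
  // means nothing is emitted at all.
  static Map<String, Object> emit(Map<String, Object> profileConfig, Object profileValue) {
    Map<String, Object> kafkaMessage = new HashMap<>();
    Object triage = profileConfig.get("triage");
    if (triage != null) {
      // Only with a triage expression does the value go downstream.
      kafkaMessage.put("profile", profileConfig.get("profile"));
      kafkaMessage.put("value", profileValue);
    }
    return kafkaMessage;
  }
}
```

Without the guard, a message containing metadata but no profile value would still be built, which is exactly the behavior the issue reports.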
[GitHub] metron issue #967: METRON-1494 Profiler Emits Messages to Kafka When Not Nee...
Github user nickwallen commented on the issue:
https://github.com/apache/metron/pull/967

This should go in after #977

---
[jira] [Commented] (METRON-1505) Intermittent Profiler Integration Test Failure
[ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422848#comment-16422848 ]

ASF GitHub Bot commented on METRON-1505:

Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/977#discussion_r178600804

--- Diff: metron-analytics/metron-profiler/src/test/java/org/apache/metron/profiler/integration/ProfilerIntegrationTest.java ---
@@ -70,247 +66,103 @@
   private static final String TEST_RESOURCES = "../../metron-analytics/metron-profiler/src/test";
   private static final String FLUX_PATH = "../metron-profiler/src/main/flux/profiler/remote.yaml";

-  /**
-   * {
-   *   "ip_src_addr": "10.0.0.1",
-   *   "protocol": "HTTPS",
-   *   "length": 10,
-   *   "bytes_in": 234
-   * }
-   */
-  @Multiline
-  private static String message1;
-
-  /**
-   * {
-   *   "ip_src_addr": "10.0.0.2",
-   *   "protocol": "HTTP",
-   *   "length": 20,
-   *   "bytes_in": 390
-   * }
-   */
-  @Multiline
-  private static String message2;
-
-  /**
-   * {
-   *   "ip_src_addr": "10.0.0.3",
-   *   "protocol": "DNS",
-   *   "length": 30,
-   *   "bytes_in": 560
-   * }
-   */
-  @Multiline
-  private static String message3;
-
-  private static ColumnBuilder columnBuilder;
-  private static ZKServerComponent zkComponent;
-  private static FluxTopologyComponent fluxComponent;
-  private static KafkaComponent kafkaComponent;
-  private static ConfigUploadComponent configUploadComponent;
-  private static ComponentRunner runner;
-  private static MockHTable profilerTable;
+  public static final long startAt = 10;
+  public static final String entity = "10.0.0.1";

   private static final String tableName = "profiler";
   private static final String columnFamily = "P";
-  private static final double epsilon = 0.001;
   private static final String inputTopic = Constants.INDEXING_TOPIC;
   private static final String outputTopic = "profiles";
   private static final int saltDivisor = 10;

-  private static final long windowLagMillis = TimeUnit.SECONDS.toMillis(5);
+  private static final long windowLagMillis = TimeUnit.SECONDS.toMillis(1);
   private static final long windowDurationMillis = TimeUnit.SECONDS.toMillis(5);
-  private static final long periodDurationMillis = TimeUnit.SECONDS.toMillis(15);
-  private static final long profileTimeToLiveMillis = TimeUnit.SECONDS.toMillis(20);
+  private static final long periodDurationMillis = TimeUnit.SECONDS.toMillis(10);
+  private static final long profileTimeToLiveMillis = TimeUnit.SECONDS.toMillis(15);
   private static final long maxRoutesPerBolt = 10;

-  /**
-   * Tests the first example contained within the README.
-   */
-  @Test
-  public void testExample1() throws Exception {
-
-    uploadConfig(TEST_RESOURCES + "/config/zookeeper/readme-example-1");
-
-    // start the topology and write test messages to kafka
-    fluxComponent.submitTopology();
-    kafkaComponent.writeMessages(inputTopic, message1, message1, message1);
-    kafkaComponent.writeMessages(inputTopic, message2, message2, message2);
-    kafkaComponent.writeMessages(inputTopic, message3, message3, message3);
-
-    // verify - ensure the profile is being persisted
-    waitOrTimeout(() -> profilerTable.getPutLog().size() > 0,
-        timeout(seconds(180)));
-
-    // verify - only 10.0.0.2 sends 'HTTP', thus there should be only 1 value
-    List actuals = read(profilerTable.getPutLog(), columnFamily,
-        columnBuilder.getColumnQualifier("value"), Double.class);
-
-    // verify - there are 3 'HTTP' each with 390 bytes
-    Assert.assertTrue(actuals.stream().anyMatch(val ->
-        MathUtils.equals(390.0 * 3, val, epsilon)
-    ));
-  }
-
-  /**
-   * Tests the second example contained within the README.
-   */
-  @Test
-  public void testExample2() throws Exception {
-
-    uploadConfig(TEST_RESOURCES + "/config/zookeeper/readme-example-2");
-
-    // start the topology and write test messages to kafka
-    fluxComponent.submitTopology();
-    kafkaComponent.writeMessages(inputTopic, message1, message1, message1);
-    kafkaComponent.writeMessages(inputTopic, message2, message2, message2);
-    kafkaComponent.writeMessages(inputTopic, message3, message3, message3);
-
-    // expect 2 values written by the profile; one for 10.0.0.2 and another for 10.0.0.3
-    final int expected = 2;
-
-    // verify - ensure the profile is being persisted
-    waitOrTimeout(() ->
[jira] [Commented] (METRON-1505) Intermittent Profiler Integration Test Failure
[ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422845#comment-16422845 ]

ASF GitHub Bot commented on METRON-1505:

Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/977#discussion_r178600343

--- Diff: metron-analytics/metron-profiler/src/main/java/org/apache/metron/profiler/bolt/ProfileBuilderBolt.java ---
@@ -310,17 +313,37 @@ public void execute(TupleWindow window) {
   }

   /**
-   * Flush all expired profiles when a 'tick' is received.
+   * Flush all active profiles.
+   */
+  protected void flushActive() {
+    activeFlushSignal.reset();
+
+    // flush the active profiles
+    List measurements;
+    synchronized(messageDistributor) {
+      measurements = messageDistributor.flush();
+      emitMeasurements(measurements);
+    }
+
+    LOG.debug("Flushed active profiles and found {} measurement(s).", measurements.size());
+  }
+
+  /**
+   * Flushes all expired profiles.
    *
-   * If a profile has not received a message for an extended period of time then it is
+   * If a profile has not received a message for an extended period of time then it is
    * marked as expired. Periodically we need to flush these expired profiles to ensure
    * that their state is not lost.
    */
-  private void handleTick() {
+  protected void flushExpired() {
     // flush the expired profiles
-    List measurements = messageDistributor.flushExpired();
-    emitMeasurements(measurements);
+    List measurements;
+    synchronized (messageDistributor) {
--- End diff --

Access to the `messageDistributor` has to be synchronized now. It is not thread safe and it could be called from either the timer thread or when tuples are received.
> Intermittent Profiler Integration Test Failure
> ----------------------------------------------
>
>                 Key: METRON-1505
>                 URL: https://issues.apache.org/jira/browse/METRON-1505
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The Profiler integration tests which use processing time fail intermittently when run in Travis CI.
> {code:java}
> 2018-03-22 22:00:35 DEBUG FixedFrequencyFlushSignal:67 - Flush counters reset
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with timestamp=1521756035759
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with timestamp=1521756035802
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with timestamp=1521756035806
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.2, timestamp=1521756035807
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035807
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.2, timestamp=1521756035808
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035808
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.2, timestamp=1521756035813
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035813
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.3, timestamp=1521756035814
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035814
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.3, timestamp=1521756035816
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035816
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.3, timestamp=1521756035817
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035817
> 2018-03-22 22:00:41 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000, referenceTime=1521756041122}
> 2018-03-22 22:00:41 DEBUG WindowManager:212 - [6] events expired from window.
> 2018-03-22 22:00:41 DEBUG WindowManager:214 - invoking windowLifecycleListener.onExpiry
> 2018-03-22 22:00:41 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:00:46 DEBUG WindowManager:189 - Scan events, eviction policy
[jira] [Commented] (METRON-1505) Intermittent Profiler Integration Test Failure
[ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422846#comment-16422846 ]

ASF GitHub Bot commented on METRON-1505:

Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/977#discussion_r178600386

--- Diff: metron-analytics/metron-profiler/src/main/java/org/apache/metron/profiler/bolt/ProfileBuilderBolt.java ---
@@ -339,11 +362,13 @@ private void handleMessage(Tuple input) {
   Long timestamp = getField(TIMESTAMP_TUPLE_FIELD, input, Long.class);

   // keep track of time
-  flushSignal.update(timestamp);
+  activeFlushSignal.update(timestamp);

   // distribute the message
   MessageRoute route = new MessageRoute(definition, entity);
-  messageDistributor.distribute(message, timestamp, route, getStellarContext());
+  synchronized (messageDistributor) {
+    messageDistributor.distribute(message, timestamp, route, getStellarContext());
+  }
--- End diff --

Access to the `messageDistributor` has to be synchronized now. It is not thread safe and it could be called from either the timer thread or when tuples are received.

> Intermittent Profiler Integration Test Failure
> ----------------------------------------------
>
>                 Key: METRON-1505
>                 URL: https://issues.apache.org/jira/browse/METRON-1505
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The Profiler integration tests which use processing time fail intermittently when run in Travis CI.
[jira] [Commented] (METRON-1505) Intermittent Profiler Integration Test Failure
[ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422844#comment-16422844 ]

ASF GitHub Bot commented on METRON-1505:

Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/977#discussion_r178600285

--- Diff: metron-analytics/metron-profiler/src/main/java/org/apache/metron/profiler/bolt/ProfileBuilderBolt.java ---
@@ -310,17 +313,37 @@ public void execute(TupleWindow window) {
   }

   /**
-   * Flush all expired profiles when a 'tick' is received.
+   * Flush all active profiles.
+   */
+  protected void flushActive() {
+    activeFlushSignal.reset();
+
+    // flush the active profiles
+    List measurements;
+    synchronized(messageDistributor) {
--- End diff --

Access to the `messageDistributor` has to be synchronized now. It is not thread safe and it could be called from either the timer thread or when tuples are received.

> Intermittent Profiler Integration Test Failure
> ----------------------------------------------
>
>                 Key: METRON-1505
>                 URL: https://issues.apache.org/jira/browse/METRON-1505
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The Profiler integration tests which use processing time fail intermittently when run in Travis CI.
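The review comments above all make the same point: the message distributor is not thread safe, so both the tuple-handling path and the timer path must hold the same lock before touching it. A minimal, self-contained sketch of that pattern (the `Distributor` class and its methods here are hypothetical stand-ins, not Metron's actual `DefaultMessageDistributor` API):

```java
import java.util.ArrayList;
import java.util.List;

public class SynchronizedDistributorSketch {

  // Hypothetical stand-in for DefaultMessageDistributor; not thread safe on its own.
  static class Distributor {
    private final List<String> pending = new ArrayList<>();
    void distribute(String message) { pending.add(message); }
    List<String> flush() {
      List<String> out = new ArrayList<>(pending);
      pending.clear();
      return out;
    }
  }

  private static final Distributor distributor = new Distributor();

  // Called when a tuple is received (the bolt's executor thread).
  public static void handleMessage(String message) {
    synchronized (distributor) {
      distributor.distribute(message);
    }
  }

  // Called from the timer thread; it takes the same monitor, so the two
  // threads never mutate the distributor's internal state concurrently.
  public static List<String> flushExpired() {
    synchronized (distributor) {
      return distributor.flush();
    }
  }
}
```

Synchronizing on the shared object itself (rather than a separate lock field) matches the style of the diffs above, where every call site wraps the distributor in `synchronized (messageDistributor) { ... }`.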
[jira] [Commented] (METRON-1505) Intermittent Profiler Integration Test Failure
[ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422837#comment-16422837 ]

ASF GitHub Bot commented on METRON-1505:

Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/977#discussion_r178599870

--- Diff: metron-analytics/metron-profiler/src/main/java/org/apache/metron/profiler/bolt/ProfileBuilderBolt.java ---
@@ -395,10 +420,46 @@ private void emitMeasurements(List measurements) {
     return value;
   }

+  /**
+   * Converts milliseconds to seconds and handles an ugly cast.
+   *
+   * @param millis Duration in milliseconds.
+   * @return Duration in seconds.
+   */
+  private int toSeconds(long millis) {
+    return (int) TimeUnit.MILLISECONDS.toSeconds(millis);
+  }
+
+  /**
+   * Creates a timer that regularly flushes expired profiles on a separate thread.
+   */
+  private void startExpiredFlushTimer() {
+    expiredFlushTimer = createTimer("flush-expired-profiles-timer");
+    expiredFlushTimer.scheduleRecurring(0, toSeconds(profileTimeToLiveMillis), () -> flushExpired());
+  }
--- End diff --

This is the timer thread that flushes expired profiles regularly.

> Intermittent Profiler Integration Test Failure
> ----------------------------------------------
>
>                 Key: METRON-1505
>                 URL: https://issues.apache.org/jira/browse/METRON-1505
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The Profiler integration tests which use processing time fail intermittently when run in Travis CI.
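The diff above registers a recurring task that flushes expired profiles on a background thread. A rough JDK-only equivalent of that scheduling pattern, using `ScheduledExecutorService` instead of Storm's timer utilities (the class and method names below are illustrative, not Metron's `createTimer`/`scheduleRecurring` API):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ExpiredFlushTimerSketch {

  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor();

  // Counts down each time the recurring task fires, so a caller can observe it.
  private final CountDownLatch observedFlushes = new CountDownLatch(3);

  // Stand-in for flushExpired(); the real bolt drains the expired cache here.
  private void flushExpired() {
    observedFlushes.countDown();
  }

  // Mirrors scheduleRecurring(0, period, () -> flushExpired()):
  // fire immediately, then repeat at a fixed period.
  public void start(long periodMillis) {
    timer.scheduleAtFixedRate(this::flushExpired, 0, periodMillis, TimeUnit.MILLISECONDS);
  }

  // Wait until the timer has fired three times, then shut it down.
  public boolean awaitThreeFlushes(long timeoutMillis) {
    try {
      return observedFlushes.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      timer.shutdownNow();
    }
  }
}
```

Because the flush now runs on this separate thread while tuples arrive on the bolt's executor thread, the synchronization discussed in the other comments becomes necessary.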
[jira] [Commented] (METRON-1505) Intermittent Profiler Integration Test Failure
[ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422834#comment-16422834 ]

ASF GitHub Bot commented on METRON-1505:

Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/977#discussion_r178599674

--- Diff: metron-analytics/metron-profiler-common/src/main/java/org/apache/metron/profiler/DefaultMessageDistributor.java ---
@@ -281,29 +289,45 @@ public DefaultMessageDistributor withPeriodDuration(int duration, TimeUnit units

   /**
    * A listener that is notified when profiles expire from the active cache.
    */
-  private class ActiveCacheRemovalListener implements RemovalListener {
+  private class ActiveCacheRemovalListener implements RemovalListener {

     @Override
-    public void onRemoval(RemovalNotification notification) {
+    public void onRemoval(RemovalNotification notification) {

-      String key = notification.getKey();
       ProfileBuilder expired = notification.getValue();

+      LOG.warn("Profile expired from active cache; profile={}, entity={}",
+          expired.getDefinition().getProfile(),
+          expired.getEntity());

-      LOG.warn("Profile expired from active cache; key={}", key);
-      expiredCache.put(key, expired);
+      // add the profile to the expired cache
+      expiredCache.put(notification.getKey(), expired);
     }
   }

   /**
    * A listener that is notified when profiles expire from the active cache.
    */
-  private class ExpiredCacheRemovalListener implements RemovalListener {
+  private class ExpiredCacheRemovalListener implements RemovalListener {

     @Override
-    public void onRemoval(RemovalNotification notification) {
+    public void onRemoval(RemovalNotification notification) {
+
+      if(notification.wasEvicted()) {
+
+        // the expired profile was NOT flushed in time
--- End diff --

A profile being removed from the expired cache is only 'bad' when it is evicted. When an eviction occurs, we get a WARN. Otherwise, only a DEBUG is used. This makes the logging much more useful when troubleshooting.
> Intermittent Profiler Integration Test Failure > -- > > Key: METRON-1505 > URL: https://issues.apache.org/jira/browse/METRON-1505 > Project: Metron > Issue Type: Bug >Reporter: Nick Allen >Assignee: Nick Allen >Priority: Major > > The Profiler integration tests which use processing time fail intermittently > when run in Travis CI. > {code:java} > 2018-03-22 22:00:35 DEBUG FixedFrequencyFlushSignal:67 - Flush counters reset > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035759 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035802 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035806 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035807 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035807 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035808 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035808 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035813 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035813 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.3, timestamp=1521756035814 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035814 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.3, timestamp=1521756035816 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) 
for > message with timestamp=1521756035816 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.3, timestamp=1521756035817 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with
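The logging decision discussed in the review comment above — WARN only when a profile is evicted (i.e., not flushed in time), DEBUG otherwise — can be sketched without Guava. This is an illustrative stand-in, not Metron's API: `logLevelFor` is a hypothetical helper, and `java.util.logging.Level.FINE` plays the role of DEBUG; Metron's actual listener receives Guava's `RemovalNotification` and checks `wasEvicted()`.

```java
import java.util.logging.Level;

public class EvictionLogging {

  // Eviction (wasEvicted == true) means the profile was NOT flushed in time,
  // which is worth a WARN; a normal removal after an explicit flush is routine
  // and only merits DEBUG (java.util.logging's FINE).
  static Level logLevelFor(boolean wasEvicted) {
    return wasEvicted ? Level.WARNING : Level.FINE;
  }

  public static void main(String[] args) {
    System.out.println(logLevelFor(true));   // WARNING
    System.out.println(logLevelFor(false));  // FINE
  }
}
```

Keeping evictions at WARN and routine removals at DEBUG is what makes the log useful during troubleshooting: the noisy common case disappears at default log levels while the anomalous case stays visible.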
[jira] [Commented] (METRON-1505) Intermittent Profiler Integration Test Failure
[ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422832#comment-16422832 ]

ASF GitHub Bot commented on METRON-1505:
----------------------------------------

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/977#discussion_r178599132

    --- Diff: metron-analytics/metron-profiler-common/src/main/java/org/apache/metron/profiler/DefaultMessageDistributor.java ---
    @@ -262,11 +262,19 @@ public ProfileBuilder getBuilder(MessageRoute route, Context context) throws Exe

       /**
        * Builds the key that is used to lookup the {@link ProfileBuilder} within the cache.
        *
    +   * The cache key is built using the hash codes of the profile and entity name. If the profile
    +   * definition is ever changed, the same cache entry will not be reused. This ensures that no
    +   * state can be carried over from the old definition into the new, which might result in an
    +   * invalid profile measurement.
    +   *
        * @param profile The profile definition.
        * @param entity The entity.
        */
    -  private String cacheKey(ProfileConfig profile, String entity) {
    -    return format("%s:%s", profile, entity);
    +  private int cacheKey(ProfileConfig profile, String entity) {
    +    return new HashCodeBuilder(17, 37)

    --- End diff --

    The cache key needs to ensure that when the user changes a profile definition, even slightly, a different `ProfileBuilder` is used. Reusing the same `ProfileBuilder` would create inconsistent results. Instead of using `ProfileConfig.toString()` as part of the cache key, it now uses the hash code from the profile and the entity. I think this is less error prone and more performant.

> Intermittent Profiler Integration Test Failure
> ----------------------------------------------
>
>                 Key: METRON-1505
>                 URL: https://issues.apache.org/jira/browse/METRON-1505
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The Profiler integration tests which use processing time fail intermittently
> when run in Travis CI.
> {code:java} > 2018-03-22 22:00:35 DEBUG FixedFrequencyFlushSignal:67 - Flush counters reset > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035759 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035802 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035806 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035807 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035807 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035808 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035808 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035813 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035813 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.3, timestamp=1521756035814 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035814 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.3, timestamp=1521756035816 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035816 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.3, timestamp=1521756035817 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035817 > 2018-03-22 22:00:41 DEBUG 
WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756041122} > 2018-03-22 22:00:41 DEBUG WindowManager:212 - [6] events expired from window. > 2018-03-22 22:00:41 DEBUG WindowManager:214 - invoking > windowLifecycleListener.onExpiry > 2018-03-22 22:00:41 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:00:46 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756046122} > 2018-03-22 22:00:46 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:00:46 DEBUG WindowManager:147 - No events
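The hash-based cache key described in the comment above can be sketched as follows. Metron builds the key with commons-lang's `HashCodeBuilder(17, 37)`; this sketch uses `java.util.Objects.hash` as a dependency-free stand-in, and models only the profile fields that matter for the key (the names here are illustrative, not Metron's `ProfileConfig`).

```java
import java.util.Objects;

public class CacheKeySketch {

  // Stand-in for Metron's cacheKey(ProfileConfig, String): any change to the
  // profile definition produces a different key, so a fresh ProfileBuilder is
  // created and no state carries over from the old definition into the new.
  static int cacheKey(String profileName, String profileDefinition, String entity) {
    return Objects.hash(profileName, profileDefinition, entity);
  }

  public static void main(String[] args) {
    int v1      = cacheKey("example2", "count()", "10.0.0.2");
    int v1again = cacheKey("example2", "count()", "10.0.0.2");
    int v2      = cacheKey("example2", "count() + 1", "10.0.0.2");

    System.out.println(v1 == v1again);  // true: same definition -> cache hit
    System.out.println(v1 != v2);       // true: edited definition -> new builder
  }
}
```

The key property is the one the reviewer calls out: keying on a hash of the definition (rather than `ProfileConfig.toString()`) is cheaper to compute and guarantees that even a slight edit to a profile maps to a different `ProfileBuilder`.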
[jira] [Commented] (METRON-1505) Intermittent Profiler Integration Test Failure
[ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422827#comment-16422827 ]

ASF GitHub Bot commented on METRON-1505:
----------------------------------------

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/977

    METRON-1505 Intermittent Profiler Integration Test Failure

    ### Problem

    The integration tests were failing intermittently when Storm unexpectedly expired messages generated by the integration tests. When Storm expired these messages they were never received by the Profiler bolts, which caused the integration tests to fail.

    ### Root Cause

    Storm's event window mechanism was not configured correctly to use the timestamp extracted from the telemetry message. Storm was instead defaulting to system time. If the time when the downstream `ProfileBuilderBolt` processed a message differed significantly enough from when the upstream `ProfileSplitterBolt` processed it, the message would be errantly expired by Storm.

    This is why the problem could only be replicated in Travis, a resource-constrained environment. In any other environment, the system times of these two events do not differ enough for Storm to mistakenly expire the test messages.

    This did not matter much for the core functioning of the Profiler, as the Profiler itself continued to use the correct event timestamps. The bug only affected significantly out-of-order messages and the flushing of expired profiles in the integration tests.

    ### The Fix

    The simple fix was to ensure that Storm uses the correct event timestamp field. Doing this highlighted another problem: Storm does not work correctly when tick tuples are combined with an event timestamp field. Storm will attempt to extract an event timestamp from the tick tuple, which does not exist, causing the entire topology to fail. This meant that I could not use tick tuples.

    To work around this, I created a separate thread that flushes the expired profiles regularly. The separate thread introduces thread-safety concerns, so I also needed to perform some locking.

    ### Changes

    Most of these changes were done in separate commits to make review easier.

    1. Added a separate thread to the `ProfileBuilderBolt` to flush expired profiles regularly. This is the core fix for the integration test bug.
    2. Corrected the key used to cache `ProfileBuilder` objects. It previously relied on the underlying `ProfileConfig.toString()` method, which was error prone and slow; it now uses the hash code.
    3. Reduced the number of Profiler integration tests. There is now one integration test that exercises event-time processing and another that runs the same profile using processing time. Previously, a number of different profiles were tested; that was necessary when the integration tests were the only effective way to test profile logic. Significant refactoring since then allows the same logic to be covered by unit tests, so I was able to clean up these tests, reducing the run time and complexity of the integration tests.
    4. Added some simple debug logging to `HBaseBolt`.

    ## Pull Request Checklist

    - [ ] Is there a JIRA ticket associated with this PR? If not, one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder?
    - [ ] Have you written or updated unit tests and/or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally with the Vagrant full-dev environment or the equivalent?

    You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/nickwallen/metron METRON-1505

    Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/metron/pull/977.patch

    To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

        This closes #977
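The workaround described above — a dedicated thread that flushes expired profiles on a schedule, with locking because the bolt's executor thread touches the same state — can be sketched with the JDK alone. The names here (`ExpiredFlusher`, `flushExpired`, `firstFlush`) are illustrative; Metron's actual implementation lives inside `ProfileBuilderBolt`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ExpiredFlusher implements AutoCloseable {

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final Object lock = new Object();               // guards expiredProfiles
  private final Map<String, Long> expiredProfiles = new HashMap<>();
  final CountDownLatch firstFlush = new CountDownLatch(1);

  void start(long periodMillis) {
    scheduler.scheduleAtFixedRate(this::flushExpired, periodMillis, periodMillis,
        TimeUnit.MILLISECONDS);
  }

  void addExpired(String key) {
    synchronized (lock) {
      expiredProfiles.put(key, System.currentTimeMillis());
    }
  }

  // Runs on the flusher thread; the lock is needed because the bolt's own
  // executor thread mutates the same state as messages arrive.
  private void flushExpired() {
    synchronized (lock) {
      expiredProfiles.clear();
      firstFlush.countDown();
    }
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }

  public static void main(String[] args) throws Exception {
    try (ExpiredFlusher flusher = new ExpiredFlusher()) {
      flusher.addExpired("example2:10.0.0.2");
      flusher.start(50);
      // wait (with a timeout) for the background thread to flush at least once
      System.out.println(flusher.firstFlush.await(5, TimeUnit.SECONDS));  // true
    }
  }
}
```

This is the design trade-off the PR describes: tick tuples would have been simpler, but since they cannot coexist with an event timestamp field, the periodic work moves off the tuple path onto a scheduler thread, at the cost of explicit synchronization.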
[jira] [Commented] (METRON-1505) Intermittent Profiler Integration Test Failure
[ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422755#comment-16422755 ] Nick Allen commented on METRON-1505: The problem is somewhere in Storm's windowing functionality. The time that it initially recognizes is too far in the future and causes it to mark the messages sent in as expired. This only occurs intermittently. Here you can see test messages generated with the last timestamp being 1521756035817. {code} 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with timestamp=1521756035759 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with timestamp=1521756035802 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with timestamp=1521756035806 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.2, timestamp=1521756035807 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035807 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.2, timestamp=1521756035808 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035808 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.2, timestamp=1521756035813 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035813 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.3, timestamp=1521756035814 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035814 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.3, timestamp=1521756035816 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 
1 route(s) for message with timestamp=1521756035816
2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.3, timestamp=1521756035817
2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035817
{code}

The first timestamp that Storm recognizes is 1521756041122, which is 5.3 seconds ahead of the latest timestamp in the data.

{code}
2018-03-22 22:00:41 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000, referenceTime=1521756041122}
{code}

Storm then marks these messages as expired and the Profiler never sees them.

{code}
2018-03-22 22:00:41 DEBUG WindowManager:212 - [6] events expired from window.
2018-03-22 22:00:41 DEBUG WindowManager:214 - invoking windowLifecycleListener.onExpiry
2018-03-22 22:00:41 DEBUG WindowManager:147 - No events in the window, skipping onActivation
{code}

Epic test failure.

> Intermittent Profiler Integration Test Failure
> ----------------------------------------------
>
>                 Key: METRON-1505
>                 URL: https://issues.apache.org/jira/browse/METRON-1505
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The Profiler integration tests which use processing time fail intermittently
> when run in Travis CI.
> {code:java} > 2018-03-22 22:00:35 DEBUG FixedFrequencyFlushSignal:67 - Flush counters reset > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035759 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035802 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035806 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035807 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035807 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035808 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035808 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035813 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035813 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.3, timestamp=1521756035814 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035814 > 2018-03-22 22:00:35 DEBUG
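The 5.3-second gap described in the comment above can be checked directly from the two timestamps in the logs: the window manager's first reference time versus the last event timestamp emitted by the test. With a 5000 ms window, that gap puts every test message outside the window, so they are expired before the Profiler sees them.

```java
public class WindowGap {
  public static void main(String[] args) {
    long referenceTime = 1521756041122L;  // TimeEvictionPolicy referenceTime from the log
    long lastEventTime = 1521756035817L;  // last message timestamp in the test data
    long windowLength  = 5000L;           // windowLength from the log, in ms

    long gap = referenceTime - lastEventTime;
    System.out.println(gap);                  // 5305 ms, i.e. ~5.3 seconds
    System.out.println(gap > windowLength);   // true: the messages fall outside the window
  }
}
```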
[jira] [Created] (METRON-1505) Intermittent Profiler Integration Test Failure
Nick Allen created METRON-1505: -- Summary: Intermittent Profiler Integration Test Failure Key: METRON-1505 URL: https://issues.apache.org/jira/browse/METRON-1505 Project: Metron Issue Type: Bug Reporter: Nick Allen Assignee: Nick Allen The Profiler integration tests which use processing time fail intermittently when run in Travis CI. {code:java} 2018-03-22 22:00:35 DEBUG FixedFrequencyFlushSignal:67 - Flush counters reset 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with timestamp=1521756035759 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with timestamp=1521756035802 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with timestamp=1521756035806 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.2, timestamp=1521756035807 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035807 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.2, timestamp=1521756035808 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035808 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.2, timestamp=1521756035813 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035813 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.3, timestamp=1521756035814 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035814 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.3, timestamp=1521756035816 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035816 2018-03-22 
22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2, entity=10.0.0.3, timestamp=1521756035817 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with timestamp=1521756035817 2018-03-22 22:00:41 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000, referenceTime=1521756041122} 2018-03-22 22:00:41 DEBUG WindowManager:212 - [6] events expired from window. 2018-03-22 22:00:41 DEBUG WindowManager:214 - invoking windowLifecycleListener.onExpiry 2018-03-22 22:00:41 DEBUG WindowManager:147 - No events in the window, skipping onActivation 2018-03-22 22:00:46 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000, referenceTime=1521756046122} 2018-03-22 22:00:46 DEBUG WindowManager:212 - [0] events expired from window. 2018-03-22 22:00:46 DEBUG WindowManager:147 - No events in the window, skipping onActivation 2018-03-22 22:00:51 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000, referenceTime=1521756051122} 2018-03-22 22:00:51 DEBUG WindowManager:212 - [0] events expired from window. 2018-03-22 22:00:51 DEBUG WindowManager:147 - No events in the window, skipping onActivation 2018-03-22 22:00:56 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000, referenceTime=1521756056122} 2018-03-22 22:00:56 DEBUG WindowManager:212 - [0] events expired from window. 2018-03-22 22:00:56 DEBUG WindowManager:144 - invoking windowLifecycleListener onActivation, [1] events in window. 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 tuple(s), 0 expired, 1 new 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s). 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles and found 0 measurement(s). 
2018-03-22 22:00:56 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms until flush; currentTime=0, flushTime=0 2018-03-22 22:01:01 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000, referenceTime=1521756061122} 2018-03-22 22:01:01 DEBUG WindowManager:212 - [1] events expired from window. 2018-03-22 22:01:01 DEBUG WindowManager:214 - invoking windowLifecycleListener.onExpiry 2018-03-22 22:01:01 DEBUG WindowManager:147 - No events in the window, skipping onActivation 2018-03-22 22:01:06 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000, referenceTime=1521756066122} 2018-03-22 22:01:06 DEBUG WindowManager:212 - [0] events expired from window. 2018-03-22 22:01:06 DEBUG WindowManager:147 - No events in the window, skipping onActivation 2018-03-22 22:01:11 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
[GitHub] metron pull request #946: METRON-1465:Support for Elasticsearch X-pack
Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/946#discussion_r178586216 --- Diff: metron-interface/metron-rest/src/main/scripts/metron-rest.sh --- @@ -36,6 +36,7 @@ METRON_SYSCONFIG="${METRON_SYSCONFIG:-/etc/default/metron}" METRON_LOG_DIR="${METRON_LOG_DIR:-/var/log/metron}" METRON_PID_FILE="${METRON_PID_FILE:-/var/run/metron/metron-rest.pid}" PARSER_CONTRIB=${PARSER_CONTRIB:-$METRON_HOME/parser_contrib} +INDEXING_CONTRIB=${INDEXING_CONTRIB:-$METRON_HOME/indexing_contrib} --- End diff -- @nickwallen This is consistent with what we've done for the parsers (see line 38 immediately above). We could possibly refactor, but I wouldn't advise it as part of this PR. Not defining a default would mean the responsibility is now on the end user for indexing, but not for parsers. I think that is going to be more confusing to a user in the current state. I do agree that these could probably be moved to the defaults script as a follow-on refactoring PR. ---
[GitHub] metron pull request #946: METRON-1465:Support for Elasticsearch X-pack
Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/metron/pull/946#discussion_r178584790

    --- Diff: metron-platform/metron-elasticsearch/src/main/java/org/apache/metron/elasticsearch/utils/ElasticsearchUtils.java ---
    @@ -107,32 +113,94 @@ public static String getBaseIndexName(String indexName) {
         return parts[0];
       }

    -  public static TransportClient getClient(Map<String, Object> globalConfiguration, Map<String, String> optionalSettings) {
    +  /**
    +   * Instantiates an Elasticsearch client based on es.client.class, if set. Defaults to
    +   * org.elasticsearch.transport.client.PreBuiltTransportClient.
    +   *
    +   * @param globalConfiguration Metron global config
    +   * @return
    +   */
    +  public static TransportClient getClient(Map<String, Object> globalConfiguration) {
    +    Set<String> customESSettings = new HashSet<>();
    +    customESSettings.addAll(Arrays.asList("es.client.class", "es.xpack.username", "es.xpack.password.file"));

    --- End diff --

    These are non-ES-specific settings that we're pulling via the global config key that are leveraged for constructing the client. The client blows up on any unexpected keys in its config. We should probably make a note.

---
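The point made in the comment above — keys like `es.client.class` drive client construction but must be stripped before the remaining settings are handed to the Elasticsearch client, because the transport client rejects unknown keys — can be sketched as below. Method and variable names here are illustrative, not Metron's actual `ElasticsearchUtils` API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CustomSettingsFilter {

  // Keys that configure how the client is built, but are NOT valid
  // Elasticsearch settings themselves.
  static final Set<String> CUSTOM_ES_SETTINGS = new HashSet<>(Arrays.asList(
      "es.client.class", "es.xpack.username", "es.xpack.password.file"));

  // Returns a copy of the global config with the custom keys removed, i.e.
  // only settings that are safe to pass to the Elasticsearch client.
  static Map<String, Object> esOnlySettings(Map<String, Object> globalConfig) {
    Map<String, Object> filtered = new HashMap<>(globalConfig);
    filtered.keySet().removeAll(CUSTOM_ES_SETTINGS);
    return filtered;
  }

  public static void main(String[] args) {
    Map<String, Object> global = new HashMap<>();
    global.put("es.client.class",
        "org.elasticsearch.xpack.client.PreBuiltXPackTransportClient");
    global.put("es.clustername", "metron");

    Map<String, Object> safe = esOnlySettings(global);
    System.out.println(safe.containsKey("es.client.class"));  // false
    System.out.println(safe.containsKey("es.clustername"));   // true
  }
}
```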
[jira] [Commented] (METRON-1465) X-pack support for Elasticsearch
[ https://issues.apache.org/jira/browse/METRON-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422724#comment-16422724 ] ASF GitHub Bot commented on METRON-1465: Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/946#discussion_r178584154 --- Diff: metron-deployment/Kerberos-manual-setup.md --- @@ -533,3 +534,211 @@ In order to correct this, you should: ### References * [https://github.com/apache/storm/blob/master/SECURITY.md](https://github.com/apache/storm/blob/master/SECURITY.md) + +X-Pack +-- + +First, stop the random_access_indexing topology through the Storm UI or from the CLI, e.g. + +``` +storm kill random_access_indexing +``` + +Here are instructions for enabling X-Pack with Elasticsearch and Kibana: https://www.elastic.co/guide/en/x-pack/5.6/installing-xpack.html + +You need to be sure to add the appropriate username and password for Elasticsearch and Kibana to enable external connections from Metron components. e.g. the following will create a user "transport_client_user" with password "changeme" and "superuser" credentials. + +``` +sudo /usr/share/elasticsearch/bin/x-pack/users useradd transport_client_user -p changeme -r superuser +``` + +Once you've picked a password to connect to ES, you need to upload a 1-line file to HDFS with that password in it. Metron will use this file to securely read the password in order to connect to ES securely. + +Here is an example using "changeme" as the password + +``` +echo changeme > /tmp/xpack-password +sudo -u hdfs hdfs dfs -mkdir /apps/metron/elasticsearch/ +sudo -u hdfs hdfs dfs -put /tmp/xpack-password /apps/metron/elasticsearch/ +sudo -u hdfs hdfs dfs -chown metron:metron /apps/metron/elasticsearch/xpack-password +``` + +New settings have been added to configure the Elasticsearch client. By default the client will run as the normal ES prebuilt transport client. If you enable X-Pack you should set the es.client.class as shown below. 
Add the ES settings to global.json:

```
/usr/metron/0.4.3/config/zookeeper/global.json ->

  "es.client.settings" : {
    "es.client.class" : "org.elasticsearch.xpack.client.PreBuiltXPackTransportClient",
    "es.xpack.username" : "transport_client_user",
    "es.xpack.password.file" : "/apps/metron/elasticsearch/xpack-password"
  }
```

Submit the update to Zookeeper:

```
$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper/ -z $ZOOKEEPER
```

The last step before restarting the topology is to create a custom X-Pack shaded and relocated jar. This is up to you because of licensing restrictions, but here is a sample Maven pom file that should help.

```
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-xpack-shaded</artifactId>
  <name>elasticsearch-xpack-shaded</name>
  <packaging>jar</packaging>
  <version>5.6.2</version>
  <repositories>
    <repository>
      <id>elasticsearch-releases</id>
      <url>https://artifacts.elastic.co/maven</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>org.elasticsearch.client</groupId>
      <artifactId>x-pack-transport</artifactId>
      <version>5.6.2</version>
    </dependency>
  </dependencies>
</project>
```

--- End diff ---

@nickwallen It is necessary. Otherwise, the x-pack client will have conflicts. We can't package it due to licensing, and we shouldn't leave users completely on their own to figure out what needs to be excluded, shaded and relocated.

> X-pack support for Elasticsearch
>
> Key: METRON-1465
> URL: https://issues.apache.org/jira/browse/METRON-1465
> Project: Metron
> Issue Type: Bug
> Affects Versions: 0.4.2
> Reporter: Ward Bekker
> Priority: Major
> Fix For: 0.4.3
>
> Provide support for X-pack secured Elasticsearch clusters for the
> Elasticsearch writer and the DAO used by the rest service.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
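The sample pom above only declares the x-pack-transport dependency; the actual shading and relocation would come from the maven-shade-plugin in the same pom's build section. As a rough illustration (this fragment is an assumption, not part of the PR diff, and the relocation pattern shown is only an example of the kind of conflicting package one might relocate):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.0.0</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- Example only: relocate a transitive dependency (here Guava)
                   into a private namespace so it cannot conflict with the
                   versions already on the Storm/Metron classpath. -->
              <relocation>
                <pattern>com.google.common</pattern>
                <shadedPattern>org.apache.metron.xpack.com.google.common</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Which packages actually need to be excluded or relocated depends on the dependency tree of x-pack-transport 5.6.2; `mvn dependency:tree` on the shaded project is the usual way to enumerate the candidates.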
[jira] [Commented] (METRON-1421) Create a SolrMetaAlertDao
[ https://issues.apache.org/jira/browse/METRON-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422541#comment-16422541 ]

ASF GitHub Bot commented on METRON-1421:

Github user justinleet commented on the issue: https://github.com/apache/metron/pull/970

@nickwallen My main concern re: pluggability is that a lot of this stuff is going to be pretty specific to the fact that (right now) we're using Lucene stores (and it might be worth renaming some of the base classes to things like `AbstractLuceneMetaAlertDao` to make that more clear). The entire denormalization implementation exists directly because of the limitations / strengths of Lucene stores. A store that implements more familiar joins would be totally different (and probably much, much simpler) than the ES and Solr impls. I agree that refactoring is extremely nice for improving testability, but I'm not sure what the benefit of making things more extensively pluggable would be (beyond splitting the DAO up into the Search / Update / etc. subsets + your example of calculating metascores, which is good). Are you more interested in breaking things out into more classes than that, just breaking apart functions so they're more easily tested, or some combination of both? I guess the main question is: "Is breaking the meta alert dao into the various sub functions (Search, Update, etc.) + pulling out the calculate logic + at least a refactoring pass for testability a good first step in moving this forward?" I want to make sure there's at least a clear next step before doing a lot of adjusting, even if the exact final state might shift a bit.

> Create a SolrMetaAlertDao
> -------------------------
>
> Key: METRON-1421
> URL: https://issues.apache.org/jira/browse/METRON-1421
> Project: Metron
> Issue Type: Sub-task
> Reporter: Justin Leet
> Assignee: Justin Leet
> Priority: Major
>
> Create an implementation of the MetaAlertDao for Solr. This will involve
> implementing the various MetaAlertDao methods using the SolrJ library and
> also providing a SolrMetaAlertIntegrationTest (similar to
> ElasticsearchMetaAlertIntegrationTest).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
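The refactoring discussed above (splitting the meta-alert DAO into Search / Update facets plus a separately testable meta-score calculation) can be sketched as follows. This is a hypothetical illustration of the proposed shape, not Metron's actual API: the interface and method names are invented for the example, and the mean-based score is just one plausible choice.

```java
import java.util.List;

// Hypothetical facet interfaces, mirroring the proposed Search / Update split.
interface SearchDao {
    List<String> search(String query);
}

interface UpdateDao {
    void update(String id, String doc);
}

// A meta-alert DAO composed of the two facets, with the score calculation
// pulled out so it can be unit tested without any backing store.
interface MetaAlertDao extends SearchDao, UpdateDao {
    default double calculateMetaScore(List<Double> alertScores) {
        // Mean of the child alert scores; real implementations might use
        // max, sum, or a configurable aggregation instead.
        return alertScores.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
    }
}

public class MetaAlertDaoSketch {
    public static void main(String[] args) {
        // Minimal in-memory stand-in; a Solr or ES implementation would
        // delegate search/update to the store while inheriting the score logic.
        MetaAlertDao dao = new MetaAlertDao() {
            public List<String> search(String query) { return List.of(query); }
            public void update(String id, String doc) { /* no-op for the sketch */ }
        };
        System.out.println(dao.calculateMetaScore(List.of(1.0, 2.0, 3.0)));
    }
}
```

The point of the shape is that `calculateMetaScore` has no dependency on Lucene at all, so the "pull out the calculate logic" step can land independently of any store-specific refactoring.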
[jira] [Commented] (METRON-1421) Create a SolrMetaAlertDao
[ https://issues.apache.org/jira/browse/METRON-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422274#comment-16422274 ]

ASF GitHub Bot commented on METRON-1421:

Github user justinleet commented on the issue: https://github.com/apache/metron/pull/970

@merrimanr I'll take a look and see where things are falling apart and make sure to get whatever test case is needed sorted out. Sounds like the branch is in a state where we just spin up full dev, get Solr set up, and change the configs? Anything else needed?

> Create a SolrMetaAlertDao
> -------------------------
>
> Key: METRON-1421
> URL: https://issues.apache.org/jira/browse/METRON-1421
> Project: Metron
> Issue Type: Sub-task
> Reporter: Justin Leet
> Assignee: Justin Leet
> Priority: Major
>
> Create an implementation of the MetaAlertDao for Solr. This will involve
> implementing the various MetaAlertDao methods using the SolrJ library and
> also providing a SolrMetaAlertIntegrationTest (similar to
> ElasticsearchMetaAlertIntegrationTest).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)