[Bro-Dev] [JIRA] (BIT-1215) bro-cut should be rewritten in C for speed and to not depend on gawk
[ https://bro-tracker.atlassian.net/browse/BIT-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105#comment-17105 ]

Robin Sommer commented on BIT-1215:
---

I haven't looked at the code yet, but if there's a hard line length limit in there, that's a problem. bro-cut shouldn't care how long lines are.

bro-cut should be rewritten in C for speed and to not depend on gawk

Key: BIT-1215
URL: https://bro-tracker.atlassian.net/browse/BIT-1215
Project: Bro Issue Tracker
Issue Type: Improvement
Components: Bro, bro-aux
Reporter: Daniel Thayer
Fix For: 2.4

The current implementation of bro-cut is too slow when processing large log files (it takes more than a minute to process a single log file a few hundred MB in size). Justin Azoff rewrote bro-cut in C and found that it runs an order of magnitude faster. Another benefit of a C version of bro-cut is that we will no longer depend on gawk for anything (and some of Bro's supported platforms do not include gawk by default).

--
This message was sent by Atlassian JIRA (v6.3-OD-08-005-WN#6328)
___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
Re: [Bro-Dev] [JIRA] (BIT-1215) bro-cut should be rewritten in C for speed and to not depend on gawk
We are going to make it configurable and default to something like a 1000 KB line. Otherwise, you add a check to see if you need to reallocate memory for every line processed, which seems inefficient for edge cases. Letting the user override the default is a good compromise, though.

On Jul 10, 2014, at 4:30 PM, Robin Sommer (JIRA) j...@bro-tracker.atlassian.net wrote:

> I haven't looked at the code yet, but if there's a hard line length limit in there, that's a problem. bro-cut shouldn't care how long lines are.
[Bro-Dev] [JIRA] (BIT-1215) bro-cut should be rewritten in C for speed and to not depend on gawk
[ https://bro-tracker.atlassian.net/browse/BIT-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107#comment-17107 ]

Justin Azoff commented on BIT-1215:
---

I think starting with a 1M buffer and growing it 2x with realloc as needed is the way to go after all. We need (and already have) the check to see if fgets truncated the line. The only other thing to do would be to add an absolute max line length of 64M or something, to handle the case where someone accidentally runs bro-cut against a binary file (like a compressed bro log) that just doesn't contain any newlines.
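For readers following the thread, the approach described in this comment (start at 1M, detect fgets truncation, double with realloc, stop at an absolute cap) could look roughly like the sketch below. This is not the code from the actual bro-cut rewrite; the function name read_full_line, its return codes, and the two constants are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INITIAL_CAP ((size_t)1 << 20)   /* 1M starting size (from the comment) */
#define MAX_CAP     ((size_t)64 << 20)  /* 64M absolute cap (from the comment) */

/* Hypothetical helper, not taken from bro-cut itself.
 * Reads one complete line from fp into *buf, doubling the buffer whenever
 * fgets filled it without reaching a newline.
 * Returns 1 if a line was read, 0 at end of file, and -1 if the line
 * would exceed max_cap or memory could not be allocated. */
static int read_full_line(FILE *fp, char **buf, size_t *cap, size_t max_cap)
{
    size_t len = 0;

    assert(fp != NULL && buf != NULL && cap != NULL);

    if (*buf == NULL) {
        if (*cap == 0)
            *cap = INITIAL_CAP;
        *buf = malloc(*cap);
        if (*buf == NULL)
            return -1;
    }

    for (;;) {
        if (fgets(*buf + len, (int)(*cap - len), fp) == NULL)
            return len > 0 ? 1 : 0;      /* EOF (or read error) */

        len += strlen(*buf + len);

        if (len > 0 && (*buf)[len - 1] == '\n')
            return 1;                    /* complete line */
        if (len + 1 < *cap)
            return 1;                    /* last line, no trailing newline */

        /* Buffer is full and no newline was seen: fgets truncated the line. */
        if (*cap >= max_cap)
            return -1;                   /* probably not a text log at all */

        size_t new_cap = *cap * 2;
        if (new_cap > max_cap)
            new_cap = max_cap;

        char *grown = realloc(*buf, new_cap);
        if (grown == NULL)
            return -1;
        *buf = grown;
        *cap = new_cap;
    }
}
```

A caller would start with buf set to NULL and cap set to 0, call read_full_line(fp, &buf, &cap, MAX_CAP) in a loop until it returns 0 or -1, and free buf at the end. The truncation check only costs anything when a line actually fills the buffer, so the common case stays a single fgets call per line. Because the sketch relies on strlen, input containing NUL bytes would be split into shorter pieces rather than treated as one line.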
[Bro-Dev] [JIRA] (BIT-1215) bro-cut should be rewritten in C for speed and to not depend on gawk
[ https://bro-tracker.atlassian.net/browse/BIT-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106#comment-17106 ]

Adam Slagell commented on BIT-1215:
---

We are going to make it configurable and default to something like a 1000 KB line. Otherwise, you add a check to see if you need to reallocate memory for every line processed, which seems inefficient for edge cases. Letting the user override the default is a good compromise, though.
[Bro-Dev] [JIRA] (BIT-1215) bro-cut should be rewritten in C for speed and to not depend on gawk
[ https://bro-tracker.atlassian.net/browse/BIT-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108#comment-17108 ]

Robin Sommer commented on BIT-1215:
---

Yes. Maybe a bit less than 2x; exponential growth gets big quickly. :)

It would be nicer to recognize the binary-file case differently, like by not finding a log header; that way we can give a good error message. If such a check is in place, I wouldn't actually bother with another double-check on line length. In the unlikely case that the file has a correct header but totally broken content, I'm sure there are plenty of other cases where bro-cut would fail, and it seems nothing worse can happen here than running out of memory (which the OS will catch).
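As an illustration of the header check suggested here, a minimal sketch follows. It assumes that a Bro ASCII log begins with a "#separator" header line; the thread does not say which check bro-cut should use, and the function name check_log_header and the wording of the error message are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: a Bro ASCII log starts with a "#separator" header line. */
#define LOG_HEADER_PREFIX "#separator"

/* Hypothetical helper, not taken from bro-cut itself.
 * Returns 0 if first_line looks like the start of a Bro log header,
 * otherwise prints an error naming the file and returns -1. */
static int check_log_header(const char *first_line, const char *filename)
{
    assert(first_line != NULL && filename != NULL);

    if (strncmp(first_line, LOG_HEADER_PREFIX,
                strlen(LOG_HEADER_PREFIX)) == 0)
        return 0;

    fprintf(stderr,
            "bro-cut: %s does not start with a log header "
            "(is it compressed or binary?)\n", filename);
    return -1;
}
```

Running the check on the first line read from each input would reject a compressed log immediately with a readable message, instead of letting the line buffer grow until memory runs out.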
[Bro-Dev] [JIRA] (BIT-1217) Documentation: include type for vectors
Johanna Amann created BIT-1217:
--

Summary: Documentation: include type for vectors
Key: BIT-1217
URL: https://bro-tracker.atlassian.net/browse/BIT-1217
Project: Bro Issue Tracker
Issue Type: Problem
Components: Bro, Website
Affects Versions: git/master
Reporter: Johanna Amann
Fix For: 2.4

While browsing our documentation, I noticed that the script reference currently does not show the type that is stored inside a vector. This would be highly convenient sometimes. At the moment it is, for example, impossible to find out what kind of data a vector in an Info record contains. See http://www.bro.org/sphinx-git/scripts/base/protocols/ssl/main.bro.html#type-SSL::Info for an example.
Re: [Bro-Dev] [JIRA] (BIT-1215) bro-cut should be rewritten in C for speed and to not depend on gawk
On Thu, Jul 10, 2014 at 17:27 -0500, you wrote:

> I think start with 1M and realloc 2x as needed is the way to go after all.

Yes. Maybe a bit less than 2x; exponential growth gets big quickly. :)

> I think the only thing to do would be to add an absolute max line length of 64M or something to handle the case where someone accidentally runs bro-cut against a binary file (like a compressed bro log) that just doesn't contain any newlines.

It would be nicer to recognize that differently, like by not finding a log header; that way we can give a good error message. If such a check is in place, I wouldn't actually bother with another double-check on line length. In the unlikely case that the file has a correct header but totally broken content, I'm sure there are plenty of other cases where bro-cut would fail, and it seems nothing worse can happen here than running out of memory (which the OS will catch).
[Bro-Dev] [JIRA] (BIT-1217) Documentation: include type for vectors
[ https://bro-tracker.atlassian.net/browse/BIT-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Siwek updated BIT-1217:
---

Resolution: Fixed
Status: Closed (was: Open)