Re: Change field separator in Metron to make it Hive and ORC friendly

2018-08-15 Thread Ali Nazemian
Hi Simon,

I think it is a hard trade-off. Even now, without any ability to customise
the separator or Metron's internal field names, Metron users need to put a
mapping in place at the integration layer (at least, this is what we are
doing :) ). Every organisation or user may need to follow different policies
for different reasons, not to mention the limitations of particular
technologies (e.g. Hive). The question is whether we think Elasticsearch/Solr
and HDFS (as data storage) are coupled with Metron or not. Metron components
can freely use a Metron-specific data model, but it would be better to
decouple the data model at rest from the Metron data model, to make
integration with other tools more flexible. That means a mapping layer would
be required wherever the data model at rest is involved.

Certainly, that does not mean every Metron user has to provide a mapping.
We can, but we do not have to. It just makes integration more flexible by
allowing a consistent data model across the integration endpoints
(Elasticsearch/Solr and HDFS). The problem we are facing is that, in
addition to a separate mapping for Elasticsearch, we have to maintain a
different mapping for ORC as well. If the model were consistent across
Elasticsearch and HDFS, we would need only a single mapping for an
application that consumes from both. Therefore, if we exclude the data
model in transit, a mapping at Metron-rest (to serve Alert UI) and a
mapping at Metron-indexing (ES/Solr and HDFS) would be sufficient. Even
now, by changing the separator at index time, we are doing the same thing:
we are not changing the data model in transit.
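
To make the idea of a single index-time mapping concrete, here is a minimal
sketch in Python. It is not Metron code: the function name
`rename_for_storage`, the set of internal separators and the sample field
names are all assumptions made for illustration. It renames the keys of a
flat message just before it is written, and leaves the in-transit message
untouched.

```python
import re

# Assumption: "." and ":" are the internal separators that need replacing.
INTERNAL_SEPARATORS = re.compile(r"[.:]")


def rename_for_storage(message: dict, separator: str = "_") -> dict:
    """Return a copy of a flat message with separators in its keys replaced."""
    renamed = {}
    for key, value in message.items():
        new_key = INTERNAL_SEPARATORS.sub(separator, key)
        if new_key in renamed:
            # Two different in-transit names would become one name at rest.
            raise ValueError(f"field name collision on {new_key!r}")
        renamed[new_key] = value
    return renamed


# Hypothetical in-transit message; the field names are made up.
in_transit = {
    "ip_src_addr": "10.0.0.1",
    "enrichments.geo.ip_dst_addr.country": "AU",
}

# The same renamed message could be handed to every at-rest writer.
at_rest = rename_for_storage(in_transit)
```

Because the mapping is applied once, an application reading from both stores
would only need to know one set of field names.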

Cheers,
Ali



On Tue, Aug 14, 2018 at 9:11 PM Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> The challenge with making it configurable is that every query, every
> profile, every analytic, template, pre-installed dashboard and use case
> built by any third party who wanted to extend metron would have to honour
> the configuration and parameterize every query they run. My worry is that
> that would render some engines totally incompatible with many installs (as
> opposed to just needing an escape character as you would with hive now) and
> would prevent a lot of tools participating in the metron eco-system.
>
> I think this is something where we need to make a good decision and stick
> to it to allow the ecosystem to build on a known foundation.
>
> Dots are not great because hive uses them to separate, underscore collides
> with our existing convention, and hyphen collides with a number of other
> common log formats, so it’s not an easy one to have an opinion on, but I do
> think we should have an opinion rather than forcing every user to make the
> hard choice to exclude others from sharing.
>
> Perhaps the flat key value structure is the real question here, and given
> progress in the underlying index engines may not be the panacea it once was.
>
> Simon
>
> > On 14 Aug 2018, at 11:42, deepak kumar wrote:
> >
> > I agree Ali.
> > Maybe it can be a configuration parameter.
> >
> >> On Tue, Aug 14, 2018 at 3:24 PM Ali Nazemian wrote:
> >>
> >> Hi Simon,
> >>
> >> We have temporarily decided to just change it with "_" for HDFS to avoid
> >> all the headaches of the bugs and issues that can be raised by using
> >> unsupported separators for ORC/Hive and Spark. However, I am not quite
> >> confident with "_" as an option for the community as it becomes similar
> >> to normal Metron separator. Maybe it would be nice to have an ability to
> >> change the separator to any other character and let users decide what
> >> they want to use.
> >>
> >> Cheers,
> >> Ali
> >>
> >> On Tue, Aug 14, 2018 at 12:14 AM Simon Elliston Ball <
> >> si...@simonellistonball.com> wrote:
> >>
> >>> Do you have any suggestions for what would make sense as a delimiter?
> >>>
> >>>> On 9 August 2018 at 05:57, Ali Nazemian wrote:
> >>>>
> >>>> Hi All,
> >>>>
> >>>> I was wondering if we can change the field separators in Metron to be
> >>>> able to make it Hive/ORC friendly. I could find the following PR, but
> >>>> neither dot nor colon is very Hive and ORC friendly and they will cause
> >>>> some issues. Hence, I wanted to see if it is possible to change the
> >>>> field separator to something else or even give users an ability to
> >>>> define what separator to be used to make the data model consistent
> >>>> across Elasticsearch and HDFS.
> >>>>
> >>>> https://github.com/apache/metron/pull/1022
> >>>>
> >>>> Cheers,
> >>>> Ali
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> --
> >>> simon elliston ball
> >>> @sireb
> >>>
> >>
> >>
> >> --
> >> A.Nazemian
> >>
>


-- 
A.Nazemian


Re: Change field separator in Metron to make it Hive and ORC friendly

2018-08-14 Thread Simon Elliston Ball
The challenge with making it configurable is that every query, every profile,
every analytic, template, pre-installed dashboard and use case built by any
third party who wanted to extend Metron would have to honour the configuration
and parameterize every query they run. My worry is that this would render some
engines totally incompatible with many installs (as opposed to just needing an
escape character, as you would with Hive now) and would prevent a lot of tools
from participating in the Metron ecosystem.
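
As a rough illustration of what "parameterize every query" would mean, here
is a minimal sketch in Python. The helper names, the table name and the field
path are hypothetical, and the backtick quoting is only a stand-in for
whatever escaping a given engine needs; whether an engine accepts a given
separator even when quoted depends on the engine and its version.

```python
def field(parts, separator):
    """Join the logical parts of a field name with the configured separator."""
    return separator.join(parts)


def quoted(name):
    """Wrap a column name in backticks (assumed quoting style)."""
    if "`" in name:
        raise ValueError("backticks inside field names are not handled here")
    return f"`{name}`"


def country_count_query(table, separator):
    """Build one query; every saved query would need the same treatment."""
    column = quoted(field(["enrichments", "geo", "ip_dst_addr", "country"], separator))
    return f"SELECT {column}, COUNT(*) FROM {table} GROUP BY {column}"


# The same logical query produces different text on each install.
dotted = country_count_query("alerts", ".")
underscored = country_count_query("alerts", "_")
```

Every dashboard, profile and template shipped by a third party would have to
be generated this way, rather than written once against fixed field names.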

I think this is something where we need to make a good decision and stick to it 
to allow the ecosystem to build on a known foundation. 

Dots are not great because Hive uses them as a separator, underscore collides
with our existing convention, and hyphen collides with a number of other common
log formats, so it’s not an easy one to have an opinion on, but I do think we
should have an opinion rather than forcing every user to make the hard choice
to exclude others from sharing.

Perhaps the flat key-value structure is the real question here and, given
progress in the underlying index engines, it may not be the panacea it once was.
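
For comparison, here is a minimal sketch in Python of what moving away from
the flat structure could look like: the separator disappears from field names
altogether and becomes nesting. The function name `nest` and the sample field
names are assumptions for illustration, not anything Metron does today.

```python
def nest(flat, separator="."):
    """Turn a flat dict with separator-joined keys into nested dicts."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(separator)
        node = nested
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # e.g. both "a" and "a.b" are present in the flat message
                raise ValueError(f"{key!r} conflicts with a scalar field")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{key!r} conflicts with a nested field")
        node[leaf] = value
    return nested


# Hypothetical flat message; the field names are made up.
flat_message = {
    "ip_src_addr": "10.0.0.1",
    "enrichments.geo.ip_dst_addr.country": "AU",
}
nested_message = nest(flat_message)
```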

Simon

> On 14 Aug 2018, at 11:42, deepak kumar wrote:
> 
> I agree Ali.
> Maybe it can be a configuration parameter.
> 
>> On Tue, Aug 14, 2018 at 3:24 PM Ali Nazemian wrote:
>> 
>> Hi Simon,
>> 
>> We have temporarily decided to just change it with "_" for HDFS to avoid
>> all the headaches of the bugs and issues that can be raised by using
>> unsupported separators for ORC/Hive and Spark. However, I am not quite
>> confident with "_" as an option for the community as it becomes similar to
>> normal Metron separator. Maybe it would be nice to have an ability to
>> change the separator to any other character and let users decide what they
>> want to use.
>> 
>> Cheers,
>> Ali
>> 
>> On Tue, Aug 14, 2018 at 12:14 AM Simon Elliston Ball <
>> si...@simonellistonball.com> wrote:
>> 
>>> Do you have any suggestions for what would make sense as a delimiter?
>>> 
>>>> On 9 August 2018 at 05:57, Ali Nazemian wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I was wondering if we can change the field separators in Metron to be
>>>> able to make it Hive/ORC friendly. I could find the following PR, but
>>>> neither dot nor colon is very Hive and ORC friendly and they will cause
>>>> some issues. Hence, I wanted to see if it is possible to change the
>>>> field separator to something else or even give users an ability to
>>>> define what separator to be used to make the data model consistent
>>>> across Elasticsearch and HDFS.
>>>>
>>>> https://github.com/apache/metron/pull/1022
>>>>
>>>> Cheers,
>>>> Ali
>>>>
>>> 
>>> 
>>> 
>>> --
>>> --
>>> simon elliston ball
>>> @sireb
>>> 
>> 
>> 
>> --
>> A.Nazemian
>> 


Re: Change field separator in Metron to make it Hive and ORC friendly

2018-08-14 Thread deepak kumar
I agree, Ali.
Maybe it can be a configuration parameter.

On Tue, Aug 14, 2018 at 3:24 PM Ali Nazemian wrote:

> Hi Simon,
>
> We have temporarily decided to just change it with "_" for HDFS to avoid
> all the headaches of the bugs and issues that can be raised by using
> unsupported separators for ORC/Hive and Spark. However, I am not quite
> confident with "_" as an option for the community as it becomes similar to
> normal Metron separator. Maybe it would be nice to have an ability to
> change the separator to any other character and let users decide what they
> want to use.
>
> Cheers,
> Ali
>
> On Tue, Aug 14, 2018 at 12:14 AM Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
> > Do you have any suggestions for what would make sense as a delimiter?
> >
> > On 9 August 2018 at 05:57, Ali Nazemian wrote:
> >
> > > Hi All,
> > >
> > > I was wondering if we can change the field separators in Metron to be
> > > able to make it Hive/ORC friendly. I could find the following PR, but
> > > neither dot nor colon is very Hive and ORC friendly and they will cause
> > > some issues. Hence, I wanted to see if it is possible to change the
> > > field separator to something else or even give users an ability to
> > > define what separator to be used to make the data model consistent
> > > across Elasticsearch and HDFS.
> > >
> > > https://github.com/apache/metron/pull/1022
> > >
> > > Cheers,
> > > Ali
> > >
> >
> >
> >
> > --
> > --
> > simon elliston ball
> > @sireb
> >
>
>
> --
> A.Nazemian
>


Re: Change field separator in Metron to make it Hive and ORC friendly

2018-08-14 Thread Ali Nazemian
Hi Simon,

We have temporarily decided to just change it to "_" for HDFS, to avoid
all the headaches of the bugs and issues that can be raised by using
separators that ORC/Hive and Spark do not support. However, I am not quite
confident in "_" as an option for the community, as it becomes too similar
to the normal Metron separator. Maybe it would be nice to have the ability
to change the separator to any other character and let users decide what
they want to use.
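
The concern with "_" can be shown with a small Python sketch. The two field
names below are made up for illustration, and the assumption is that "." and
":" are the separators being replaced. Since "_" already appears inside field
names, two distinct names can end up identical and the original name cannot
be recovered from the stored one.

```python
def to_underscore(name):
    """Replace the assumed internal separators with "_"."""
    return name.replace(".", "_").replace(":", "_")


# Two different (hypothetical) in-transit field names...
first = "enrichments.geo.ip_dst_addr.country"
second = "enrichments.geo_ip.dst_addr_country"

# ...become the same column name at rest.
same_name = to_underscore(first) == to_underscore(second)
```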

Cheers,
Ali

On Tue, Aug 14, 2018 at 12:14 AM Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Do you have any suggestions for what would make sense as a delimiter?
>
> On 9 August 2018 at 05:57, Ali Nazemian wrote:
>
> > Hi All,
> >
> > I was wondering if we can change the field separators in Metron to be
> > able to make it Hive/ORC friendly. I could find the following PR, but
> > neither dot nor colon is very Hive and ORC friendly and they will cause
> > some issues. Hence, I wanted to see if it is possible to change the
> > field separator to something else or even give users an ability to
> > define what separator to be used to make the data model consistent
> > across Elasticsearch and HDFS.
> >
> > https://github.com/apache/metron/pull/1022
> >
> > Cheers,
> > Ali
> >
>
>
>
> --
> --
> simon elliston ball
> @sireb
>


-- 
A.Nazemian


Re: Change field separator in Metron to make it Hive and ORC friendly

2018-08-13 Thread Simon Elliston Ball
Do you have any suggestions for what would make sense as a delimiter?

On 9 August 2018 at 05:57, Ali Nazemian wrote:

> Hi All,
>
> I was wondering if we can change the field separators in Metron to be able
> to make it Hive/ORC friendly. I could find the following PR, but neither
> dot nor colon is very Hive and ORC friendly and they will cause some
> issues. Hence, I wanted to see if it is possible to change the field
> separator to something else or even give users an ability to define what
> separator to be used to make the data model consistent across Elasticsearch
> and HDFS.
>
> https://github.com/apache/metron/pull/1022
>
> Cheers,
> Ali
>



-- 
--
simon elliston ball
@sireb


Change field separator in Metron to make it Hive and ORC friendly

2018-08-08 Thread Ali Nazemian
Hi All,

I was wondering if we can change the field separators in Metron to make
them Hive/ORC friendly. I found the following PR, but neither dot nor colon
is very Hive and ORC friendly, and they will cause some issues. Hence, I
wanted to see if it is possible to change the field separator to something
else, or even give users the ability to define which separator is used, to
make the data model consistent across Elasticsearch and HDFS.

https://github.com/apache/metron/pull/1022

Cheers,
Ali