Hi Peter,
An interesting concept ...
This may sound simplistic, but is it viable for your application to initially
have a process where newly written queries are vetted by a human before they
are run? The advantage of this is two-fold, namely:
i) you'll be able to move on and prove your concept quickly
ii) while doing this, you may learn enough (and things may change
enough) for you to automate the vetting process
Thanks,
Justin
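
A minimal sketch of such a vetting step, under stated assumptions: a
user-submitted view is first held as an ordinary record (not under a
"_design/" id, so CouchDB does not index it), and is only turned into a design
document after a reviewer approves it. The helper names (make_submission,
approve, to_design_doc) and the record fields are hypothetical and only for
illustration; the resulting design document uses the usual "_id", "language"
and "views" layout.

```python
def make_submission(user, view_name, map_src, reduce_src=None):
    """Build a pending record for a user-written view (hypothetical schema)."""
    return {
        "type": "view_submission",
        "user": user,
        "view_name": view_name,
        "map": map_src,
        "reduce": reduce_src,
        "status": "pending",
    }


def approve(submission, reviewer):
    """Return a copy of the submission marked as approved by a human reviewer."""
    approved = dict(submission)
    approved["status"] = "approved"
    approved["reviewed_by"] = reviewer
    return approved


def to_design_doc(submission):
    """Convert an approved submission into a CouchDB design document body."""
    if submission.get("status") != "approved":
        raise ValueError("view has not been approved by a reviewer")
    view = {"map": submission["map"]}
    if submission.get("reduce"):
        view["reduce"] = submission["reduce"]
    return {
        "_id": "_design/" + submission["view_name"],
        "language": "javascript",
        "views": {submission["view_name"]: view},
    }
```

Only the output of to_design_doc would be written to the customer's database,
so an unreviewed map function never reaches the view server.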
-----Original Message-----
From: Peter Grman [mailto:[email protected]]
Sent: 28 November 2014 02:12
To: [email protected]
Subject: Re: Allow user-defined views
No, I don't. The program is for analysing logs (collected by fluentd). It
should be open source and on GitHub, however there isn't much done yet:
https://github.com/logTank/
The index rebuilding shouldn't be a problem, as CouchDB will only be used for
general stats and the user won't actually see up-to-date data, but always data
with a delay. That is another advantage of CouchDB: I can serve the queries
without bothering the system, and once the data is outdated, I can update the
index. At least that's the theory so far; I'll need to run some performance
tests to see whether it actually works once I have an MVP. The other option is
to use MongoDB for ad-hoc queries, but I was thinking that CouchDB will be more
efficient as storage is so cheap.
As I learn something new every time I look up info about CouchDB, and things
become clearer, I'd also be glad of feedback on the idea in general and on how
I want to use CouchDB.
However, I'd also be very happy if I could somehow solve the problem with the
possible DoS attacks :). Maybe there is something in CouchDB or evalcx which I
can configure - a maximum runtime for a map/reduce function?
(shouldn't be more than 1ms). Or is there any data logged by CouchDB about
the resources required by views (CPU time + HDD space)?
Cheers
Peter
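
On the two questions above, a hedged sketch rather than a definitive answer.
For runtime, as far as I know CouchDB has an os_process_timeout setting (in the
[couchdb] config section, in milliseconds) that terminates a query server
process which runs that long without returning a result; that is a coarse
limit, not a 1ms budget per function call. For disk space, a design document's
info endpoint (GET /{db}/_design/{ddoc}/_info) reports the size of its view
index. The code below assumes the size is found under view_index.disk_size
(1.x-style responses) or view_index.sizes.file (later releases); the function
names and the quota are hypothetical.

```python
import json
from urllib.request import urlopen


def index_disk_bytes(info):
    """Extract the on-disk view index size from a parsed _info response.

    Assumes either view_index.sizes.file or view_index.disk_size is present;
    returns 0 if neither field exists.
    """
    view_index = info.get("view_index", {})
    sizes = view_index.get("sizes")
    if isinstance(sizes, dict) and "file" in sizes:
        return sizes["file"]
    return view_index.get("disk_size", 0)


def over_quota(info, max_bytes):
    """True if the view index uses more disk space than the allowed quota."""
    return index_disk_bytes(info) > max_bytes


def fetch_info(base_url, db, ddoc):
    """Fetch and parse /{db}/_design/{ddoc}/_info from a CouchDB server."""
    url = "{}/{}/_design/{}/_info".format(base_url.rstrip("/"), db, ddoc)
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A periodic job could call fetch_info for each customer's design documents and
mark those where over_quota returns True for manual review.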
On Fri Nov 28 2014 at 2:54:51 AM Alexander Gabriel <[email protected]> wrote:
> sorry for being off-topic
> Alex
>
>
> 2014-11-28 2:52 GMT+01:00 Alexander Gabriel <[email protected]>:
>
> > sounds like a very interesting application
> >
> > seems like you don't care if the user has to wait for an index to be
> > built when the user creates a query
> >
> > Alex
> >
> >
> > 2014-11-28 2:23 GMT+01:00 Peter Grman <[email protected]>:
> >
> >> Hi Alex,
> >>
> >> Yes, the users would be able to import different sets of data, which
> >> isn't relational, and use the platform to analyse it. The analysed data
> >> would be in 99% of the cases append only (+ removing old data) and the
> >> data can be defined by the user, as well as be hierarchical.
> >>
> >> When I thought about the system in the beginning, CouchDB seemed like
> >> an awesome choice: as there would be only a couple of well-defined
> >> queries and storage is generally cheap, I thought that CouchDB views
> >> and their caching were what I was looking for.
> >>
> >> The problem is again only with people who want to trick the system. I
> >> would also be happy with a solution which would detect bad views once
> >> they have been deployed (uses too much space, takes too long to
> >> compute), deactivate them and mark them for me to check. This way I
> >> could check those few people who try a DoS attack and ban them from the
> >> service.
> >>
> >> The other main question was whether it is really impossible to get data
> >> from a different database inside the view, and whether the user won't
> >> be able to access the underlying system, ..., or whether it is just very
> >> difficult => possible: if someone wants to do it, they'll find a way.
> >> But after reading more and understanding better how the views are
> >> executed using evalcx, I think the other problems aren't a big concern
> >> for me anymore. Is that correct?
> >>
> >> Although I've found in the code "if possible, use evalcx (not always
> >> available)" - how can I check that evalcx is available on my system? Or
> >> is it just a note for older distributions, nothing to be concerned about
> >> anymore?
> >>
> >> Thank you
> >>
> >> Cheers
> >> Peter
> >>
> >> On Fri Nov 28 2014 at 1:37:57 AM Alexander Gabriel
> >> <[email protected]>
> >> wrote:
> >>
> >> > Hi Peter
> >> >
> >> > Will the users create their own data structures too?
> >> > If not, this sounds like SQL on relational tables might be a better
> >> > tool for the problem.
> >> > It seems to me you're hitting exactly the weak point of most NoSQL
> >> > solutions.
> >> >
> >> > Alex
> >> >
> >> >
> >> > 2014-11-28 0:49 GMT+01:00 Peter Grman <[email protected]>:
> >> >
> >> > > Hi,
> >> > >
> >> > > this might sound like a terrible idea to someone who knows CouchDB,
> >> > > and if that's the case, please just take a minute or two to explain
> >> > > why. Otherwise, if the idea isn't so crazy after all, I hope I'll
> >> > > get some solutions to my problem:
> >> > >
> >> > > I'm thinking of creating a platform based on CouchDB, where each set
> >> > > of users (group, customer, ...) would get their own CouchDB database
> >> > > to store and query data. I heard in a podcast, roughly a year ago,
> >> > > that this is how CouchDB was meant to be used - many smaller
> >> > > databases.
> >> > >
> >> > > To query the data, I want to allow them to define their own custom
> >> > > queries. Now I could (and want to) create a form which lets them
> >> > > build a query and translates it to a JS view, but I was thinking
> >> > > about additionally, on top of that, allowing them to define their
> >> > > custom views directly in JS. They would basically be allowed to
> >> > > define their custom Map/Reduce functions.
> >> > >
> >> > > There is a lot which can go wrong with this. The worst ones I came
> >> > > up with:
> >> > > - DoS attack with endless loops inside the function
> >> > > - DoS attack by emitting too much data (potentially in a loop again)
> >> > >
> >> > > As far as I've understood, it's not possible to access other
> >> > > databases from within the view. Is this understanding of mine
> >> > > correct?
> >> > >
> >> > > Is it possible to access the filesystem or network services in any
> >> > > way from the CouchDB view, or is the JavaScript engine which is
> >> > > running the code limiting enough?
> >> > >
> >> > > Are there any other things which could go wrong? - or did somebody
> >> > > actually already use CouchDB like this, and it's perfectly normal?
> >> > >
> >> > > Is there any way I could prevent the problem with endless loops and
> >> > > data emitting from happening? - I can run JSLint, which maybe will
> >> > > detect an endless loop, but that won't help against a loop with a
> >> > > million iterations, which will be called for every item inside
> >> > > CouchDB - still quite endless.
> >> > >
> >> > > Thank you for your help!
> >> > >
> >> > > Cheers,
> >> > > Peter
> >> > >
> >> >
> >>
> >
> >
>