Hello Rainer, I've been waiting for this :)

I'll reply inline.

On Mon, Sep 1, 2014 at 1:44 PM, Rainer Gerhards <[email protected]>
wrote:
>
> Questions now:
>
> - what should I install to get an ES testenv (possibly something that
> doesn't need babysitting or manual updates in the next year or two).
>

Two options:
1. Install ES locally. It's mostly about getting the DEB
<http://www.elasticsearch.org/download/> and doing `dpkg -i
elasticsearch*.deb`. Besides that, you need Java. The not-so-open Oracle
version
<http://ubuntuhandbook.org/index.php/2014/02/install-oracle-java-6-7-or-8-ubuntu-14-04/>
is typically recommended. You may need to pull another DEB in the next
year or two if major changes occur in the API (not super-likely, but who
knows).
2. Use Logsene <http://sematext.com/logsene/index.html>, which exposes the
ES API. The free plan will do, as it's good for 1M documents. You can remove
logs manually
<https://sematext.atlassian.net/wiki/display/PUBLOGSENE/Logsene+FAQ#LogseneFAQ-Deletinglogs>,
and you can adjust the number of days they're kept through the UI. We'll
take care of ES upgrades for you, as we release often and typically upgrade
ES along the way. If you have any questions about Logsene,
you know where to find me :D


>
> To run a test I need to do via a script
>
> - clean up any previous work done on an ES index
>

curl -XDELETE localhost:9200/_all   # this wipes everything
curl -XDELETE localhost:9200/my_index    # this removes an index
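
Since a test script should fail loudly if the cleanup didn't work, you can
check the response body. A minimal sketch, assuming the JSON ES 1.x sends
back (`{"acknowledged":true}` on success, an `IndexMissingException` error
when the index is already gone); `my_index` is just a placeholder name:

```shell
# Hedged sketch: decide whether a DELETE left us with a clean slate,
# based on the response body ES sends back.
check_delete() {
  case "$1" in
    *'"acknowledged":true'*) echo "index clean" ;;        # delete succeeded
    *IndexMissingException*) echo "index already gone" ;; # nothing to delete
    *) echo "unexpected response: $1" >&2; return 1 ;;
  esac
}

# Usage against a live server (placeholder index name):
#   check_delete "$(curl -s -XDELETE localhost:9200/my_index)"
```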


> - be able to "export" a whole ES index into a text file
>

This depends on how big your index is and how you want the text file to be
formatted. Let's assume you're good with a big pretty-formatted JSON for
now.

If the index is small (say 100 or 1000 docs), this will do. Note that
`_search` returns only 10 hits by default, so set `size` to at least the
number of docs in the index:

curl 'localhost:9200/my_index/_search?pretty&size=1000' > /tmp/destination_file
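
In a test script you'd want to verify the export actually got everything. A
hedged sketch that compares hits.total with the number of hits written to
the file; both helpers only assume ES's pretty-printed layout, so they're
fragile by design (good enough for a test, not for production):

```shell
# Hedged sketch: sanity-check a one-shot export.
count_total() {
  # the first "total" in a search response belongs to _shards,
  # the second one to hits, which is the one we want
  grep '"total"' "$1" | sed -n '2p' | tr -d ' ",' | cut -d: -f2
}
count_hits() {
  # every returned document carries an "_index" field
  grep -c '"_index"' "$1"
}

# Usage, after the curl above:
#   [ "$(count_total /tmp/destination_file)" = "$(count_hits /tmp/destination_file)" ] \
#     || echo "export incomplete" >&2
```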

If the index is big, you need to scan and scroll
<http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html>.
First, start a scan search. In this example the scroll ID expires in 1m, and
the timer is reset each time you fetch from the scroll:

curl 'localhost:9200/my_index/_search?search_type=scan&scroll=1m'

You'll get back a scroll ID. You should put it in a variable, like:
SCROLL_ID=$(curl -s 'localhost:9200/my_index/_search?search_type=scan&scroll=1m&pretty' \
  | grep scroll_id | cut -d '"' -f 4)

# then, to get a batch of results, you scroll:
curl 'localhost:9200/_search/scroll?scroll=1m&pretty' -d "$SCROLL_ID"

But each time you do that, you get back a new scroll ID, which you need to
use for the next fetch, and so on until the hits array comes back empty.
This is a bit complicated in plain shell, so you may want to use Python or
something like that. There are also ready-made scripts that may do it for
you, like this one:
https://github.com/miku/estab
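
For completeness, here's a hedged sketch of the whole loop in plain shell,
assuming ES 1.x on localhost:9200 and a placeholder index name (`my_index`);
the grep/cut extraction relies on the pretty-printed layout and is just as
fragile as doing it by hand:

```shell
# Hedged sketch of the full scan & scroll export loop.
ES=localhost:9200
OUT=/tmp/destination_file

extract_scroll_id() {
  # pull the _scroll_id value out of a pretty-printed response
  grep 'scroll_id' | cut -d '"' -f 4
}

: > "$OUT"   # truncate the destination file

# 1. open the scroll; a scan response carries no hits, only a scroll ID
SCROLL_ID=$(curl -s "$ES/my_index/_search?search_type=scan&scroll=1m&pretty" \
  | extract_scroll_id)

# 2. fetch batches until the hits array comes back empty; every fetch
#    returns a fresh scroll ID that must be used for the next one
while [ -n "$SCROLL_ID" ]; do
  PAGE=$(curl -s "$ES/_search/scroll?scroll=1m&pretty" -d "$SCROLL_ID")
  echo "$PAGE" | grep -q '"_index"' || break   # no docs left
  echo "$PAGE" >> "$OUT"
  SCROLL_ID=$(echo "$PAGE" | extract_scroll_id)
done
```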


>
> How is this done automatically via shell script?
>

I hope I've answered this question above.


>
> If I got those pieces together, I think I can add a basic test. Once this
> is done, we may be able to do more (especially checking for error cases),
> but let's first get the basics going.
>
> Any help is appreciated.
>

I did some omelasticsearch tests a while ago, though I can't find them
right now. I think they'd provide a good base. Should I dig deeper through
my emails and GitHub, or do you already know where they are?

Best regards,
Radu
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
