2013/6/18 Mahesh V <[email protected]>

> Hi,
>
> performance seems to be good with elastic search
>  2 minutes and change for 5 lakh entries
>

Glad to hear that it's good :) I guess there's a typo in there, so I don't
completely get your statement.


>
> However, i have a problem.
> I can enter the messages into elasticsearch only when rsyslogd is running
> in foreground   (rsyslogd -n running in command line)
> not sure why this is so.
>

Hmm, I've got a similar issue:
http://www.gossamer-threads.com/lists/rsyslog/users/9463#9463

The only solution I found so far was to move the init script (i.e.: mv
/etc/init.d/rsyslog /etc/init.d/rsyslog-new). I'm not sure if your issue is
the same, but you can check by changing the init script to do only "rsyslogd
-dn > /tmp/logfile &". Then start it and check the log.
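For a quick check you can also run it by hand instead of editing the init
script. Something like this (the binary path and log location are just
assumptions, adjust them for your distro):

# run rsyslogd in the foreground with debug output
/sbin/rsyslogd -dn > /tmp/rsyslog-debug.log 2>&1 &
# then look for omelasticsearch-related messages in the debug output
grep -i elasticsearch /tmp/rsyslog-debug.log | less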


>
> Secondly, my requirement would be to query based on part of message.
> i.e my message may look like
> <date><time>ip=x.x.x.x name=abcd loglevel=3 <actual log message>
> Is it possible to query using curl all messages that have ip address as
> y.y.y.y ?
>

Ah, that's a problem, because the standard analyzer
<http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer/>
would break your log into "words" (terms), and each number in an IP would
be a term.

The same analyzer would be applied by default when you search (each number
from the IP is a term, and ES will look for any of the numbers in your
logs).
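If you want to see exactly which terms one of your messages produces, you can
run a sample through the Analyze API (this is a cluster-level call, no index
needed):

curl 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'ip=127.0.0.1 name=abcd loglevel=3'

Each token in the response is what actually gets indexed and searched for.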

There are some things you can do:
- the query you run with "q=msg:127.0.0.1", for example, does a string query
<http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query/>.
If you specify your query in JSON, you can change the default operator from
OR to AND. Like:

curl 'localhost:9200/_search?pretty' -d '{
  "query": {
    "query_string": {
      "query": "127.0.0.1",
      "default_operator": "AND"
    }
  }
}'

This would match only the logs that contain 127, 0 and 1.

If you need more precise matches, the best way to go is to parse your logs
and put each value in its own field. Like:
{"ip":"x.x.x.x", "name":"abcd"}

This way you can search in the ip field directly. You can also search in
all fields by using the default "_all" field. That said, to match only the
exact IP, you need to make sure ES doesn't analyze your "ip" field to break
it into terms. You'd do that by setting "index" to "not_analyzed" in your
mapping <http://www.elasticsearch.org/guide/reference/mapping/>. For
example:

curl -XPUT 'http://localhost:9200/system/events/_mapping' -d '{
    "events" : {
        "properties" : {
            "ip" : {"type" : "string", "index" : "not_analyzed"}
        }
    }
}'
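Once documents are indexed with that mapping, a term query (which doesn't
analyze its input) gives you exact matches. The index and type names below
just follow the mapping example:

curl 'localhost:9200/system/events/_search?pretty' -d '{
  "query": {
    "term": { "ip": "y.y.y.y" }
  }
}'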

You need to put your mapping
<http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping/>
before you start indexing that field, otherwise it will get detected
automatically.
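You can check which mapping is currently in effect (yours or an automatically
detected one) with:

curl 'localhost:9200/system/events/_mapping?pretty'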


>
> Is 4000 (500000 entries / 125 odd seconds) the max I can get per second in
> my system or can I get some more tuning parameters.
>

You can definitely get more. I sent you some links a while ago; I'll paste
them here again:
http://edgeofsanity.net/article/2012/12/26/elasticsearch-for-logging.html
http://www.elasticsearch.org/tutorials/using-elasticsearch-for-logs/
--> you can ignore the "compression" advice from there, it's compressed by
default in 0.90+
http://wiki.rsyslog.com/index.php/Queues_on_v6_with_omelasticsearch

The first thing you can do is to enable bulk indexing by adding bulkmode="on"
to your action line. I see you already set ActionQueueDequeueBatchSize; I'm
not sure if that works here, I usually set it as queue.dequeuebatchsize="1000"
in the action line.
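For example, the complete action could look roughly like this (the queue
values are just a starting point in the spirit of the wiki page above, not
tested numbers):

*.* action(type="omelasticsearch" template="apsimTemplate"
           server="localhost" serverport="9200"
           bulkmode="on"
           queue.type="linkedlist"
           queue.size="5000"
           queue.dequeuebatchsize="1000")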


>
> My rsyslog.conf has the following lines
> ------------------------------------------------------------
> $ActionQueueDequeueBatchSize  1000
>
> template (name="apsimTemplate" type="list" option.json="on") {
>   constant(value="{")
>   constant(value="\"@message\":\"")
>   property(name="msg")
>   constant(value="\"}")
> }
>
> *.*   action(type="omelasticsearch" template="apsimTemplate"
> server="localhost" serverport="9200")
>
>
> My elasticsearch.yml has the following lines
> ------------------------------------------------------------
> cluster:
>    name:   APSIM
>
> network:
>    host:   localhost
>
> root@localhost rsyslog]# date; ./a.out ; date
> Mon Jun 17 23:33:29 IST 2013
> openlog: Success
> Mon Jun 17 23:35:32 IST 2013
>
> [root@localhost rsyslog]#
> [root@localhost rsyslog]# curl 'http://localhost:9200/_search?pretty=true'
> -d '
> {
>     "from" : 0, "size" : 1000000,
>     "query" : {
>         "matchAll" : {}
>     }
> }'  > e
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time
> Current
>                                  Dload  Upload   Total   Spent    Left
> Speed
> 100 1688k  100 1688k    0    86  19.1M    998 --:--:-- --:--:-- --:--:--
> 19.6M
>
>
> [root@localhost rsyslog]# cat e | grep "this is a test" | wc -l
> 500000
>

Note that you can simply go like:
curl 'localhost:9200/_search?size=0'

and watch the hits.total field for the number of hits. If you want to get
serious about performance testing, fetching all your docs every time would be
expensive.
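With size=0 the response stays tiny; trimmed down (and with made-up numbers)
it looks roughly like:

{
  "took" : 2,
  "hits" : {
    "total" : 500000,
    "hits" : [ ]
  }
}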

Best regards,
Radu
