Hello Jeff

Thank you for replying.

On Tue, Dec 29, 2015 at 4:35, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Sun, Dec 27, 2015 at 3:53 PM, Hiroyuki Sato <hiroys...@gmail.com>
> wrote:
> > Hello Andreas and Tom
> >
> > Thank you for replying.
> >
> > Sorry, I have re-created my questions. I pasted the wrong query log in
> > my previous question.
> > (The @~ operator is from the PGroonga extension (http://pgroonga.github.io).)
> > Please ignore it.
> >
> > Best regards.
> >
> > 1, Problem.
> >   (1) The following query is extremely slow. (478 sec)
> >     SELECT
> >       u.url
> >     FROM
> >       url_lists4 u,
> >       keywords4 k
> >     WHERE
> >       u.url like k.url
> >     AND
> >       k.name = 'esc_url';
> >
> >
> >   (2) grep -f kwd.txt sample.txt (exec time under 1sec)
>
> Certainly not in my hands.  The best I can get is 9 seconds.
>
> >
> > 2, Questions
> >
> >   (1) Is it possible to make this query as fast as the command ``grep -f
> > keyword data``?
>
> You will not get a general tool to match a specialized tool in the
> specialized tool's own domain.  fgrep is good at what fgrep does.
>

This is just sample data; the real data is a forward proxy log, so there
is no specialized domain of its own. It contains 20,000,000 log entries
per day.


>
>
> Since your queries all have constant text strings at the beginning,
> they could use the index.  But if you are not using the C collation,
> then you need to build a special index:
>
> create index on url_lists4 (url text_pattern_ops);
>

Currently I use only the C collation (ASCII).

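If so, a plain btree index on url should already be usable for a
left-anchored LIKE against a constant pattern. A minimal sketch,
assuming the url_lists4(url text) layout from the query above:

    CREATE INDEX ON url_lists4 (url);

    -- With the C collation, the planner can use this index when the
    -- pattern is a constant, left-anchored string:
    EXPLAIN SELECT url FROM url_lists4
    WHERE url LIKE 'http://www.yahoo.com/%';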

>
> But, the planner refuses to use this index for your query anyway,
> because it can't see that the patterns are all left-anchored.
>
> Really, your best bet is to refactor your url data so it is stored with
> url_prefix and url_suffix columns.  Then you can do exact matching
> rather than pattern matching.
>

I see, exact matching is faster than pattern matching.
But I need pattern matching on the path part
(i.e., http://www.yahoo.com/a/b/c/...);
I would like to pattern-match the '/a/b/c' part.

That's why I asked this question.
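
Maybe I could split each URL into a host part and a path part,
exact-match the host, and pattern-match only the path. A rough sketch
(the url_host/url_path columns and the matching split in keywords4 are
my assumptions, not the actual schema):

    ALTER TABLE url_lists4
      ADD COLUMN url_host text,
      ADD COLUMN url_path text;

    UPDATE url_lists4 SET
      url_host = substring(url from '^[a-z]+://[^/]+'),
      url_path = substring(url from '^[a-z]+://[^/]+(/.*)$');

    CREATE INDEX ON url_lists4 (url_host);

    -- If keywords4 were split the same way, the join could anchor on
    -- an exact, indexed host match and apply LIKE only to the path:
    SELECT u.url
    FROM url_lists4 u
    JOIN keywords4 k
      ON u.url_host = k.url_host
     AND u.url_path LIKE k.url_path
    WHERE k.name = 'esc_url';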

If it is impossible to improve the join speed, I will dump the data
once and match it with grep or a similar tool.
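For example (a rough sketch; the file names are just placeholders):

    -- in psql: dump the url column to a text file
    \copy (SELECT url FROM url_lists4) TO 'urls.txt'

and then, from the shell:

    grep -f kwd.txt urls.txt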

Thanks.



>
> Cheers,
>
> Jeff
>
