Hi Michael,
If you are using 1.14, there is a parameter -sample that allows you to request
a random sample. See https://issues.apache.org/jira/browse/NUTCH-2463.
Yossi.
> -Original Message-
> From: Michael Coffey
> Sent: 01 May 2018 23:47
> To: User
> To: user@nutch.apache.org
> Subject: Re: RE: random sampling of crawlDb urls
>
> Just to clarify: .99 does NOT work fine. It should have rejected most of the
> records when I specified "((Math.random())>=.99)".
>
> I have used expressions not involving M
Just to clarify: .99 does NOT work fine. It should have rejected most of the
records when I specified "((Math.random())>=.99)".
I have used expressions not involving Math.random. For example, I can extract
records above a specific score with "score>1.0". But the random thing doesn't
work even
Hello Michael,
I would think this should work as well. But since you mention .99 works fine,
did you try .1 as well to get ~10% output? It seems the expressions themselves do
work at some level, and since this is a Jexl-specific thing, you might want to
try the Jexl list as well. I could not find
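
For reference, here is a small plain-Java sketch of what a filter of the form "random >= threshold" would be expected to keep if the random call is evaluated once per record. It does not use Nutch or Jexl at all; the class and method names (SamplingSketch, keptFraction) are made up for illustration, and java.util.Random with a fixed seed stands in for Math.random() so the run is repeatable. Note that ">= .99" should keep roughly 1% of records, while ">= .1" would keep roughly 90%; to get about 10% you would want ">= .9" (or "< .1").

```java
import java.util.Random;

// Hypothetical illustration only: simulates the expected behaviour of a
// per-record filter "random >= threshold". Not Nutch or Jexl code.
final class SamplingSketch {

    // Returns the fraction of simulated records for which (r >= threshold)
    // holds, where r is drawn uniformly from [0, 1) once per record,
    // the same range that Math.random() produces.
    static double keptFraction(double threshold, int records, long seed) {
        Random rng = new Random(seed); // seeded so repeated runs agree
        int kept = 0;
        for (int i = 0; i < records; i++) {
            if (rng.nextDouble() >= threshold) {
                kept++;
            }
        }
        return (double) kept / records;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.printf("r >= 0.99 keeps %.4f of records%n", keptFraction(0.99, n, 42L));
        System.out.printf("r >= 0.90 keeps %.4f of records%n", keptFraction(0.90, n, 42L));
        System.out.printf("r >= 0.10 keeps %.4f of records%n", keptFraction(0.10, n, 42L));
    }
}
```

If the real filter keeps far more than this for ".99", one possibility worth checking (an assumption, not something confirmed in this thread) is that the random call is not being evaluated per record the way the expression suggests.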