--
View this message in context:
http://old.nabble.com/How-do-I-block-ban-a-specific-domain-name-or-a-tld--tp26289091p26306461.html
Sent from the Nutch - User mailing list archive at Nabble.com.
--
Subhojit Roy
Profound Technologies
(Search Solutions based on Open Source)
email: s
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
--
Sorry...
The regular expression should be:
-^http://([a-z0-9]*\.)*who.int/
I made an error in the previous email. I wonder whether Gmail is playing
with the characters in the sent emails...
-sroy
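A quick way to sanity-check a rule like this before adding it to the URL filter file is to try the pattern against a few URLs. The following is only a Python sketch: it assumes the filter looks for the pattern anywhere it can match in the URL (a "find" rather than a full match), and that Python's `re` module treats this particular pattern the same way Java's regex engine does. The helper name `is_blocked` and the sample URLs are made up for illustration.

```python
import re

# Pattern from the rule above, without the leading "-" that marks it as an exclusion.
BLOCK_WHO_INT = re.compile(r"^http://([a-z0-9]*\.)*who.int/")


def is_blocked(url: str) -> bool:
    """Return True if the exclusion pattern matches at the start of the URL."""
    return BLOCK_WHO_INT.search(url) is not None


if __name__ == "__main__":
    samples = [
        "http://who.int/",             # bare domain
        "http://www.who.int/topics/",  # one subdomain level
        "http://apps.who.int/gho/",    # another subdomain
        "http://example.org/who.int/", # who.int only appears in the path
        "https://www.who.int/",        # https is not covered by this rule
    ]
    for url in samples:
        print(url, "blocked" if is_blocked(url) else "not blocked")
```

Two things to keep in mind with this pattern: the dot in who.int is unescaped, so it matches any character there, and the character class [a-z0-9] does not include "-", so subdomains containing a hyphen are not covered.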
On Wed, Nov 25, 2009 at 12:00 PM, Subhojit Roy mails...@gmail.com wrote:
Try:
1
]*\.)*website.com/unknown-folder/known-folder/
The first folder can vary, whereas the host name and the second folder are known.
How can I substitute the unknown parts (folders) of the URL?
Any help appreciated!
regards
mailusenet
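One common way to stand in for a single unknown folder is `[^/]+`, which matches one path segment and stops at the next slash. The Python sketch below tries that idea out. The full pattern is an assumption written for illustration (it is not the truncated rule quoted above), the helper name `matches_known_folder` is hypothetical, and the sketch assumes the pattern behaves the same in Java's regex engine.

```python
import re

# Assumed pattern: any subdomain of website.com, exactly one unknown folder,
# then the known folder. "[^/]+" means "one or more characters that are not a slash".
KNOWN_FOLDER = re.compile(r"^http://([a-z0-9]*\.)*website\.com/[^/]+/known-folder/")


def matches_known_folder(url: str) -> bool:
    """Return True if the URL has one variable folder followed by known-folder."""
    return KNOWN_FOLDER.search(url) is not None


if __name__ == "__main__":
    samples = [
        "http://www.website.com/abc/known-folder/page.html",
        "http://website.com/2009/known-folder/",
        "http://www.website.com/known-folder/",      # no variable folder
        "http://www.website.com/a/b/known-folder/",  # two variable folders
    ]
    for url in samples:
        print(url, matches_known_folder(url))
```

If several unknown folders should be allowed in a row, `([^/]+/)+` in place of `[^/]+/` would be the usual variation.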
--
, in my older version of Nutch, the same merge works with the
default Java max heap setting of only 1G.
Does anybody have the same experience? Is there any workaround for this?
Thanks
Kevin Chen
--
://old.nabble.com/file/p26197881/urllist.txt urllist.txt
--
View this message in context:
http://old.nabble.com/How-to-fetch-URLs-with-special-charaters-%27-%27---%27%3D%27-tp26197881p26197881.html
Sent from the Nutch - User mailing list archive at Nabble.com.
--
right for the job. It's the part
that I have a hard time evaluating with Nutch. Some of what I have read
from the mailing list suggests it's still not all that easy to do extraction
with Nutch, am I wrong?
Mark
--
://old.nabble.com/PRUNE-%3A-need-some-help-on-pruning-syntax.-tp26268447p26268447.html
Sent from the Nutch - User mailing list archive at Nabble.com.
--
?
Thanks,
Regards,
Varish Mulwad