Re: Site indexing application

2006-03-24 Thread Gabriel George POPA

Frank Denis wrote:


On Tue, Mar 21, 2006 at 02:18:10PM +0200, Gabriel George POPA wrote:


Frank Denis wrote:



Yes, very interesting. But I was looking for a very secure, highly 
proven solution, prepackaged for OpenBSD with Apache chrooted.



Well, Hyper Estraier is far from being a beta project. It's an evolution
of Estraier, itself based upon Snatcher, whose work began 6 years ago. The
code is very clean, it works and it's fast.

The code of Mnogosearch (and DPsearch, since it's based upon it) is messy and
designed in a totally insecure and unreliable way. Last year I had a hard
time adding various hacks to make it work with our blog web site
(skyblog.com). There were many ways to make it die with segmentation
faults. And the indexer wasn't always able to resume its activity after a
crash. Plus, Mnogosearch doesn't scale as well as advertised.
OTOH, Hyper Estraier scales really well.

It just needs an OpenBSD port.


I installed Hyper Estraier, but now, because it runs in a chroot, it cannot
find the libraries it depends on. I have had this problem quite a few times
with other programs too, and never had the time to solve it. What should I
do: run ldconfig? Is that the standard method? If so, with what parameters?
Or is it better to set LD_LIBRARY_PATH?

 
George




Re: Site indexing application

2006-03-24 Thread Gabriel George POPA

Gabriel George POPA wrote:



I installed Hyper Estraier, but now, because it runs in a chroot, it cannot
find the libraries it depends on. I have had this problem quite a few times
with other programs too, and never had the time to solve it. What should I
do: run ldconfig? Is that the standard method? If so, with what parameters?
Or is it better to set LD_LIBRARY_PATH?


 
George



Oh, well, I discovered how to solve this problem:
# ldconfig -r
// (note that the libraries used by Hyper Estraier are not listed here)

# ldconfig /usr/lib /usr/local/lib /usr/X11R6/lib
# mkdir -p /var/www/var/run
# chmod -R 0755 /var/www/var
# cp -Rp /var/run/ld.so.hints /var/www/var/run

That's all. Then in a browser:
http://site-name/cgi-bin/estseek.cgi
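These steps can be collected into a small script that is easy to rerun whenever /var/run/ld.so.hints changes on the host, since the copy inside the chroot does not update itself. This is a minimal sketch, not a tested tool: the function name update_chroot_hints, the CHROOT, RUN and LIBDIRS variables, and the dry-run default are assumptions of this example. It sets the directory mode with chmod, because a mode such as 0755 is an argument to chmod rather than chown. Keep in mind that the hints file only tells ld.so where libraries live; the libraries themselves must still exist at those paths inside the chroot.

```sh
#!/bin/sh
# Sketch only (hypothetical helper): rebuild the ld.so hints file and copy
# it into a chroot. CHROOT defaults to /var/www; LIBDIRS is an assumption.
# By default every command is only printed (RUN=echo). Run with RUN set
# but empty (RUN= ./script) to really execute them, as root.

CHROOT=${CHROOT:-/var/www}
RUN=${RUN-echo}
LIBDIRS="/usr/lib /usr/local/lib /usr/X11R6/lib"

update_chroot_hints() {
	# Rebuild /var/run/ld.so.hints on the host from the listed directories.
	$RUN ldconfig $LIBDIRS || return 1
	# Create var/run inside the chroot and make it world-readable.
	$RUN mkdir -p "$CHROOT/var/run" || return 1
	$RUN chmod 0755 "$CHROOT/var" "$CHROOT/var/run" || return 1
	# Copy the hints file, keeping its mode and timestamps.
	$RUN cp -p /var/run/ld.so.hints "$CHROOT/var/run/"
}

update_chroot_hints
```

Printing by default follows the same spirit as the warning in Joachim's script later in this thread: look at what will run before running it as root.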

Finally, I learnt how to deal with this chrooted Apache.

  
Yours in BSDness,
   
George




Re: Copying stuff into chroot (was: Site indexing application)

2006-03-24 Thread Joachim Schipper
On Fri, Mar 24, 2006 at 11:06:04AM +0200, Gabriel George POPA wrote:
 Frank Denis wrote:

I installed Hyper Estraier, but now, because it runs in a chroot, it cannot
find the libraries it depends on. I have had this problem quite a few times
with other programs too, and never had the time to solve it. What should I
do: run ldconfig? Is that the standard method? If so, with what parameters?
Or is it better to set LD_LIBRARY_PATH?

I use something like the following for copying stuff into a chroot. Note:
this works for me, but might not in certain corner cases.

Glue aside, use ldd to figure out which libraries are needed and copy
those.

This was just a quick hack. Being a shell script, it is also quite
inefficient - mostly due to the fact it starts lots of programs. I might
one day create a Perl implementation, which would be much faster.

Any comments are welcome, as always.

One noteworthy thing is that it attempts to synchronize directories:
notably, it will delete anything from the destination directory not
found in the source directory. Another is that it does not clear out
old libraries.

## BEGIN ##
#!/bin/sh

# Syntax:
#   cpchroot file1 [file2 [file3 ...]]
#
# Copies all files, which should be given as a fully qualified path, into the
# corresponding directory relative to the current directory.
#
# As a special case, when the file being copied is a dynamically linked
# executable, also copy any libraries it depends on.
#
# Any directories required are created.
#
# When a directory is given as an argument, cpchroot is applied to all files in
# the directory, and the directory is then searched for any files that are not
# in the original; those files are deleted.

umask 022

LIBS=
ERROR=0
TMP1=`mktemp` || exit 1
TMP2=`mktemp` || exit 1

# Copy $1 to .$1 (relative to the current directory), creating the parent
# directory if needed. Only copy when the target is missing or older.
smartcp() {
	RELATIVE_BASE=`dirname "$1" | sed -e 's/^\///'`
	if [ -n "$RELATIVE_BASE" ] && ! [ -e "$RELATIVE_BASE" ]; then
		install -d "$RELATIVE_BASE" || ERROR=1
	fi
	if [ ! -e ".$1" -o \( -f "$1" -o -h "$1" \) -a ".$1" -ot "$1" ]; then
		echo cp "$1" "`pwd`$1"
		cp "$1" ".$1" || ERROR=1
	fi
}

exit_and_clean() {
	rm -f "$TMP1" "$TMP2"
	exit $1
}

if [ $# -eq 0 ]; then
	echo "$0 cannot be called with zero arguments" >&2
	echo "Syntax: $0 file1 [file2 [file3 ...]]" >&2
	exit_and_clean 127
fi

# Deliberate safety catch: the script refuses to run until you have read
# it and removed these lines yourself.
echo "Don't run just any script off the internet!" >&2
if [ `id -u` -eq 0 ]; then
	echo "AND ESPECIALLY NOT AS ROOT!" >&2
fi
exit_and_clean 127

for i in "$@"; do
	if ! [ -e "$i" ]; then
		echo "File $i not found" >&2
		exit_and_clean 2
	fi
	if ! echo "$i" | grep '^/' >/dev/null; then
		echo "File $i not given as absolute path" >&2
		exit_and_clean 2
	fi
done

# Okay, our input is sane. Now let's get to it.
for i in "$@"; do
	if [ -d "$i" ]; then
		# Recursively descend into the directory
		find "$i" ! -type d -print0 | xargs -0 "$0"
		# Remove any fluff: paths under .$i with no counterpart under $i
		find "$i" | sed -e 's/^/./' | sort >"$TMP1"
		find ".$i" | sort >"$TMP2"
		if ! cmp "$TMP1" "$TMP2" >/dev/null; then
			for j in `diff -u "$TMP1" "$TMP2" | sed -ne '1,2d' -e 's/^+//p'`; do
				echo rm -rf "`pwd`${j#.}"
				rm -rf "$j"
			done
		fi
	else
		if file "$i" 2>/dev/null | \
		    grep 'ELF.*executable.*dynamically linked' >/dev/null; then
			# Skip ldd's first three lines (file name, column header and
			# the executable itself); keep the last field, the path.
			LIBS="$LIBS $(ldd "$i" 2>/dev/null | sed -e '1,3d' -e 's/.* //')"
		fi
		smartcp "$i"
	fi
done

if [ -n "$LIBS" ]; then
	# One path per line, so that sort -u can drop duplicates
	LIBS=`printf '%s\n' $LIBS | sort -u`
	for i in $LIBS; do
		smartcp "$i"
	done
fi

exit_and_clean $ERROR
## EOF ##

As to the legalese: this script is hereby placed into the public domain,
so feel free to do as you please with it. I'd strongly suggest not
altering certain features when publishing it on the internet, though.

Joachim



Re: Site indexing application

2006-03-24 Thread Bryan Irvine
On 3/21/06, Gabriel George POPA [EMAIL PROTECTED] wrote:
Hello misc,

I must install a search facility for my site. Which one do you think is
most appropriate: Harvest, ht://Dig, Nutch? I've used Nutch (from
Apache.org) before on my old Slackware 10.1 machine and I didn't like
it very much (a lot of things had to be done by hand). I'm asking
because I know the chroot(2) facility that Apache has on OpenBSD can
cause a lot of trouble.

ht://Dig works well for a quick'n'dirty solution. For something more
in-depth, have a look at Lucene (I think it has been taken over by
Apache now as well).

--Bryan



Re: Site indexing application

2006-03-24 Thread Karsten McMinn
On 3/24/06, Gabriel George POPA [EMAIL PROTECTED] wrote:

 Frank Denis wrote:
I installed Hyper Estraier, but now, because it runs in a chroot, it cannot
find the libraries it depends on. I have had this problem quite a few times
with other programs too, and never had the time to solve it. What should I
do: run ldconfig? Is that the standard method? If so, with what parameters?
Or is it better to set LD_LIBRARY_PATH?


Using ldd to find out which libraries a program needs will normally take
care of most of those types of issues.
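ldd only reports what a program needs. To see which of those libraries are still absent from the chroot, its output can be checked against the chroot tree. The following is a minimal sketch: the function name missing_in_chroot is hypothetical, and it expects one library path per line on standard input.

```sh
#!/bin/sh
# Sketch (hypothetical helper): print each library path read from
# standard input (one per line) that does not exist under the chroot
# directory given as $1.

missing_in_chroot() {
	root=$1
	while IFS= read -r lib; do
		# Ignore empty lines.
		[ -n "$lib" ] || continue
		# Inside the chroot, "$root$lib" is seen as "$lib".
		if [ ! -e "$root$lib" ]; then
			printf '%s\n' "$lib"
		fi
	done
}

# Example use, assuming OpenBSD's ldd output layout (parsed the same way
# as in Joachim's script) and CGIs living in /var/www/cgi-bin:
#   ldd /var/www/cgi-bin/estseek.cgi | sed -e '1,3d' -e 's/.* //' |
#       missing_in_chroot /var/www
```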

swish-e isn't bad for search/indexing.



Re: Site indexing application

2006-03-22 Thread Frank Denis

On Tue, Mar 21, 2006 at 02:18:10PM +0200, Gabriel George POPA wrote:

Frank Denis wrote:


Yes, very interesting. But I was looking for a very secure, highly 
proven solution, prepackaged for OpenBSD with Apache chrooted.


Well, Hyper Estraier is far from being a beta project. It's an evolution
of Estraier, itself based upon Snatcher, whose work began 6 years ago. The
code is very clean, it works and it's fast.

The code of Mnogosearch (and DPsearch, since it's based upon it) is messy and
designed in a totally insecure and unreliable way. Last year I had a hard
time adding various hacks to make it work with our blog web site
(skyblog.com). There were many ways to make it die with segmentation
faults. And the indexer wasn't always able to resume its activity after a
crash. Plus, Mnogosearch doesn't scale as well as advertised.
OTOH, Hyper Estraier scales really well.
 
 It just needs an OpenBSD port.


--
Frank Denis - frank [at] nailbox.fr
Young Nails / Akzentz nail tech



Site indexing application

2006-03-21 Thread Gabriel George POPA

Hello misc,

I must install a search facility for my site. Which one do you think is
most appropriate: Harvest, ht://Dig, Nutch? I've used Nutch (from
Apache.org) before on my old Slackware 10.1 machine and I didn't like
it very much (a lot of things had to be done by hand). I'm asking
because I know the chroot(2) facility that Apache has on OpenBSD can
cause a lot of trouble.




George Popa




Re: Site indexing application

2006-03-21 Thread Frank Denis

On Tue, Mar 21, 2006 at 02:03:27PM +0200, Gabriel George POPA wrote:

I must install a search facility for my site.


Have a look at Hyper Estraier: http://hyperestraier.sourceforge.net/

It works amazingly well.


--
Frank Denis - frank [at] nailbox.fr
Young Nails / Akzentz nail tech
http://www.manucure.info



Re: Site indexing application

2006-03-21 Thread March, Harold W.
mnoGoSearch: http://www.mnogosearch.org/

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
Gabriel George POPA
Sent: Tuesday, March 21, 2006 7:03 AM
To: misc@openbsd.org
Subject: Site indexing application


Hello misc,

I must install a search facility for my site. Which one do you think is
most appropriate: Harvest, ht://Dig, Nutch? I've used Nutch (from
Apache.org) before on my old Slackware 10.1 machine and I didn't like
it very much (a lot of things had to be done by hand). I'm asking
because I know the chroot(2) facility that Apache has on OpenBSD can
cause a lot of trouble.



 
George Popa



Re: Site indexing application

2006-03-21 Thread Jeff Ross

On Tue, 21 Mar 2006, Gabriel George POPA wrote:


Hello misc,

I must install a search facility for my site. Which one do you think is
most appropriate: Harvest, ht://Dig, Nutch? I've used Nutch (from
Apache.org) before on my old Slackware 10.1 machine and I didn't like
it very much (a lot of things had to be done by hand). I'm asking
because I know the chroot(2) facility that Apache has on OpenBSD can
cause a lot of trouble.

George Popa





I installed dpsearch from http://www.dataparksearch.org.  You can see it 
in action on http://www.wykids.org.


It isn't any trouble at all to get it working in the chroot. My configure
invocation was:


./configure \
  --prefix=/dpsearch \
  --with-pgsql \
  --with-openssl \
  --with-zlib \
  --without-docs \
  --without-aspell \
  --enable-all-static


This will install everything into /dpsearch; you can then create
/var/www/dpsearch and copy everything across. The documentation isn't up
to OpenBSD standards, but that's a pretty high bar ;-) Still, I was able
to get it running with minimum fuss.


I've been contemplating making a port, but haven't yet looked into what 
all is involved.


Hope that helps!

Jeff