Re: Site indexing application
Frank Denis wrote:
> On Tue, Mar 21, 2006 at 02:18:10PM +0200, Gabriel George POPA wrote:
>> Yes, very interesting. But I was looking for a very secure, highly
>> proven solution, prepackaged for OpenBSD with Apache chrooted.
>
> Well, Hyper Estraier is far from being a beta project. It's an
> evolution of Estraier, itself based upon Snatcher, whose development
> began 6 years ago. The code is very clean, it works and it's fast.
>
> The code of Mnogosearch (and DPsearch, since it's based upon it) is
> messy and designed in a totally insecure and unreliable way. I had a
> hard time with it last year, adding various hacks to make it work with
> our blog web site (skyblog.com). There were many ways to make it die
> with segmentation faults, and the indexer wasn't always able to resume
> its activity after a crash. Plus, Mnogosearch doesn't scale as well as
> advertised.
>
> OTOH, Hyper Estraier scales really well. It just needs an OpenBSD port.

I installed Hyper Estraier, but because it runs inside the chroot, it
cannot find the libraries it depends on. I have had this problem quite a
few times with different programs and never had the time to solve it.

What should I do: run ldconfig? Is that the standard method, and with
what parameters? Or is it better to set LD_LIBRARY_PATH?

George
Re: Site indexing application
Gabriel George POPA wrote:
> I installed Hyper Estraier, but because it runs inside the chroot, it
> cannot find the libraries it depends on. I have had this problem quite
> a few times with different programs and never had the time to solve it.
>
> What should I do: run ldconfig? Is that the standard method, and with
> what parameters? Or is it better to set LD_LIBRARY_PATH?

Oh, well, I discovered how to solve this problem:

    # ldconfig -r
      (we notice that the libraries used by Hyper Estraier are not listed)
    # ldconfig /usr/lib /usr/local/lib /usr/X11R6/lib
    # mkdir -p /var/www/var/run
    # chmod -R 0755 /var/www/var
    # cp -Rp /var/run/ld.so.hints /var/www/var/run

That's all. Then in a browser:

    http://site-name/cgi-bin/estseek.cgi

Finally, I learned how to deal with this chrooted Apache.

Yours in BSDness,
George
Re: Copying stuff into chroot (was: Site indexing application)
On Fri, Mar 24, 2006 at 11:06:04AM +0200, Gabriel George POPA wrote:
> I installed Hyper Estraier but now, because it is in chroot, it cannot
> find the libraries it depends on. I had this problem quite a few times
> with different programs. I did not have the time to solve it (with
> other programs too). What do I do: ldconfig? This is the standard
> method? ldconfig with what params? Or maybe it's better to set the
> LD_LIBRARY_PATH?

I use something like the following for copying stuff into a chroot.
Note: this works for me, but might not in certain corner cases. Glue
aside, the idea is to use ldd to figure out which libraries are needed
and copy those.

This was just a quick hack. Being a shell script, it is also quite
inefficient, mostly because it starts lots of programs. I might one day
create a Perl implementation, which would be much faster. Any comments
are welcome, as always.

One noteworthy thing is that it attempts to synchronize directories:
notably, it will delete anything from the destination directory not
found in the source directory. One other noteworthy thing is that it
does not clear old libraries.

## BEGIN ##
#!/bin/sh
# Syntax:
#     cpchroot file1 [file2 [file3 ...]]
#
# Copies all files, which should be given as a fully qualified path, into the
# corresponding directory relative to the current directory.
#
# As a special case, when the file being copied is a dynamically linked
# executable, also copy any libraries it depends on.
#
# Any directories required are created.
#
# When a directory is given as an argument, cpchroot is applied to all files in
# the directory, and the directory is then searched for any files that are not
# in the original

umask 022

LIBS=
ERROR=0
TMP1=`mktemp` || exit 1
TMP2=`mktemp` || exit 1

# Copy $1 (an absolute path) to .$1, creating the directory if needed.
# The copy is skipped when the destination exists and is not older.
smartcp() {
    RELATIVE_BASE=`dirname $1 | sed -e 's/^\///'`
    if ! [ -e $RELATIVE_BASE ]; then
        install -d $RELATIVE_BASE || ERROR=1;
    fi
    if [ ! -e .$1 -o \( -f $1 -o -h $1 \) -a .$1 -ot $1 ]; then
        echo "cp $1 `pwd`$1";
        cp $1 .$1 || ERROR=1;
    fi
}

# Remove the temporary files and exit with status $1.
exit_and_clean() {
    rm $TMP1 $TMP2
    exit $1
}

if [ $# -eq 0 ]; then
    echo "$0 cannot be called with zero arguments" >&2;
    echo "Syntax: $0 file1 [file2 [file3 ...]]" >&2;
    exit_and_clean 127
fi

# Safety guard (see the note after the script): nothing below this
# point runs until these lines have been read and removed.
echo "Don't run just any script off the internet!" >&2;
if [ `id -u` -eq 0 ]; then
    echo "AND ESPECIALLY NOT AS ROOT!" >&2;
fi
exit 127

for i in "$@"; do
    if ! [ -e $i ]; then
        echo "File $i not found" >&2;
        exit_and_clean 2;
    fi
    if ! echo $i | grep '^\/' > /dev/null; then
        echo "File $i not given as absolute path" >&2;
        exit_and_clean 2;
    fi
done

# Okay, our input is sane. Now let's get to it.

for i in "$@"; do
    if [ -d $i ]; then
        # Recursively descend into the directory
        find $i ! -type d -print0 | xargs -0 $0;
        # Remove any fluff
        find $i | sed -e 's/^/./' | sort > $TMP1
        find .$i | sort > $TMP2
        if ! cmp $TMP1 $TMP2 > /dev/null; then
            for i in `diff -u $TMP1 $TMP2 | sed -ne '1,2d' -e 's/^+//p'`; do
                # Print the absolute path being removed. '|' is the sed
                # delimiter because the output of pwd contains slashes.
                echo "rm -rf $i" | sed -e "s|\./|`pwd`/|";
                rm -rf $i;
            done
        fi
    else
        if file $i 2>/dev/null | \
            grep 'ELF.*executable.*dynamically linked' > /dev/null; then
            # Skip the three header lines of ldd output and keep the
            # last field of each remaining line (the library path).
            LIBS="$LIBS `ldd \"$i\" | sed -e '1,3d' -e 's/.* //' 2>/dev/null`";
        fi
        smartcp $i
    fi
done

if [ "x$LIBS" != "x" ]; then
    # One library per line, so that sort | uniq can drop duplicates.
    LIBS=`echo $LIBS | tr ' ' '\n' | sort | uniq`
    for i in $LIBS; do
        smartcp $i
    done
fi

exit_and_clean $ERROR
## EOF ##

As to the legalese: this script is hereby placed into the public domain,
so feel free to do as you please with it. I'd strongly suggest not
altering certain features when publishing it on the internet, though.

Joachim
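For illustration, the ldd step in the script can be tried in isolation. The sketch below is not part of cpchroot: libs_from_ldd and sample_ldd_output are hypothetical helper names, and the sample text only approximates what ldd prints on OpenBSD (the column layout and the library version numbers are made up). It shows why the script deletes the first three lines and keeps the last field of each remaining line.

```sh
#!/bin/sh
# Sketch only: extract library paths from OpenBSD-style ldd output on stdin.
# libs_from_ldd is a hypothetical helper name, not part of cpchroot.
libs_from_ldd() {
    # Lines 1-3 are the file name, the column header and the entry for
    # the executable itself; on the rest, keep only the last field.
    sed -e '1,3d' -e 's/.* //'
}

# Illustrative input resembling ldd output (assumed layout, made-up values).
sample_ldd_output() {
    cat <<'EOF'
/var/www/cgi-bin/estseek.cgi:
    Start    End      Type Open Ref GrpRef Name
    00000000 00000000 exe  1    0   0      /var/www/cgi-bin/estseek.cgi
    0c0d0000 2c0e0000 rlib 0    1   0      /usr/local/lib/libestraier.so.8.38
    0e4a0000 2e4b0000 rlib 0    1   0      /usr/lib/libc.so.38.2
EOF
}

# Print one library path per line.
sample_ldd_output | libs_from_ldd
```

On a real system the input would come from something like `ldd /path/to/binary | libs_from_ldd`. If ldd prints a different number of header lines on a given release, the `1,3d` range would need adjusting.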
Re: Site indexing application
On 3/21/06, Gabriel George POPA wrote:
> Hello misc,
>
> I must install a search facility for my site. Do you know which is the
> most appropriate (Harvest, ht://Dig, Nutch)? I've used Nutch (from
> Apache.org) before on my old Slackware 10.1 machine and I didn't like
> it very much (a lot of things have to be done by hand).
>
> I'm asking because I know that the chroot(2) facility Apache has on
> OpenBSD can cause a lot of trouble.

ht://Dig works well for a quick'n'dirty solution. For something more
in-depth, have a look at Lucene (I think it's been taken over by Apache
now as well).

--Bryan
Re: Site indexing application
On 3/24/06, Gabriel George POPA wrote:
> I installed Hyper Estraier but now, because it is in chroot, it cannot
> find the libraries it depends on. I had this problem quite a few times
> with different programs. I did not have the time to solve it (with
> other programs too). What do I do: ldconfig? This is the standard
> method? ldconfig with what params? Or maybe it's better to set the
> LD_LIBRARY_PATH?

ldd normally will take care of most of those types of issues.

swish-e isn't bad for search/indexing.
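As a rough sketch of how an ldd listing can be put to use here: once the library paths are known, each one can be checked for a copy under the chroot directory. missing_in_chroot is a hypothetical helper name, not an OpenBSD tool, and the function only tests for existence. It does not copy anything or compare versions.

```sh
#!/bin/sh
# Sketch only: report which libraries have no copy below a chroot directory.
# missing_in_chroot is a hypothetical helper, not an OpenBSD tool.
# Usage: missing_in_chroot ROOT /abs/path/to/lib ...
missing_in_chroot() {
    root=$1
    shift
    for lib in "$@"; do
        # A library at /usr/lib/libfoo.so must exist as ROOT/usr/lib/libfoo.so.
        [ -e "$root$lib" ] || echo "$lib"
    done
}
```

With Apache chrooted to /var/www, a call might look like `missing_in_chroot /var/www /usr/lib/libc.so.38.2`, where the library path (an example value) would normally be taken from the ldd output for the CGI binary. Every path printed still has to be copied into the chroot.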
Re: Site indexing application
On Tue, Mar 21, 2006 at 02:18:10PM +0200, Gabriel George POPA wrote:
> Yes, very interesting. But I was looking for a very secure, highly
> proven solution, prepackaged for OpenBSD with Apache chrooted.

Well, Hyper Estraier is far from being a beta project. It's an evolution
of Estraier, itself based upon Snatcher, whose development began 6 years
ago. The code is very clean, it works and it's fast.

The code of Mnogosearch (and DPsearch, since it's based upon it) is
messy and designed in a totally insecure and unreliable way. I had a
hard time with it last year, adding various hacks to make it work with
our blog web site (skyblog.com). There were many ways to make it die
with segmentation faults, and the indexer wasn't always able to resume
its activity after a crash. Plus, Mnogosearch doesn't scale as well as
advertised.

OTOH, Hyper Estraier scales really well. It just needs an OpenBSD port.

--
Frank Denis - frank [at] nailbox.fr
Young Nails / Akzentz nail tech
Site indexing application
Hello misc,

I must install a search facility for my site. Do you know which is the
most appropriate (Harvest, ht://Dig, Nutch)? I've used Nutch (from
Apache.org) before on my old Slackware 10.1 machine and I didn't like it
very much (a lot of things have to be done by hand).

I'm asking because I know that the chroot(2) facility Apache has on
OpenBSD can cause a lot of trouble.

George Popa
Re: Site indexing application
On Tue, Mar 21, 2006 at 02:03:27PM +0200, Gabriel George POPA wrote:
> I must install a search facility for my site.

Have a look at Hyper Estraier:

    http://hyperestraier.sourceforge.net/

It works amazingly well.

--
Frank Denis - frank [at] nailbox.fr
Young Nails / Akzentz nail tech
http://www.manucure.info
Re: Site indexing application
mnoGoSearch: http://www.mnogosearch.org/

-----Original Message-----
From: Gabriel George POPA
Sent: Tuesday, March 21, 2006 7:03 AM
To: misc@openbsd.org
Subject: Site indexing application

Hello misc,

I must install a search facility for my site. Do you know what is the
most appropriate (Harvest, ht://Dig, Nutch?). I've used Nutch (from
Apache.org) before on my old Slackware 10.1 machine and I didn't like it
very much (a lot of things to be done by hand).

I'm asking that because I know the chroot(2) facility that Apache has on
OpenBSD can cause a lot of trouble.

George Popa
Re: Site indexing application
On Tue, 21 Mar 2006, Gabriel George POPA wrote:
> Hello misc,
>
> I must install a search facility for my site. Do you know what is the
> most appropriate (Harvest, ht://Dig, Nutch?). I've used Nutch (from
> Apache.org) before on my old Slackware 10.1 machine and I didn't like
> it very much (a lot of things to be done by hand).
>
> I'm asking that because I know the chroot(2) facility that Apache has
> on OpenBSD can cause a lot of trouble.
>
> George Popa

I installed dpsearch from http://www.dataparksearch.org. You can see it
in action on http://www.wykids.org. It isn't any trouble at all to get
working in the chroot. My configure line was:

    ./configure \
        --prefix=/dpsearch \
        --with-pgsql \
        --with-openssl \
        --with-zlib \
        --without-docs \
        --without-aspell \
        --enable-all-static

This will install everything into /dpsearch; you can then make a
/var/www/dpsearch and copy everything across.

Documentation isn't up to OpenBSD standards, but that's a pretty high
bar ;-) Still, I was able to get it running with minimum fuss. I've been
contemplating making a port, but haven't yet looked into what all is
involved.

Hope that helps!

Jeff
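The "make a /var/www/dpsearch and copy everything across" step might be sketched as below. stage_into_chroot is a hypothetical helper name, and the sketch assumes the copy is run by a user allowed to write below the chroot directory. It recreates the prefix under the chroot so the same absolute paths resolve inside it.

```sh
#!/bin/sh
# Sketch only: copy an installed tree to the same path below a chroot.
# stage_into_chroot is a hypothetical helper name.
# Usage: stage_into_chroot PREFIX CHROOT_DIR
stage_into_chroot() {
    prefix=$1
    chroot_dir=$2
    mkdir -p "$chroot_dir$prefix" || return 1
    # "PREFIX/." copies the directory contents, dotfiles included;
    # -p preserves modes and timestamps.
    cp -Rp "$prefix/." "$chroot_dir$prefix/"
}
```

For the setup described above, the call would be `stage_into_chroot /dpsearch /var/www`. Because the build was configured with --enable-all-static, no shared libraries should need to be copied alongside it.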