OK. That looks promising. May want to leave out the zims. iiab can take several days to index a single zim.


-----Original Message----- From: Anish Mangal
Sent: Tuesday, February 17, 2015 2:04 PM
To: xsce-de...@googlegroups.com ; server-devel
Subject: Re: [XSCE] Re: Search engines and local git repos

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

So in the settings sections I see a number of options

Web (which I guess would be html)
Database (SQL, Mongo)
Files
REST - not quite sure what this would do

Also there is this
https://github.com/opensearchserver/oss-text-extractor
(didn't check if it came bundled with the rpm - there's a separate rpm
from a different source as well). From the documentation

- ----

An open source RESTFul Web Service for text extraction and analysis.
oss-text-extractor supports various binary formats.

   Word processor (doc, docx, odt, rtf)
   Spreadsheet (xls, xlsx, ods)
   Presentation (ppt, pptx, odp)
   Publishing (pdf, pub)
   Web (rss, html/xhtml)
   Medias (audio, images)
   Others (vsd, text, markdown)

- ----

Seems quite useful.

Also, I think the usefulness of a search engine would go up when there
is more student/teacher generated content.

Cheers,
Anish



On Wednesday 18 February 2015 12:27 AM, Tim Moody wrote:
The problem with indexing is that it's a lot easier with text
files (like html) than binary files like pdf, doc, zim, etc.  iiab
and kiwix can both index zims, which is how we search wikis, but a
lot of our content is in binary files.  A quick look at
opensearchserver makes me think they mainly do html.


-----Original Message----- From: Anish Mangal
Sent: Tuesday, February 17, 2015 1:37 PM
To: server-devel ; xsce-devel
Subject: [XSCE] Re: Search engines and local git repos

FWIW, I haven't checked for the existence of ARM packages. For
OpenSearchServer, the major dependency seemed to be java, so it
might not be quite so difficult there. (I hope)

Not sure of gitlab.

On Tuesday 17 February 2015 11:41 PM, Anish Mangal wrote:
Hi,

So I've been playing around with various things which might be
added to the XSCE. The two I came across that seem quite
straightforward to set up (basically installing an rpm package
plus 1-2 small config steps) are:

1. OpenSearchServer - An offline search engine for content on
the XSCE

http://www.opensearchserver.com/

Use case: There may be tons of content stored under many
different web services on the xsce. Instead of going through each
service and manually searching/browsing, one may simply want to
.. 'google' :)

Caveat(s):

* Crawling seems to be a CPU-intensive process, but with proper
scheduling it could be handled
* Haven't tested with IIAB files yet. If anyone has an IIAB dataset
online, please let me know

My experiments:

* I basically installed the rpm on an F21 VM and it works out of the
box! It has a detailed admin interface which controls how webpages,
databases, mailboxes etc. are indexed and searched, and presents
the user with a simple search box

Next step:

* Playbook



2. gitlab - github for the xsce

https://about.gitlab.com/

Does what it says on the cover .. limited use only if some kids
want to develop code

Possible integration with other projects like gitenberg (Seth
Woodworth) in the future... needs lots of exploration

My experiments:

* Install and test the provided rpm packages. Instructions
worked out of the box

Next step:

* Playbook


Thoughts, Anish

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJU45DRAAoJEBoxUdDHDZVpD7oIAIBXi+oe9IGUOnKvoIE4hITe
k2nlgnVWoR3KlprH2KtFNhV6O/7+k8lvNkZJ4a/FwrGQcXmY060vqj2JFldpUHVw
wqsJS63PqL1rLxz+uQXT5juXyS6IZ+gwBXLPwV0+65M7cIucQBHRu2u+sLU2R+Pt
KM3CyUnaArUDxMUkJao9PchC7LtSmhcjaO0cljUIq/x3wKeMenmLOtZj/eYsn/7a
TX/PuQb2M9J8swydYu6ex3U9Nb6koJNSXInIxIOOmzCQvfaBSRxIGoU0ZhlaUIu0
pBSVNoQMuj15u4DAS1s/+HtVnNdoA0+nnNVrmqNOdFSECklZlgppaIK1G4DcD38=
=YOFB
-----END PGP SIGNATURE-----
_______________________________________________
Server-devel mailing list
Server-devel@lists.laptop.org
http://lists.laptop.org/listinfo/server-devel
