Thanks for all the responses.

Govind: To answer your question, yes, all I want to search is plain text
files. They are located in NFS directories across multiple Solaris/Linux
storage boxes. The total storage is in the hundreds of terabytes.

I have only just started with Solr, and my understanding is that I will
need Tika to help stream/upload files to Solr. I don't know anything
about Java programming, being a system admin. So far, I have read that the
autodetect parser in Tika will detect the file type and that I can use
the resulting stream to populate Solr. How to do that is still a mystery to
me; I am working on it. Any tips appreciated; thanks in advance.

Sesh
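
For reference, a minimal sketch of the approach Lance suggests in the quoted
message below (walk the file system and create Solr input XML), written in
Python rather than with 'find'. It is only an illustration under several
assumptions: Python 3 is available on the indexing box, the schema has fields
named `id` and `text` (adjust to whatever your schema.xml defines), only
files ending in `.txt` are wanted, and Solr listens at
http://localhost:8983/solr/update as in the example Jetty setup. The function
names are made up for this sketch.

```python
import os
import urllib.request
import xml.etree.ElementTree as ET


def _xml_safe(text):
    """Drop ASCII control characters that XML 1.0 does not allow."""
    return "".join(ch for ch in text if ch in "\t\n\r" or ord(ch) >= 0x20)


def iter_add_batches(root_dir, batch_size=100, suffix=".txt"):
    """Yield Solr <add> XML strings, each holding up to batch_size documents."""
    add = ET.Element("add")
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced rather than aborting the walk.
            with open(path, encoding="utf-8", errors="replace") as handle:
                body = handle.read()
            doc = ET.SubElement(add, "doc")
            # Assumed field names: "id" (the file path) and "text" (the content).
            ET.SubElement(doc, "field", name="id").text = _xml_safe(path)
            ET.SubElement(doc, "field", name="text").text = _xml_safe(body)
            if len(add) == batch_size:
                yield ET.tostring(add, encoding="unicode")
                add = ET.Element("add")
    if len(add):
        yield ET.tostring(add, encoding="unicode")


def post_batch(xml_text, update_url="http://localhost:8983/solr/update"):
    """POST one XML message to Solr; update_url is an assumed default."""
    request = urllib.request.Request(
        update_url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Usage would look roughly like looping over
`iter_add_batches("/mnt/nfs/storage01")` (a hypothetical mount point), calling
`post_batch` on each string, and finally posting `<commit/>` to the same URL.
Batching keeps each request small; with this much data, a single XML message
for everything would not be practical.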



On 13 November 2010 15:24, Govind Kanshi <govind.kan...@gmail.com> wrote:

> Another point of view you might want to think about: what kind of search do
> you want? Just plain full-text search, or is there something more to those
> text files? Are they grouped in folders? Do the folders imply a certain kind
> of grouping/hierarchy/tagging?
>
> I was recently trying to help somebody who had files across a lot of places,
> grouped by date/subject/author. He wanted to ensure these are "fields"
> which can also act as filters/navigators.
>
> Just an input - ignore it if you just want plain full-text search.
>
> On Sat, Nov 13, 2010 at 11:25 AM, Lance Norskog <goks...@gmail.com> wrote:
>
> > About web servers: Solr is a servlet war file and needs a Java web server
> > "container" to run. The example/ folder in the Solr distribution uses
> > 'Jetty', and this is fine for small production-quality projects. You can
> > just copy the example/ directory somewhere to set up your own running Solr;
> > that's what I always do.
> >
> > About indexing programs: if you know Unix scripting, it may be easiest to
> > walk the file system yourself with the 'find' program and create Solr input
> > XML files.
> >
> > But yes, you definitely want the Solr 1.4 Enterprise manual. I spent months
> > learning this stuff very slowly, and the book would have been great back
> > then.
> >
> > Lance
> >
> >
> > Erick Erickson wrote:
> >
> >> Think of the data import handler (DIH) as Solr pulling data to index
> >> from some source based on configuration. So, once you set up
> >> your DIH config to point to your file system, you issue a command
> >> to solr like "OK, do your data import thing". See the
> >> FileListEntityProcessor.
> >> http://wiki.apache.org/solr/DataImportHandler
> >>
> >> SolrJ is a client library
> >> you'd use to push data to Solr. Basically, you
> >> write a Java program that uses SolrJ to walk the file system, find
> >> documents, create a Solr document and send that to Solr. It's not
> >> nearly as complex as it sounds<G>. See:
> >> http://wiki.apache.org/solr/Solrj
> >>
> >> It's probably worth your while to get a
> >> copy of "Solr 1.4 Enterprise Search Server"
> >> by Eric Pugh and David Smiley.
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Nov 12, 2010 at 8:37 AM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:
> >>
> >>> Hi Lance,
> >>>
> >>> Thank you very much for responding (not sure how I reply to the group,
> >>> so, writing to you).
> >>>
> >>> Can you please expand on your suggestion? I am not a web guy and so
> >>> don't know where to start.
> >>>
> >>> What is the difference between SolrJ and DataImportHandler? Do I need to
> >>> set up web servers on all my storage boxes?
> >>>
> >>> Apologies for the basic level of questions, but hope I can get started
> >>> and implement this before the year end (you know why :o)
> >>>
> >>> Thanks,
> >>>
> >>> Sesh
> >>>
> >>> On 12 November 2010 13:31, Lance Norskog <goks...@gmail.com> wrote:
> >>>
> >>>> Using 'curl' is fine. There is a library called SolrJ for Java and
> >>>> other libraries for other scripting languages that let you upload with
> >>>> more control. There is a thing in Solr called the DataImportHandler
> >>>> that lets you script walking a file system.
> >>>>
> >>>> On Thu, Nov 11, 2010 at 8:38 PM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Pardon me if this sounds very elementary, but I have a very basic question
> >>>>> regarding Solr search. I have about 10 storage devices running Solaris with
> >>>>> hundreds of thousands of text files (there are other files, as well, but my
> >>>>> target is these text files). The directories on the Solaris boxes are
> >>>>> exported and are available as NFS mounts.
> >>>>>
> >>>>> I have installed Solr 1.4 on a Linux box and have tested the installation,
> >>>>> using curl to post documents. However, the manual says that curl is not the
> >>>>> recommended way of posting documents to Solr. Could someone please tell me
> >>>>> what is the preferred approach in such an environment? I am not a programmer
> >>>>> and would appreciate some hand-holding here :o)
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Sesh
> >>>>>
> >>>>
> >>>> --
> >>>> Lance Norskog
> >>>> goks...@gmail.com
