Thanks for all the responses. Govind: to answer your question, yes, all I want to search is plain text files. They are located in NFS directories across multiple Solaris/Linux storage boxes. The total storage is in the hundreds of terabytes.

I have just started with Solr, and my understanding is that I will need Tika to help stream/upload the files to Solr. Being a system admin, I don't know anything about Java programming. So far, I have read that the autodetect parser in Tika will detect the file type and that I can use the resulting stream to populate Solr. How to do that is still a mystery to me; I am working on it.

Any tips appreciated; thanks in advance.

Sesh

On 13 November 2010 15:24, Govind Kanshi <govind.kan...@gmail.com> wrote:

> Another pov you might want to think about - what kind of search you want.
> Just plain full text search, or is there something more to those text
> files? Are they grouped in folders? Do the folders imply a certain kind of
> grouping/hierarchy/tagging?
>
> I recently was trying to help somebody who had files across a lot of places
> grouped by date/subject/author - he wanted to ensure these are "fields"
> which can also act as filters/navigators.
>
> Just an input - ignore it if you just want plain full text search.
>
> On Sat, Nov 13, 2010 at 11:25 AM, Lance Norskog <goks...@gmail.com> wrote:
>
> > About web servers: Solr is a servlet war file and needs a Java web server
> > "container" to run. The example/ folder in the Solr distribution uses
> > 'Jetty', and this is fine for small production-quality projects. You can
> > just copy the example/ directory somewhere to set up your own running Solr;
> > that's what I always do.
> >
> > About indexing programs: if you know Unix scripting, it may be easiest to
> > walk the file system yourself with the 'find' program and create Solr input
> > XML files.
> >
> > But yes, you definitely want the Solr 1.4 Enterprise manual. I spent months
> > learning this stuff very slowly, and the book would have been great back
> > then.
> >
> > Lance
> >
> > Erick Erickson wrote:
> >
> >> Think of the data import handler (DIH) as Solr pulling data to index
> >> from some source based on configuration. So, once you set up
> >> your DIH config to point to your file system, you issue a command
> >> to Solr like "OK, do your data import thing". See the
> >> FileListEntityProcessor.
> >> http://wiki.apache.org/solr/DataImportHandler
> >>
> >> SolrJ is a client library you'd use to push data to Solr. Basically, you
> >> write a Java program that uses SolrJ to walk the file system, find
> >> documents, create a Solr document and send that to Solr. It's not
> >> nearly as complex as it sounds<G>. See:
> >> http://wiki.apache.org/solr/Solrj
> >>
> >> It's probably worth your while to get a copy of
> >> "Solr 1.4 Enterprise Search Server" by Eric Pugh and David Smiley.
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Nov 12, 2010 at 8:37 AM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:
> >>
> >>> Hi Lance,
> >>>
> >>> Thank you very much for responding (not sure how I reply to the group,
> >>> so, writing to you).
> >>>
> >>> Can you please expand on your suggestion? I am not a web guy and so,
> >>> don't know where to start.
> >>>
> >>> What is the difference between SolrJ and DataImportHandler? Do I need to
> >>> set up web servers on all my storage boxes?
> >>>
> >>> Apologies for the basic level of questions, but hope I can get started
> >>> and implement this before the year end (you know why :o)
> >>>
> >>> Thanks,
> >>>
> >>> Sesh
> >>>
> >>> On 12 November 2010 13:31, Lance Norskog <goks...@gmail.com> wrote:
> >>>
> >>>> Using 'curl' is fine. There is a library called SolrJ for Java and
> >>>> other libraries for other scripting languages that let you upload with
> >>>> more control. There is a thing in Solr called the DataImportHandler
> >>>> that lets you script walking a file system.
> >>>>
> >>>> On Thu, Nov 11, 2010 at 8:38 PM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Pardon me if this sounds very elementary, but I have a very basic question
> >>>>> regarding Solr search. I have about 10 storage devices running Solaris with
> >>>>> hundreds of thousands of text files (there are other files, as well, but my
> >>>>> target is these text files). The directories on the Solaris boxes are
> >>>>> exported and are available as NFS mounts.
> >>>>>
> >>>>> I have installed Solr 1.4 on a Linux box and have tested the installation,
> >>>>> using curl to post documents. However, the manual says that curl is not the
> >>>>> recommended way of posting documents to Solr. Could someone please tell me
> >>>>> what is the preferred approach in such an environment? I am not a programmer
> >>>>> and would appreciate some hand-holding here :o)
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>>> Sesh
> >>>>
> >>>> --
> >>>> Lance Norskog
> >>>> goks...@gmail.com
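
For reference, Lance's suggestion in the thread (walk the file system yourself and create Solr input XML files) could look roughly like the sketch below. It is written in Python rather than with 'find', and it makes several assumptions that are not from the thread: the function name build_add_xml is made up for illustration, only files ending in .txt are picked up, files are read as UTF-8, and the Solr schema is assumed to have fields named "id" and "text". Adjust all of these to match your own schema.xml and directory layout.

```python
import os
import re
import xml.etree.ElementTree as ET

# Control characters that are not allowed in XML 1.0; they are replaced with spaces.
_INVALID_XML = re.compile("[\x00-\x08\x0b\x0c\x0e-\x1f]")


def build_add_xml(root_dir, suffix=".txt"):
    """Return a Solr <add> message (as a string) for files under root_dir.

    Hypothetical helper: one <doc> per file whose name ends with `suffix`.
    """
    add = ET.Element("add")
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced rather than aborting the whole run.
            with open(path, encoding="utf-8", errors="replace") as handle:
                body = _INVALID_XML.sub(" ", handle.read())
            doc = ET.SubElement(add, "doc")
            # "id" and "text" are assumed field names, not taken from the thread.
            ET.SubElement(doc, "field", name="id").text = path
            ET.SubElement(doc, "field", name="text").text = body
    # ElementTree escapes <, > and & in the file contents.
    return ET.tostring(add, encoding="unicode")
```

The returned string can be written to a file and posted with curl in the same way as the test documents, followed by a commit. With hundreds of thousands of files, it would probably make sense to call this once per subdirectory so that each posted batch stays small, instead of building a single message for a whole NFS mount.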