Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge

2010-03-31 Thread Silent Surfer
Hi Mitch,

The configuration that you have seems to be perfectly fine.
Could you please let us know what error you are seeing in the logs?

Also, could you please confirm whether you have the 
mysql-connector-java-5.1.12-bin.jar under the lib folder? 

Following is the configuration that I use, and it works perfectly fine:
<dataSource driver="com.mysql.jdbc.Driver" autoCommit="true" 
url="jdbc:mysql://localhost:3306/mysql" user="username" password="password" />
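Independent of Solr, a quick way to verify that the URL and credentials work is a 
plain JDBC check; a minimal sketch (the URL, user, and password are placeholders 
for your own):

import java.sql.Connection;
import java.sql.DriverManager;

public class JdbcCheck {
    public static void main(String[] args) throws Exception {
        // The driver jar must be on the classpath (the same jar as in Solr's lib folder)
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mysql", "username", "password");
        System.out.println("Connected: " + !conn.isClosed());
        conn.close();
    }
}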


Thanks,
sS


- Original Message 
From: MitchK mitc...@web.de
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 12:57:04 AM
Subject: Re: DIH - Unable to connect to a DB using JDBC:ODBC bridge


Hi,

sorry, I don't have much experience doing this with Solr, but my
data-config.xml looks like this:

<dataConfig>
<dataSource driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/db" user="user" password="..."
batchSize="-1"/>
<document>

</document>
</dataConfig>

The db at the end of the URL is the name of the database you want to use. 

Perhaps this helps a little bit.

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/DIH-Unable-to-connect-to-a-DB-using-JDBC-ODBC-bridge-tp686781p687887.html
Sent from the Solr - User mailing list archive at Nabble.com.



  



Re: Query time only Ranges

2010-03-31 Thread Silent Surfer
Hi Ankit,

Try the following approach.
create a query like (Solr dates use the ISO 8601 format):
[1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Solr will automatically take care of rounding down to the HOUR specified.

For example:
the query [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR] 
would be equivalent to 
[1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]
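A quick SolrJ sketch of the same range query (the field name 'timestamp' is just 
an example, and exception handling is omitted):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
// /HOUR in Solr date math truncates each endpoint down to the hour
SolrQuery query = new SolrQuery(
    "timestamp:[1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]");
QueryResponse response = solr.query(query);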

Regards,
sS


- Original Message 
From: abhatna...@vantage.com abhatna...@vantage.com
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 9:56:38 AM
Subject: Query time only Ranges


Hi All,

I am working on a use case wherein I need to query just time ranges
without a date component:

search for docs between 4pm and 6pm 

Approaches:
create something like [1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z], i.e. a
fixed date component

or

create a field for hh only

or maybe create a custom field for time only


Please suggest which would be a good approach, or any other approach if
possible.


Ankit



-- 
View this message in context: 
http://n3.nabble.com/Query-time-only-Ranges-tp688831p688831.html
Sent from the Solr - User mailing list archive at Nabble.com.



  



Re: Query time only Ranges

2010-03-31 Thread Silent Surfer
Small typo. Corrected and resending:

the query [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR] 
would be equivalent to 
[1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z]


Thx,
Tiru


- Original Message 
From: Silent Surfer silentsurfe...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 12:36:22 PM
Subject: Re: Query time only Ranges

Hi Ankit,

Try the following approach.
create a query like (Solr dates use the ISO 8601 format):
[1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Solr will automatically take care of rounding down to the HOUR specified.

For example:
the query [1900-01-01T16:43:42Z/HOUR TO 1900-01-01T18:55:23Z/HOUR] 
would be equivalent to 
[1900-01-01T16:00:00Z/HOUR TO 1900-01-01T18:00:00Z/HOUR]

Regards,
sS


- Original Message 
From: abhatna...@vantage.com abhatna...@vantage.com
To: solr-user@lucene.apache.org
Sent: Wed, March 31, 2010 9:56:38 AM
Subject: Query time only Ranges


Hi All,

I am working on a use case wherein I need to query just time ranges
without a date component:

search for docs between 4pm and 6pm 

Approaches:
create something like [1900-01-01T16:00:00Z TO 1900-01-01T18:00:00Z], i.e. a
fixed date component

or

create a field for hh only

or maybe create a custom field for time only


Please suggest which would be a good approach, or any other approach if
possible.


Ankit



-- 
View this message in context: 
http://n3.nabble.com/Query-time-only-Ranges-tp688831p688831.html
Sent from the Solr - User mailing list archive at Nabble.com.


  


Re: multicore query via solrJ

2009-10-23 Thread Silent Surfer
Hi Lici,

You may want to try the following snippet

---
SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

ModifiableSolrParams params = new ModifiableSolrParams();

params.set("wt", "json"); // response writer: can be json, standard, ...
params.set("rows", rowsToFetch); // total # of rows to fetch
params.set("start", startingRow); // starting record
params.set("shards", 
    "localhost:8983/solr,localhost:8984/solr,localhost:8985/solr"); // shard URLs
.
.
.
params.set("q", queryStr.toString()); // user query
QueryResponse response = solr.query(params);
SolrDocumentList docs = response.getResults();
---
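If you want something closer to the query(SolrServer, List<SolrServer>) helper
Lici asks about below, a rough sketch along the same shards-based lines (the
method and the use of plain URL strings are just illustrative, not an existing
SolrJ API):

// Hypothetical convenience wrapper: joins a list of shard URLs
// (host:port/solr[/core]) into the shards param and runs the query.
public static QueryResponse query(SolrServer server, List<String> shardUrls,
        String queryStr) throws SolrServerException {
    ModifiableSolrParams params = new ModifiableSolrParams();
    StringBuilder shards = new StringBuilder();
    for (String url : shardUrls) {
        if (shards.length() > 0) shards.append(',');
        shards.append(url);
    }
    params.set("shards", shards.toString());
    params.set("q", queryStr);
    return server.query(params);
}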

Thanks,
sS

--- On Fri, 10/23/09, Licinio Fernández Maurelo licinio.fernan...@gmail.com 
wrote:

 From: Licinio Fernández Maurelo licinio.fernan...@gmail.com
 Subject: Re: multicore query via solrJ
 To: solr-user@lucene.apache.org
 Date: Friday, October 23, 2009, 7:30 AM
 As no answer is given, I assume it's
 not possible. It would be great to code
 a method like this
 
  query(SolrServer, List<SolrServer>)
 
 
 
 On 20 October 2009 at 11:21, Licinio Fernández Maurelo
 
 licinio.fernan...@gmail.com
 wrote:
 
  Hi there,
  is there any way to perform a multi-core query using
 solrj?
 
  P.S.:
 
  I know about this syntax:
  http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=
  but I'm looking for a fancier way to do this using
 solrj (something like
  shards(query) )
 
  thx
 
 
 
  --
  Lici
 
 
 
 
 -- 
 Lici
 






Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-25 Thread Silent Surfer
Hi Michael,

We are storing all our data in addition to indexing it, as we need to display those 
values to the user. So unfortunately we cannot go with the stored=false option, 
which could otherwise have solved our issue.

Appreciate any other pointers/suggestions

Thanks,
sS

--- On Fri, 9/25/09, Michael solrco...@gmail.com wrote:

 From: Michael solrco...@gmail.com
 Subject: Re: Can we point a Solr server to index directory dynamically at  
 runtime..
 To: solr-user@lucene.apache.org
 Date: Friday, September 25, 2009, 2:00 PM
 Are you storing (in addition to
 indexing) your data?  Perhaps you could turn
 off storage on data older than 7 days (requires
 reindexing), thus losing the
 ability to return snippets but cutting down on your storage
 space and server
 count.  I've experienced a 10x decrease in space
 requirements and a large
 boost in speed after cutting extraneous storage from Solr
 -- the stored data
 is mixed in with the index data and so it slows down
 searches.
 You could also put all 200G onto one Solr instance rather
 than 10 for 7 days of
 data, and accept that those searches will be slower.
 
 Michael
 
 On Fri, Sep 25, 2009 at 1:34 AM, Silent Surfer 
 silentsurfe...@yahoo.com wrote:
 
  Hi,
 
  Thank you Michael and Chris for the response.
 
  Today after the mail from Michael, we tested with the
 dynamic loading of
  cores and it worked well. So we need to go with the
 hybrid approach of
  Multicore and Distributed searching.
 
  As per our testing, we found that a Solr instance with
 20 GB of
  index(single index or spread across multiple cores)
 can provide better
  performance when compared to having a Solr instance
 say 40 (or) 50 GB of
  index (single index or index spread across cores).
 
  So the 200 GB of index on day 1 will be spread across
 200/20=10 Solr slave
  instances.
 
  On day 2 data, 10 more Solr slave servers are
 required; Cumulative Solr
  Slave instances = 200*2/20=20
  ...
  ..
  ..
  On day 30 data, 10 more Solr slave servers are
 required; Cumulative Solr
  Slave instances = 200*30/20=300
 
  So with the above approach, we may need ~300 Solr
 slave instances, which
  becomes very unmanageable.
 
  But we know that most of the queries are for the past 1
 week, i.e. we
  definitely need 70 Solr Slaves containing the last 7
 days worth of data up
  and running.
 
  Now for the rest of the 230 Solr instances, do we need
 to keep them running
  for the odd query, that can span across the 30 days of
 data (30*200 GB=6 TB
  data) which can come up only a couple of times a day.
  This linear increase of Solr servers with the
 retention period doesn't
  seem to be a very scalable solution.
 
  So we are looking for a simpler approach
 to handle this
  scenario.
 
  Appreciate any further inputs/suggestions.
 
  Regards,
  sS
 
  --- On Fri, 9/25/09, Chris Hostetter hossman_luc...@fucit.org
 wrote:
 
   From: Chris Hostetter hossman_luc...@fucit.org
   Subject: Re: Can we point a Solr server to index
 directory dynamically
  at  runtime..
   To: solr-user@lucene.apache.org
   Date: Friday, September 25, 2009, 4:04 AM
   : Using a multicore approach, you
   could send a create a core named
   : 'core3weeksold' pointing to
 '/datadirs/3weeksold' 
   command to a live Solr,
   : which would spin it up on the fly.  Then
 you query
   it, and maybe keep it
   : spun up until it's not queried for 60 seconds
 or
   something, then send a
   : remove core 'core3weeksold'  command.
   : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler
   .
  
   something that seems implicit in the question is
 what to do
   when the
   request spans all of the data ... this is where
 (in theory)
   distributed
   searching could help you out.
  
   index each day's worth of data into its own core,
 that
   makes it really
   easy to expire the old data (just UNLOAD and
 delete an
   entire core once
   it's more than 30 days old). If your user is only
 searching
   current data,
   then your app can directly query the core
 containing the
   most current data
   -- but if they want to query the last week, or
 last two
   weeks worth of
   data, you do a distributed request for all of the
 shards
   needed to search
   the appropriate amount of data.
  
   Between the ALIAS and SWAP commands on the
  CoreAdmin
   screen it should
   be pretty easy to have cores with names like
   today,1dayold,2dayold so
   that your app can configure simple shard params
  for all the
   permutations
   you'll need to query.
  
  
   -Hoss
  
  
 
 
 
 
 
 
 






Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-24 Thread Silent Surfer
Hi,

Thank you Michael and Chris for the response. 

Today after the mail from Michael, we tested with the dynamic loading of cores 
and it worked well. So we need to go with the hybrid approach of Multicore and 
Distributed searching.

As per our testing, we found that a Solr instance with 20 GB of index(single 
index or spread across multiple cores) can provide better performance when 
compared to having a Solr instance say 40 (or) 50 GB of index (single index or 
index spread across cores).

So the 200 GB of index on day 1 will be spread across 200/20=10 Solr slave 
instances.

On day 2 data, 10 more Solr slave servers are required; Cumulative Solr Slave 
instances = 200*2/20=20
...
..
..
On day 30 data, 10 more Solr slave servers are required; Cumulative Solr Slave 
instances = 200*30/20=300

So with the above approach, we may need ~300 Solr slave instances, which 
becomes very unmanageable.

But we know that most of the queries are for the past 1 week, i.e. we definitely 
need 70 Solr slaves containing the last 7 days' worth of data up and running.

Now for the rest of the 230 Solr instances, do we need to keep them running for 
the odd query that can span across the 30 days of data (30*200 GB = 6 TB data), 
which may come up only a couple of times a day?
This linear increase of Solr servers with the retention period doesn't seem to 
be a very scalable solution. 

So we are looking for a simpler approach to handle this scenario. 

Appreciate any further inputs/suggestions.

Regards,
sS

--- On Fri, 9/25/09, Chris Hostetter hossman_luc...@fucit.org wrote:

 From: Chris Hostetter hossman_luc...@fucit.org
 Subject: Re: Can we point a Solr server to index directory dynamically at  
 runtime..
 To: solr-user@lucene.apache.org
 Date: Friday, September 25, 2009, 4:04 AM
 : Using a multicore approach, you
 could send a create a core named
 : 'core3weeksold' pointing to '/datadirs/3weeksold' 
 command to a live Solr,
 : which would spin it up on the fly.  Then you query
 it, and maybe keep it
 : spun up until it's not queried for 60 seconds or
 something, then send a
 : remove core 'core3weeksold'  command.
 : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler
 .
 
 something that seems implicit in the question is what to do
 when the 
 request spans all of the data ... this is where (in theory)
 distributed 
 searching could help you out.
 
 index each day's worth of data into its own core, that
 makes it really 
 easy to expire the old data (just UNLOAD and delete an
 entire core once 
 it's more than 30 days old). If your user is only searching
 current data, 
 then your app can directly query the core containing the
 most current data 
 -- but if they want to query the last week, or last two
 weeks worth of 
 data, you do a distributed request for all of the shards
 needed to search 
 the appropriate amount of data.
 
 Between the ALIAS and SWAP commands on the CoreAdmin
 screen it should 
 be pretty easy to have cores with names like
 today,1dayold,2dayold so 
 that your app can configure simple shard params for all the
 permutations 
 you'll need to query.
 
 
 -Hoss
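
As a side note, the create/unload cycle Hoss describes can also be driven from
SolrJ via CoreAdminRequest; a minimal sketch (the core name and path are taken
from the example above and are illustrative; createCore expects a core instance
directory laid out with its own conf, and error handling is omitted):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

// Talk to the CoreAdmin handler at the Solr root URL, not a specific core
SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");

// Spin up a core over the dormant index directory...
CoreAdminRequest.createCore("core3weeksold", "/datadirs/3weeksold", admin);

// ...query it through a server pointed at the new core, then drop it again
CoreAdminRequest.unloadCore("core3weeksold", admin);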
 








Can we point a Solr server to index directory dynamically at runtime..

2009-09-23 Thread Silent Surfer
Hi,

Is there any way to dynamically point the Solr servers to index/data 
directories at run time?

We are generating 200 GB worth of index per day and we want to retain the index 
for approximately 1 month. So our idea is to keep the first 1 week of index 
available at any time for the users, i.e. have a set of Solr servers up and running 
to handle requests for the past 1 week of data. 

But when user tries to query data which is older than 7 days old, we want to 
dynamically point the existing Solr instances to the inactive/dormant indexes 
and get the results.

The main intention is to limit the number of Solr slave instances and thereby 
limit the # of servers required.

If the index directory and Solr instances are tightly coupled, then most of the 
Solr instances are just up and running but may hardly be used, as most users are 
mainly interested in the past 1 week of data and not beyond that.

Any thoughts or any other approaches to tackle this would be greatly 
appreciated.

Thanks,
sS


  



Query regarding incremental index replication

2009-09-09 Thread Silent Surfer
Hi ,

Currently we are using Solr 1.3 and we have the following requirement.

As we need to process very high volumes of documents (of the order of 400 GB 
per day), we are planning to separate the indexer(s) and searcher(s), so that there 
won't be a performance hit.

Our idea is to have a set of servers used only as indexers for 
index creation, and then every 5 mins or so the index will be copied to the 
searchers (a set of Solr servers used only for querying). For this we tried to use 
snapshooter, rsync, etc.

But the problem with this approach is that the same index is present on both the 
indexer and the searcher, and hence consumes a large amount of file system space.

What we need is a mechanism wherein the indexer contains only the index for 
the past 5 mins (the last indexing cycle before snapshooter is run) and the 
searcher has the accumulated (total) index, i.e. every 5 mins we should 
be able to move the entire index from indexer to searcher, and so on.

The above scenario is slightly different from a master/slave implementation, as 
on the master we want only the latest (WIP) index while the slave should contain 
the entire index.

Appreciate if anyone can throw some light on how to achieve this.

Thanks,
sS


  



Re: date field

2009-09-08 Thread Silent Surfer
Hi,

If you have not gone live already, I would suggest using a long field instead 
of a date field. According to our testing, searches based on date fields are very 
slow compared to searches based on a long field.

You can use System.currentTimeMillis() to get the time.
When showing it to the user, apply a date formatter.

When taking input from the user, let him enter whatever date he wants, 
then convert it to a long and do your searches based on that.
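
A small sketch of that flow (the field name 'ts' and the formats here are only
examples, not anything mandated by Solr):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Index the timestamp as epoch millis in a long field (e.g. "ts")
long millis = System.currentTimeMillis();

// For display, format the stored millis back into a readable date
SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
String display = fmt.format(new Date(millis));

// For search, parse the user's input the same way and query the long field,
// e.g. ts:[startMillis TO endMillis]
// long startMillis = fmt.parse("2009-05-01T00:00:00Z").getTime(); // throws ParseException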

Experts can pitch in with any other ideas..

Thanks,
sS

--- On Tue, 9/8/09, Gérard Dupont ger.dup...@gmail.com wrote:

 From: Gérard Dupont ger.dup...@gmail.com
 Subject: date field
 To: solr-user@lucene.apache.org
 Cc: Nicolas Bureau nicolas@gmail.com
 Date: Tuesday, September 8, 2009, 8:51 AM
 Hi all,
 
 I'm currently facing a little difficulty indexing and
 searching on a date field.
 The indexing is done in the right way (I guess) and I can
 find valid date in
 the field, like 2009-05-01T12:45:32Z. However, when I'm
 searching, the users
 don't always give an exact date; for instance they give
 2008-05-01 to get
 all documents related to that day.  I can do a trick
 using a wildcard, but is
 there another way to do it? Moreover, if they give the full
 date string (or
 if I hack the query parser) I can have the full syntax, but
 then the ':'
 annoys me because the Lucene parser does not allow it
 without quotes. Any
 ideas ?
 
 -- 
 Gérard Dupont
 Information Processing Control and Cognition (IPCC) - EADS
 DS
 http://weblab.forge.ow2.org
 
 Document  Learning team - LITIS Laboratory
 


 



Impact of compressed=true attribute (in schema.xml) on Indexing/Query

2009-08-29 Thread Silent Surfer
Hi,

We observed that when we use the setting compressed=true, the index size is 
around 0.66 times the actual log file size, whereas if we do not use the 
compressed=true setting, the index size is almost 2.6 times the log file size.

Our sample Solr document size is approximately 1000 bytes. In addition to the 
text data we have around 9 metadata tags associated with it. 

We need to display all of the metadata values on the GUI, and hence we are 
setting stored=true in our schema.xml.

Now the question is: how does the compressed=true flag impact the indexing and 
querying operations? I am sure that there will be CPU utilization spikes, as the 
indexed data is compressed during indexing and decompressed during querying. 
I am mainly looking for any benchmarks for the above scenario.

The expected volume of incoming data is approximately 400 GB per day, so it is 
very important for us to evaluate compressed=true, due to file system 
utilization and index sizing concerns.

Any help would be greatly appreciated..

Thanks,
sS


  



How to reduce the Solr index size..

2009-08-20 Thread Silent Surfer
Hi,

I am newbie to Solr. We recently started using Solr.

We are using Solr to process server logs. We are creating an index entry for 
each line of the logs, so that users are able to do fine-grained searches 
down to the second/ms.

Now what we are observing is that the index size is almost double the size of 
the actual logs, i.e. if the log size is say 1 MB, the actual index size is 
around 2 MB.

Could anyone let us know what can be done to reduce the index size? Do we need 
to change any configuration, or delete any files which are created during the 
indexing process but not required for searching?

Our schema is as follows:

   <field name="pkey" type="string" indexed="true" stored="true" 
required="false" /> 
   <field name="date" type="date" indexed="true" stored="true" 
omitNorms="true"/>
   <field name="level" type="string" indexed="true" stored="true"/>
   <field name="app" type="string" indexed="true" stored="true"/>
   <field name="server" type="string" indexed="true" stored="true"/>
   <field name="port" type="string" indexed="true" stored="true"/>
   <field name="class" type="string" indexed="true" stored="true"/>
   <field name="method" type="string" indexed="true" stored="true"/>
   <field name="filename" type="string" indexed="true" stored="true"/>
   <field name="linenumber" type="string" indexed="true" stored="true"/>
   <field name="message" type="text" indexed="true" stored="true"/>

The message field holds the actual log text.

Thanks,
sS


  



Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi ,

We are planning to use Solr for indexing the server log contents.
The expected processed log file size per day: 100 GB
We are expecting to retain these indexes for 30 days (100*30 ~ 3 TB).

Can anyone advise what would be the optimal size of the index that I can 
store on a single server without hampering search performance etc.?

We are planning to use OS X servers with 16 GB of RAM (can go to 
24 GB).

We need to figure out how many servers are required to handle such an amount of 
data.

Any help would be greatly appreciated.

Thanks
SilentSurfer


  



Re: Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi,

That means we need approximately 3000 GB (Index Size)/24 GB (RAM) = 125 
servers. 

It would be very hard to convince my org to go for 125 servers for log 
management of 3 Terabytes of indexes. 

Has anyone used Solr for processing and handling indexes of the order 
of 3 TB? If so, how many servers were used for indexing alone?

Thanks,
sS


--- On Wed, 8/5/09, Ian Connor ian.con...@gmail.com wrote:

 From: Ian Connor ian.con...@gmail.com
 Subject: Re: Limit of Index size per machine..
 To: solr-user@lucene.apache.org
 Date: Wednesday, August 5, 2009, 9:38 PM
 I try to keep the index directory
 size less than the amount of RAM and rely
 on the OS to cache as it needs. Linux does a pretty good
 job here and I am
 sure OS X will do a good job also.
 
 Distributed search here will be your friend so you can
 chunk it up to a
 number of servers to keep your cost down (2GB RAM sticks
 are much cheaper
 than 4GB RAM sticks; $20 < $100).
 
 Ian.
 
 On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer silentsurfe...@yahoo.com wrote:
 
 
  Hi ,
 
  We are planning to use Solr for indexing the server
 log contents.
  The expected processed log file size per day: 100 GB
  We are expecting to retain these indexes for 30 days
 (100*30 ~ 3 TB).
 
  Can anyone advise what would be the optimal size of
 the index that I can
  store on a single server, without hampering the search
 performance etc.
 
  We are planning to use OSX server with a configuration
 of 16 GB (Can go to
  24 GB).
 
  We need to figure out how many servers are required to
 handle such an amount
  of data..
 
  Any help would be greatly appreciated.
 
  Thanks
  SilentSurfer
 
 
 
 
 
 
 
 -- 
 Regards,
 
 Ian Connor
 1 Leighton St #723
 Cambridge, MA 02141
 Call Center Phone: +1 (714) 239 3875 (24 hrs)
 Fax: +1(770) 818 5697
 Skype: ian.connor
 


  



Re: Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi,

We initially went down the Hadoop path, but as it is one more software-based file 
system on top of the OS file system, we didn't get buy-in from our system 
engineers, i.e. in case we run into any HDFS issues, the SEs won't be supporting 
us :(

Regards,
sS

--- On Thu, 8/6/09, Walter Underwood wun...@wunderwood.org wrote:

 From: Walter Underwood wun...@wunderwood.org
 Subject: Re: Limit of Index size per machine..
 To: solr-user@lucene.apache.org
 Date: Thursday, August 6, 2009, 5:12 AM
 That is why people don't use search
 engines to manage logs. Look at a  
 Hadoop cluster.
 
 wunder
 
 On Aug 5, 2009, at 10:08 PM, Silent Surfer wrote:
 
 
  Hi,
 
  That means we need approximately 3000 GB (Index
 Size)/24 GB (RAM) =  
  125 servers.
 
  It would be very hard to convince my org to go for 125
 servers for  
  log management of 3 Terabytes of indexes.
 
  Has anyone used Solr for processing and handling
 indexes of  
  the order of 3 TB? If so, how many servers were used
 for indexing  
  alone?
 
  Thanks,
  sS
 
 
  --- On Wed, 8/5/09, Ian Connor ian.con...@gmail.com
 wrote:
 
  From: Ian Connor ian.con...@gmail.com
  Subject: Re: Limit of Index size per machine..
  To: solr-user@lucene.apache.org
  Date: Wednesday, August 5, 2009, 9:38 PM
  I try to keep the index directory
  size less than the amount of RAM and rely
  on the OS to cache as it needs. Linux does a
 pretty good
  job here and I am
  sure OS X will do a good job also.
 
  Distributed search here will be your friend so you
 can
  chunk it up to a
  number of servers to keep your cost down (2GB RAM
 sticks
  are much cheaper
  than 4GB RAM sticks; $20 < $100).
 
  Ian.
 
  On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer
 silentsurfe...@yahoo.com
 
  wrote:
 
 
  Hi ,
 
  We are planning to use Solr for indexing the
 server
  log contents.
  The expected processed log file size per day:
 100 GB
  We are expecting to retain these indexes for
 30 days
  (100*30 ~ 3 TB).
 
  Can anyone advise what would be the optimal
 size of
  the index that I can
  store on a single server, without hampering
 the search
  performance etc.
 
  We are planning to use OSX server with a
 configuration
  of 16 GB (Can go to
  24 GB).
 
  We need to figure out how many servers are
 required to
  handle such an amount
  of data..
 
  Any help would be greatly appreciated.
 
  Thanks
  SilentSurfer
 
 
 
 
 
 
 
  -- 
  Regards,
 
  Ian Connor
  1 Leighton St #723
  Cambridge, MA 02141
  Call Center Phone: +1 (714) 239 3875 (24 hrs)
  Fax: +1(770) 818 5697
  Skype: ian.connor
 
 
 
 
 
 








Query regarding Solr search options..

2009-06-23 Thread Silent Surfer

Hi,

Can Solr search be customized to return N lines before and after the 
line that matches the keyword?

For example, suppose I have a document with 10 lines, and the 5th line contains the 
keyword 'X' I am interested in. Now I fire a Solr search for the keyword 'X'. 
Is there any preference/option available in Solr which can be set so that the 
search results contain only the 3 lines above and 3 lines below the line where 
the keyword matched successfully?

Thanks,
Silent Surfer


  


Re: Questions regarding IT search solution

2009-06-08 Thread Silent Surfer
Hi Jeff,
Thanks for the link. You are my lifesaver :) This is exactly what I 
am looking for.

Thanks,
Surfer

--- On Fri, 6/5/09, Jeff Hammerbacher ham...@cloudera.com wrote:

From: Jeff Hammerbacher ham...@cloudera.com
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org, silentsurfe...@yahoo.com
Date: Friday, June 5, 2009, 12:15 AM

Hey,

Your system sounds similar to the work done by Stu Hood at Rackspace in their
Mailtrust unit. See
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
for more details and inspiration.

Regards,
Jeff

On Thu, Jun 4, 2009 at 4:58 PM, silentsurfe...@yahoo.com wrote:

 Hi,
 This is encouraging to know that a solr/lucene solution may work.
 Can anyone using solr/lucene for such a scenario confirm that the
 solution is used and working fine? That would be really helpful, as I just
 started looking into the solr/lucene solution only a couple of days back and
 it might be difficult to be 100% confident before proposing the solution
 approach in the next couple of days.
 Thanks,
 Surfer

 --- On Thu, 6/4/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 Subject: Re: Questions regarding IT search solution
 To:
  solr-user@lucene.apache.org
 Date: Thursday, June 4, 2009, 10:26 PM


 My guess is Solr/Lucene would work.  Not sure how well/fast, but it would,
 esp. if you avoid range queries (or use tdate), and esp. if you
 shard/segment indices smartly, so that at query time you send (or distribute
 if you have to) the query to only those shards that have the data (if your
 query is for a limited time period).

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Silent Surfer silentsurfe...@yahoo.com
  To: solr-user@lucene.apache.org
  Sent: Thursday, June 4, 2009 5:52:21 PM
  Subject: Re:
  Questions regarding IT search solution
 
  Hi,
  As Alex correctly pointed out, my main intention is to figure out whether
  Solr/lucene offers the functionality to replicate what Splunk does in
 terms of
  building indexes etc. for enabling search capabilities.
  We evaluated Splunk, but it is not a very cost-effective solution for us, as
 we may
  have logs running into a few GBs per day since there can be around 25-30
 servers
  running, and the Splunk licensing model is based on the size of logs per day;
 on top of that,
  the license is valid for only 1 year.
  With this background, any further inputs on this are greatly
 appreciated.
  Thanks,
  Surfer
 
  --- On Thu, 6/4/09, Alexandre Rafalovitch wrote:
 
  From: Alexandre Rafalovitch
  Subject: Re: Questions regarding IT search solution
  To: solr-user@lucene.apache.org
  Date: Thursday, June 4, 2009, 9:27 PM
 
  I would also be interested to know what other solutions exist.
 
  Splunk's advantage is that it does extraction of the fields with
  advanced searching functionality (it has lexers/parsers for multiple
  content types). I believe that's the Solr function desired in the
  original posting. At the time they came out (2004), I was not aware of
  any good open source solutions to do what they did. And I would have
  loved one, as I was analyzing multi-gigabyte logs.
 
  Hadoop might be a way to process the files, but what would do the
  indexing and searching?
 
  Regards,
      Alex.
 
  On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:
   Why build one? Don't those already exist?
  
   Personally, I'd start with Hadoop instead of Solr. Putting
  logs in a
   search index is guaranteed to not scale. People were already trying
   different approaches ten years ago.
  
   wunder
  
   On 6/4/09 8:41 AM, Silent Surfer wrote:
  
   Hi,
   Any help/pointers on the following message would really help me.
   Thanks,
   Surfer
  
   --- On Tue, 6/2/09, Silent Surfer wrote:
  
   From: Silent Surfer
   Subject: Questions regarding IT search solution
   To: solr-user@lucene.apache.org
   Date: Tuesday, June 2, 2009, 5:45 PM
  
   Hi,
   I am new to the Lucene forum and this is my first question. I need a
 clarification
   from you.
   Requirement:
   1. Build an IT search tool for logs similar to that of Splunk (only wrt
   searching logs, but not in terms of reporting, graphs etc.) using
   solr/lucene. The log files are mainly server logs like JBoss and custom
   application server logs (may or may not be log4j logs), and the file
   sizes can potentially go up to 100 MB
   2. The logs are spread across multiple servers (25 to 30 servers)
   3. Capability to do search almost realtime
   4. Support distributed search
  
   Our search criterion can be based on a keyword or timestamp or IP address
   etc.
   Can anyone throw some light on whether solr/lucene is the right solution
   for this?
   Appreciate any quick help in this regard.
   Thanks,
   Surfer









  

Re: Questions regarding IT search solution

2009-06-04 Thread Silent Surfer
Hi,
Any help/pointers on the following message would really help me.

Thanks,
Surfer

--- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:

From: Silent Surfer silentsurfe...@yahoo.com
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,
I am new to the Lucene forum and this is my first question. I need a clarification 
from you.

Requirement:
1. Build an IT search tool for logs similar to that of Splunk (only wrt searching 
logs, but not in terms of reporting, graphs etc.) using solr/lucene. The log files 
are mainly server logs like JBoss and custom application server logs (may or may 
not be log4j logs), and the file sizes can potentially go up to 100 MB
2. The logs are spread across multiple servers (25 to 30 servers)
3. Capability to do search almost realtime
4. Support distributed search

Our search criterion can be based on a keyword or timestamp or IP address etc.
Can anyone throw some light on whether solr/lucene is the right solution for this?
Appreciate any quick help in this regard.

Thanks,
Surfer

  


  


Re: Questions regarding IT search solution

2009-06-04 Thread Silent Surfer
Hi,
As Alex correctly pointed out, my main intention is to figure out whether 
Solr/lucene offers the functionality to replicate what Splunk does in terms of 
building indexes etc. for enabling search capabilities.
We evaluated Splunk, but it is not a very cost-effective solution for us, as we 
may have logs running into a few GBs per day since there can be around 25-30 servers 
running, and the Splunk licensing model is based on the size of logs per day; on 
top of that, the license is valid for only 1 year.
With this background, any further inputs on this are greatly appreciated.

Thanks,
Surfer

--- On Thu, 6/4/09, Alexandre Rafalovitch arafa...@gmail.com wrote:

From: Alexandre Rafalovitch arafa...@gmail.com
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Thursday, June 4, 2009, 9:27 PM

I would also be interested to know what other solutions exist.

Splunk's advantage is that it does extraction of the fields with
advanced searching functionality (it has lexers/parsers for multiple
content types). I believe that's the Solr function desired in the
original posting. At the time they came out (2004), I was not aware of
any good open source solutions to do what they did. And I would have
loved one, as I was analyzing multi-gigabyte logs.

Hadoop might be a way to process the files, but what would do the
indexing and searching?

Regards,
    Alex.

On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wunderw...@netflix.com wrote:
 Why build one? Don't those already exist?

 Personally, I'd start with Hadoop instead of Solr. Putting logs in a
 search index is guaranteed to not scale. People were already trying
 different approaches ten years ago.

 wunder

 On 6/4/09 8:41 AM, Silent Surfer silentsurfe...@yahoo.com wrote:

 Hi,
 Any help/pointers on the following message would really help me.
 Thanks,
 Surfer

 --- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:

 From: Silent Surfer silentsurfe...@yahoo.com
 Subject: Questions regarding IT search solution
 To: solr-user@lucene.apache.org
 Date: Tuesday, June 2, 2009, 5:45 PM

 Hi,
 I am new to the Lucene forum and this is my first question. I need a clarification
 from you.
 Requirement:
 1. Build an IT search tool for logs similar to that of Splunk (only wrt searching
 logs, but not in terms of reporting, graphs etc.) using solr/lucene. The log files
 are mainly server logs like JBoss and custom application server logs (may or may
 not be log4j logs), and the file sizes can potentially go up to 100 MB
 2. The logs are spread across multiple servers (25 to 30 servers)
 3. Capability to do search almost realtime
 4. Support distributed search

 Our search criterion can be based on a keyword or timestamp or IP address 
 etc.
 Can anyone throw some light on whether solr/lucene is the right solution for this?
 Appreciate any quick help in this regard.
 Thanks,
 Surfer



  

Questions regarding IT search solution

2009-06-02 Thread Silent Surfer
Hi,
I am new to the Lucene forum and this is my first question. I need a clarification 
from you.

Requirement:
1. Build an IT search tool for logs similar to that of Splunk (only wrt searching 
logs, but not in terms of reporting, graphs etc.) using solr/lucene. The log files 
are mainly server logs like JBoss and custom application server logs (may or may 
not be log4j logs), and the file sizes can potentially go up to 100 MB
2. The logs are spread across multiple servers (25 to 30 servers)
3. Capability to do search almost realtime
4. Support distributed search

Our search criterion can be based on a keyword or timestamp or IP address etc.
Can anyone throw some light on whether solr/lucene is the right solution for this?
Appreciate any quick help in this regard.

Thanks,
Surfer


