Re: SOLR for Log analysis feasibility

2010-12-01 Thread phoey

My thoughts exactly: it may seem fairly straightforward, but I fear the
moment a client wants a perfectly reasonable new feature added to their
report and SOLR simply cannot support it.

I am hoping we won't have the same scalability issues as Loggly, because
we don't index and store large documents in SOLR. Most of our documents
will be very small.

Does anyone have any experience with using field collapsing in a production
environment?

Thank you for all your replies.

Joe

 

 


Re: SOLR for Log analysis feasibility

2010-11-30 Thread Peter Karich

Take a look at this:
http://vimeo.com/16102543

For that amount of data it isn't that easy :-)


We are looking into building a reporting feature and investigating solutions
which will allow us to search through our logs for downloads, searches and
view history.

Each log item is relatively small

download history

<add>
  <doc>
    <field name="uuid">item123-v1</field>
    <field name="market">photography</field>
    <field name="name">item 1</field>
    <field name="userid">1</field>
    <field name="version">1</field>
    <field name="downloadType">hires</field>
    <field name="itemId">123</field>
    <field name="timestamp">2009-11-07T14:50:54Z</field>
  </doc>
</add>

search history

<add>
  <doc>
    <field name="uuid">1</field>
    <field name="query">brand assets</field>
    <field name="userid">1</field>
    <field name="timestamp">2009-11-07T14:50:54Z</field>
  </doc>
</add>

view history

<add>
  <doc>
    <field name="uuid">1</field>
    <field name="itemId">123</field>
    <field name="userid">1</field>
    <field name="timestamp">2009-11-07T14:50:54Z</field>
  </doc>
</add>
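
Typical report queries should then just be filter queries, e.g. all hires
downloads by a user within a date range (a hypothetical query against the
download documents above; URL left unescaped for readability):

http://localhost:8983/solr/select?q=*:*&fq=userid:1&fq=downloadType:hires&fq=timestamp:[2009-11-01T00:00:00Z TO 2009-12-01T00:00:00Z]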


We reckon we could have around 10-30 million log records for each type
(downloads, searches, views), so about 70 million records in total, but the
solution obviously must be able to scale higher.

Concurrent users will be around 10-20 (relatively low).

New logs will be imported as a nightly batch.
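
The import itself should be straightforward; something like the following,
assuming the standard XML update handler on the default port (the file name
is just an example):

# post the nightly batch of download documents
curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
     --data-binary @downloads.xml
# make the new documents visible to searchers
curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
     --data-binary '<commit/>'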

Because we have some previous experience with SOLR, and because the interface
needs full-text searching and filtering, we built a prototype using SOLR 4.0.
We used the new field collapsing feature in SOLR 4.0 to collapse on groups of
data. For example, view history needs to collapse on itemId; each row then
shows how many views the item has had, which is simply the number of
documents that were grouped together.
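
For anyone curious, the grouped query in the prototype looks roughly like
this (trunk grouping syntax at the time of writing, so the parameters may
well change before 4.0 is released; URL left unescaped for readability):

http://localhost:8983/solr/select?q=*:*&group=true&group.field=itemId&group.limit=1&rows=20

Each group in the response carries its own numFound, which gives the view
count for that itemId.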

The requirements for the solution are that it be schemaless, so that new
fields can be added to new documents easily, and that it offer a powerful
search interface, both of which SOLR can do.
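
On the schemaless point, dynamic fields seem to get us most of the way
there, since new documents can then carry new fields without a schema
change. A minimal sketch of the kind of schema.xml entries I mean (the
suffix conventions and field types are assumed from the example schema):

<dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
<dynamicField name="*_i"  type="int"    indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date"   indexed="true" stored="true"/>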

QUESTIONS

Our prototype is working as expected, but I'm unsure about a few things:

1. Has anyone got experience with using SOLR for log analysis?
2. SOLR can scale, but at what point should I start considering sharding
the index? I assume it should be fine with 100+ million records. (See the
sketch just after this list.)
3. We are using a nightly build of SOLR for the field collapsing feature.
Would it be possible to patch SOLR 1.4.1 with the SOLR-236 patch instead?
Has anyone used that patch in production?
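
Purely as a sketch of what I mean by sharding in question 2 (the host names
are invented), my understanding is that a distributed query just adds the
shards parameter:

http://host1:8983/solr/select?q=userid:1&shards=host1:8983/solr,host2:8983/solr

though I don't know whether field collapsing works across shards yet.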

thanks



--
http://jetwick.com twitter search prototype



Re: SOLR for Log analysis feasibility

2010-11-30 Thread Stefan Matheis
I know it's not Solr, but perhaps you should have a look at this:
http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/

On Tue, Nov 30, 2010 at 12:58 PM, Peter Karich peat...@yahoo.de wrote:

  Take a look at this:
 http://vimeo.com/16102543

 For that amount of data it isn't that easy :-)


 [rest of quoted message trimmed]




Re: SOLR for Log analysis feasibility

2010-11-30 Thread Peter Sturge
We do a lot of precisely this sort of thing. Ours is a commercial
product (Honeycomb Lexicon) that extracts behavioural information from
logs, events and network data (don't worry, I'm not pushing this on
you!) - only to say that there are a lot of considerations beyond base
Solr when it comes to handling log, event and other 'transient' data
streams.
Aside from the obvious issues of horizontal scaling, reliable
delivery/retry/replication etc., there are other important issues,
particularly with regard to data classification, reporting engines and
numerous other items.
It's one of those things that sounds perfectly reasonable at the
outset, but all sorts of things crop up the deeper you get into it.

Peter


On Tue, Nov 30, 2010 at 11:44 AM, phoey pho...@gmail.com wrote:

 [quoted original message trimmed]