Re: Solr Logo thought

2008-08-20 Thread Lukáš Vlček
Hi,

Only a few have responded so far. How can we get more feedback? Do you think I
should work on the proposal a little bit more and then attach it to SOLR-84?

Regards,
Lukas

On Mon, Aug 18, 2008 at 6:14 PM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:

 I like it, even its asymmetry. :)


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Lukáš Vlček [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Sunday, August 17, 2008 7:02:25 PM
  Subject: Re: Solr Logo thought
 
  Hi,
 
  My initial draft of Solr logo can be found here:
  http://picasaweb.google.com/lukas.vlcek/Solr
  The reason why I haven't attached it to SOLR-84 for now is that this is
  just a draft and not the final design (there are a lot of unfinished
  details). I would like to get some feedback before I spend more time on it.
 
  I had several ideas but in the end I found that simplicity works best.
  Simple font, sun motif, just two colors. It should look fine in both
  large and small formats. As for the favicon I would use the sun motif
  only - that is, the O letter with the beams. The logo font still needs
  a lot of small (but important) touches. For now I would like to get
  feedback mostly about the basic idea.
 
  Regards,
  Lukas
 
  On Sat, Aug 9, 2008 at 8:21 PM, Mark Miller wrote:
 
   Plenty left, but here is a template to get things started:
   http://wiki.apache.org/solr/LogoContest
  
   Speaking of which, if we want to maintain the momentum of interest in
   this topic, someone (i.e., not me) should set up a LogoContest wiki page
   with some of the goals discussed in the various threads on solr-user and
   solr-dev recently, as well as draft up some good guidelines for how we
   should run the contest.
  
  
 
 
  --
  http://blog.lukas-vlcek.com/




-- 
http://blog.lukas-vlcek.com/


Re: shards and performance

2008-08-20 Thread Ian Connor
I have based my machines on bare bones servers (I call them ghetto
servers). I essentially have motherboards in a rack sitting on
catering trays (heat resistance is key).

http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html

Motherboards: GIGABYTE GA-G33M-S2L (these are small mATX with 4 RAM
slots - allows as much cheap RAM as possible)
CPU: Intel Q6600 (quad core 2.4GHz - but I might try AMD next to see
if the different RAM approach works better and they are greener)
Memory: 8GB (4 x 2GB DDR2 - best price per GB)
HDD: SATA Disk (between 200 to 500GB - I had these from another project)

I have HAProxy between the App servers and Solr so that I get failover
if one of these goes down (expect failure).

Having only 1M documents but more data per document will mean your
situation is different. I am having particular performance issues with
facets and trying to get my head around all the issues involved there.

I see Mike has only 2 shards per box because he was squeezing out
performance. I didn't see any significant gain in performance, but that
is not to say there isn't one. For me, I had a level of performance in
mind and stopped when that was met. It took almost a month of testing to
get to that point, so I was ready to move on to other problems - I might
revisit it later.

Also, my ghetto servers are getting reliability similar to the Dell
servers I have - but I have built the system with the expectation that
they will fail often, although that has not happened yet.
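
For reference, querying across a set of shards like this goes through
Solr's `shards` request parameter (distributed search, new in Solr 1.3).
A minimal sketch of composing such a request URL; the hostnames and the
query term are hypothetical examples, not anything from this setup:

```python
# Sketch: build a Solr distributed-search URL using the "shards"
# parameter. Hostnames/ports here are hypothetical placeholders.
from urllib.parse import urlencode

def sharded_query_url(front_host, shards, q, rows=10):
    """Build a /select URL that fans the query out to every shard.

    `shards` is a list of host:port/core entries (no http:// prefix),
    joined with commas as the shards parameter expects.
    """
    params = urlencode({
        "q": q,
        "rows": rows,
        "shards": ",".join(shards),
    })
    return "http://%s/solr/select?%s" % (front_host, params)

url = sharded_query_url(
    "ghetto1:8983",
    ["ghetto1:8983/solr", "ghetto2:8983/solr"],
    "cancer",
)
print(url)
```

Any of the shard hosts can serve as the front node; HAProxy in front, as
described above, keeps that entry point available when a box dies.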

On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim
[EMAIL PROTECTED] wrote:
 As long as Solr/Lucene make smart use of memory (and they do, in my
 experience), it is really easy to calculate how long a huge query/update
 will take when you know how long the smaller ones take. Just keep in
 mind that the resource consumption of memory and disk space is almost always
 proportional.

 2008/8/19 Mike Klaas [EMAIL PROTECTED]


 On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:


 So your experience differs from Mike's.  Obviously it's an important
 decision whether to buy more machines.  Can you (or Mike) weigh in on
 what factors led to your different take on local shards vs. shards
 distributed across machines?


 I do both; the only reason I have two shards on each machine is to squeeze
 maximum performance out of an equipment budget.  Err on the side of multiple
 machines.

  At least for building the index, the number of shards really does
 help. Indexing Medline (1.6e7 docs, which is 60GB of XML text) on a
 single machine starts at about 100 docs/s but slows down to 10 docs/s as
 the index grows. It seems the limit is reached once you run out of RAM,
 and it gets slower in a linear fashion the larger the index gets.
 My sweet spot was 5 machines with 8GB RAM each for indexing about 60GB
 of data.


 Can you say what the specs were for these machines? Given that I have more
 like 1TB of data over 1M docs how do you think my machine requirements might
 be affected as compared to yours?


 You are in a much better position to determine this than we are.  See how
 big an index you can put on a single machine while maintaining acceptable
 performance under a typical query load.  It's relatively safe to
 extrapolate linearly from that.

 -Mike
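
That rule of thumb is easy to make concrete. A toy sketch of the linear
extrapolation; the 60GB-per-machine figure is just an assumed measurement
for illustration, not a recommendation:

```python
import math

def machines_needed(total_gb, gb_per_machine_ok):
    """Linear extrapolation from a single-machine capacity test:
    if one machine handles gb_per_machine_ok of index with acceptable
    query latency, estimate the shard count for the full corpus."""
    return math.ceil(total_gb / gb_per_machine_ok)

# e.g. ~1 TB of data, one machine measured comfortable at ~60 GB:
print(machines_needed(1000, 60))  # -> 17
```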




 --
 Alexander Ramos Jardim




-- 
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 672
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1(770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor


Re: IndexOutOfBoundsException

2008-08-20 Thread Ian Connor
It looks like it was just RAM. I purchased a PHD PCI2 from Ultra-X to
test all my RAM, and some modules were just plain bad (some were bad
right away and others needed to warm up before failing - I will be
testing all my RAM from now on).

I have re-indexed many times since then and have not seen the problem
again. So, it looks like it was just bad hardware - sorry about the
confusion.

On Mon, Aug 18, 2008 at 8:29 AM, Michael McCandless
[EMAIL PROTECTED] wrote:

 OK gotchya.  Please keep us posted one way or another...

 Mike

 Ian Connor wrote:

 Hi Mike,

 I am currently ruling out some bad memory modules. Knowing that this
 is index corruption makes memory corruption more likely. If
 replacing the RAM does not fix the problem (which I need to do anyway due
 to segmentation faults), I will package up the crash into a
 reproducible scenario.

 On Mon, Aug 18, 2008 at 5:56 AM, Michael McCandless
 [EMAIL PROTECTED] wrote:

 Hi Ian,

 I sent this to java-user, but maybe you didn't see it, so let's try again
 on solr-user:


 It looks like your stored fields file (_X.fdt) is corrupt.

 Are you using multiple threads to add docs?

 Can you try switching to SerialMergeScheduler to verify it's
 reproducible?

 When you hit this exception, can you stop Solr and then run Lucene's
 CheckIndex tool (org.apache.lucene.index.CheckIndex) to verify the
 index is corrupt and see which segment it is?  Then post back the
 exception and ls -l of your index directory?

 If you could post the client-side code you're using to build & submit
 docs to Solr, and if I can get access to the Medline content and
 repro the bug, then I'll track it down...

 Mike
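
The CheckIndex tool mentioned above is a plain command-line class in the
Lucene core jar. A hedged sketch of invoking it from a script; the jar
path and index directory below are placeholders, only the class name
comes from the message:

```python
import subprocess

def checkindex_cmd(lucene_jar, index_dir):
    """Command line for Lucene's index checker.
    lucene_jar / index_dir are placeholder paths."""
    return [
        "java", "-cp", lucene_jar,
        "org.apache.lucene.index.CheckIndex",  # class named above
        index_dir,
    ]

cmd = checkindex_cmd("lucene-core.jar", "/var/solr/data/index")
print(" ".join(cmd))
# Once the paths are real, run it with: subprocess.run(cmd)
```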

 On Aug 14, 2008, at 10:18 PM, Ian Connor wrote:

 I seem to be able to reproduce this very easily and the data is
 medline (so I am sure I can share it if needed with a quick email to
 check).

 - I am using fedora:
 %uname -a
 Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30
 13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
 %java -version
 java version 1.7.0
 IcedTea Runtime Environment (build 1.7.0-b21)
 IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
 - single core (will use shards, but each machine has just one HDD so I
 didn't see how cores would help - but I am new at this)
 - next run I will keep the output to check for earlier errors
 - very reproducible, and I can share code + data if that will help

 On Thu, Aug 14, 2008 at 4:23 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

 Yikes... not good.  This shouldn't be due to anything you did wrong
 Ian... it looks like a lucene bug.

 Some questions:
 - what platform are you running on, and what JVM?
 - are you using multicore? (I fixed some index locking bugs recently)
 - are there any exceptions in the log before this?
 - how reproducible is this?

 -Yonik

 On Thu, Aug 14, 2008 at 2:47 PM, Ian Connor [EMAIL PROTECTED]
 wrote:

 Hi,

 I have rebuilt my index a few times (it should get up to about 4
 Million but around 1 Million it starts to fall apart).

 Exception in thread "Lucene Merge Thread #0"
 org.apache.lucene.index.MergePolicy$MergeException:
 java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
at

 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:323)
at

 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:300)
 Caused by: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
at java.util.ArrayList.rangeCheck(ArrayList.java:572)
at java.util.ArrayList.get(ArrayList.java:350)
at
 org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:188)
at
 org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
at

 org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:349)
at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
at
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3998)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3650)
at

 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:214)
at

 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:269)


 When this happens, the disk usage goes right up and the indexing
 really starts to slow down. I am using a Solr build from about a week
 ago - so my Lucene is at 2.4 according to the war files.

 Has anyone seen this error before? Is it possible to tell which Array
 is too large? Would it be an Array I am sending in or another internal
 one?

 Regards,
 Ian Connor





 --
 Regards,

 Ian Connor





 --
 Regards,

 Ian Connor





-- 
Regards,

Ian Connor


Re: Solr Logo thought

2008-08-20 Thread Grant Ingersoll
It's pretty good, for me.  My first thought is it is an eye (the  
orange reminds me of eyelashes), and then the second thought is it is  
the Sun. Take that w/ a grain of salt, though, there's a reason why I  
do server-side code and not user interfaces and graphic design. :-)


-Grant

On Aug 20, 2008, at 3:48 AM, Lukáš Vlček wrote:




Re: Solr Logo thought

2008-08-20 Thread Mark Miller
I went through the same thought process - it took a couple of minutes for
the whole thing to grow on me. Perhaps a tweak to the O, if you're looking
for some constructive criticism?

Again though, I really think it's an awesome multipurpose logo. It works
well in color, b/w, large, small, and with just the sun part as a
favicon/other.


Grant Ingersoll wrote:


Re: IndexOutOfBoundsException

2008-08-20 Thread Michael McCandless


OK glad to hear that ;)

Thanks for bringing closure, Ian!

Mike

Ian Connor wrote:



Re: Solr Logo thought

2008-08-20 Thread Lukáš Vlček
Hi,

One lesson learned from the Mahout logo is that we should also check whether
this design is unique enough. For example, one of the early design concepts
for the Mahout logo was found to be very close to the existing logo of a
machine-learning-related project (such a thing is not very probable, but it
happened). As for the sun motif, there are tons of logos with a sun in them,
so we should be careful...

Anyway, I will try to work on the O and will keep you posted once I have
some results. In the meantime we can still collect constructive criticism
here.

Regards,
Lukas

On Wed, Aug 20, 2008 at 2:25 PM, Mark Miller [EMAIL PROTECTED] wrote:


-- 
http://blog.lukas-vlcek.com/


Re: solr-ruby version management

2008-08-20 Thread Koji Sekiguchi

Otis,

Thanks for the comment. I used a dot instead of a dash because my associate
(a Rubyist) said that if I use a 1.3-0.0.7 style version, I'll get an error
when making a gem (I've never tried this myself):


ERROR:   While executing gem (ArgumentError)
  /usr/local/lib/ruby/site_ruby/1.8/rubygems/version.rb:56:in
`initialize': Malformed version number string 1-3-0.0.7


But on second thought, I want to stick with the current style of version.
That is, solr-ruby-0.0.6.gem. :)

Suppose we release solr-ruby-1.3.0.1.gem, and after that Solr 1.4 is
released. At that point solr-ruby itself would not need a new release,
because solr-ruby-1.3 should work fine with Solr 1.4. But once we adopt
the solr-ruby-1.3.0.x.gem scheme, we would have to release
solr-ruby-1.4.0.1.gem anyway, just as a checkpoint.
I don't want to do that.

Koji
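
For what it's worth, the all-dots composite scheme does order correctly
under ordinary component-wise comparison, which is one argument for it
over the dash variant. A small illustration (in Python rather than Ruby,
purely as a sketch of the comparison logic):

```python
def parse_version(s):
    """Split a dotted version like '1.3.0.1' into comparable integers."""
    return tuple(int(part) for part in s.split("."))

# Proposed scheme: {solrVersion}.{solr-rubyVersion}
versions = ["1.4.0.1", "1.3.0.2", "1.3.0.1"]
ordered = sorted(versions, key=parse_version)
print(ordered)  # -> ['1.3.0.1', '1.3.0.2', '1.4.0.1']
```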

Otis Gospodnetic wrote:

I like this idea.  Perhaps separate the solr version and the solr-ruby version 
with a dash instead of dot -- solr-ruby-1.3.0-0.0.6

 
Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
  

From: Koji Sekiguchi [EMAIL PROTECTED]
To: solr-user@lucene.apache.org; [EMAIL PROTECTED]
Sent: Tuesday, August 19, 2008 4:24:31 AM
Subject: solr-ruby version management

From: http://www.nabble.com/CHANGES.txt-td18901774.html

The latest version of solr-ruby is 0.0.6:

solr-ruby-0.0.6.gem
http://rubyforge.org/frs/?group_id=2875&release_id=23885

I think it isn't clear what Solr version is corresponding.

I'd like to change this to solr-ruby-{solrVersion}.{solr-rubyVersion}.gem
when Solr 1.3 is released. Where solr-rubyVersion is two digits.
That is, the first official release of solr-ruby will be
solr-ruby-1.3.0.01.gem.

Any objections to changing to this new version format?
Or anyone who has suggestions, please let me know.

Koji




  




Re: shards and performance

2008-08-20 Thread Alexander Ramos Jardim
Another thing to consider in your sharding is the access rate you want to
guarantee.

In the project I am working on, I need to guarantee at least 200 hits/second
with various facets in all queries.

I am not using sharding, but I have 6 Solr instances per cluster node, and I
have 3 nodes, for a total of 18 Solr instances. Each node has only one index,
so I keep the 6 instances pointing to the same index on a given node.
What made a huge difference in my performance was the removal of the lock.

I hope that helps you out.

2008/8/20 Ian Connor [EMAIL PROTECTED]





-- 
Alexander Ramos Jardim


Re: shards and performance

2008-08-20 Thread Ian Connor
So, because the OS is doing the caching in RAM, I could have
6 jetty servers per machine all pointing to the same data. Once the
index is built, I can load up some more servers on different ports and
that will boost performance.

That does sound promising - thanks for the tip. What made you pick 6?
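
Fanning queries out over several identical instances on different ports
is just load balancing; in practice HAProxy (already mentioned in this
thread) would do it. A toy round-robin sketch, with hypothetical port
numbers, just to show the idea:

```python
import itertools

# Hypothetical: six identical Solr instances on one box, all serving
# the same on-disk index, listening on consecutive ports.
instances = ["localhost:%d" % port for port in range(8983, 8989)]

# Round-robin: each incoming query goes to the next instance in turn.
next_instance = itertools.cycle(instances).__next__

first_three = [next_instance() for _ in range(3)]
print(first_three)
# -> ['localhost:8983', 'localhost:8984', 'localhost:8985']
```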

On Wed, Aug 20, 2008 at 9:49 AM, Alexander Ramos Jardim
[EMAIL PROTECTED] wrote:
 Another thing to consider on your sharding is the access rate you want to
 guarantee.

 In the project I am working, I need to guarantee at least 200hits/second
 with various facets in all queries.

 I am not using sharding, but I have 6 Solr instances per cluster node, and I
 have 3 nodes, to a total of 18 solr instances. Each node has only one index,
 so I keep the 6 instance pointing to the same the index in a given node.
 What made a huge diference in my performance was the removal of the lock.

 I expect that helps you out.

 2008/8/20 Ian Connor [EMAIL PROTECTED]

 I have based my machines on bare bones servers (I call them ghetto
 servers). I essentially have motherboards in a rack sitting on
 catering trays (heat resistance is key).

 http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html

 Motherboards: GIGABYTE GA-G33M-S2L (these are small mATX with 4 RAM
 slots - allows as much cheap RAM as possible)
 CPU: Intel Q6600 (quad core 2.4GHz - but I might try AMD next to see
 if the different RAM approach works better and they are greener)
 Memory: 8GB (4 x 2GB DDR2 - best price per GB)
 HDD: SATA Disk (between 200 to 500GB - I had these from another project)

 I have HAProxy between the App servers and Solr so that I get failover
 if one of these goes down (expect failure).

 Having only 1M documents but more data per document will mean your
 situation is different. I am having particular performance issues with
 facets and trying to get my head around all the issues involved there.

 I see Mike has only 2 shards per box as he was squeezing
 performance. I didn't see any significant gain in performance but that
 is not to say there isn't one. Just for me, I had a level of
 performance in mind and stopped when that was met. It took almost a
 month of testing to get to that point so I was ready to move on to
 other problems - I might revisit it later.

 Also, my ghetto servers are getting similar reliability to the Dell
  Servers I have - but I have built the system with the expectation
 they will fail often although that has not happened yet.

 On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim
 [EMAIL PROTECTED] wrote:
  As long as Solr/Lucene make smart use of memory (and they do, from my
  experience), it is really easy to calculate how long a huge query/update
  will take when you know how much the smaller ones will take. Just keep in
  mind that the resource consumption of memory and disk space is almost
 always
  proportional.
 
  2008/8/19 Mike Klaas [EMAIL PROTECTED]
 
 
  On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
 
 
   So your experience differs from Mike's.  Obviously it's an important
  decision as to whether to buy more machines.  Can you (or Mike) weigh
 in on
  what factors led to your different take on local shards vs. shards
  distributed across machines?
 
 
  I do both; the only reason I have two shards on each machine is to
 squeeze
  maximum performance out of an equipment budget.  Err on the side of
 multiple
  machines.
 
   At least for building the index, the number of shards really does
  help. To index Medline (1.6e7 docs which is 60Gb in XML text) on a
  single machine starts at about 100doc/s but slows down to 10doc/s when
  the index grows. It seems as though the limit is reached once you run
  out of RAM and it gets slower and slower in a linear fashion the
  larger the index you get.
  My sweet spot was 5 machines with 8GB RAM for indexing about 60GB of
  data.
 
 
  Can you say what the specs were for these machines? Given that I have
 more
  like 1TB of data over 1M docs how do you think my machine requirements
 might
  be affected as compared to yours?
 
 
  You are in a much better position to determine this than we are.  See
 how
  big an index you can put on a single machine while maintaining
  acceptable
  performance using a typical query load.  It's relatively safe to
 extrapolate
  linearly from that.
 
  -Mike
 
 
 
 
  --
  Alexander Ramos Jardim
 



 --
 Regards,

 Ian Connor
 1 Leighton St #605
 Cambridge, MA 02141
 Direct Line: +1 (978) 672
 Call Center Phone: +1 (714) 239 3875 (24 hrs)
 Mobile Phone: +1 (312) 218 3209
 Fax: +1(770) 818 5697
 Suisse Phone: +41 (0) 22 548 1664
 Skype: ian.connor




 --
 Alexander Ramos Jardim




-- 
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 672
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1(770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor


Re: hello, a question about solr.

2008-08-20 Thread Alexander Ramos Jardim
A tiny but really good explanation can be found here
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

2008/8/18 finy finy [EMAIL PROTECTED]

 thanks for your help.

 could you give me your gmail talk address or msn?


 2008/8/19, Norberto Meijome [EMAIL PROTECTED]:
 
  On Mon, 18 Aug 2008 23:07:19 +0800
  finy finy [EMAIL PROTECTED] wrote:
 
   because i use chinese character, for example ibm___
   solr will parse it into a term ibm and a phraze _ __
   can i use solr to query with a term ibm and a term _  and a
  term __?
 
  Hi finy,
  you should look into n-gram tokenizers. Not sure if it is documented in
 the
  wiki, but it has been discussed in the mailing list quite a few times.
 
  in short, an n-gram tokenizer breaks your input into blocks of characters
  of size n, which are then compared against the index. I think for Chinese,
  bi-gram is the favoured approach.
 
  good luck,
  B
  _
  {Beto|Norberto|Numard} Meijome
 
  I used to hate weddings; all the Grandmas would poke me and
  say, You're next sonny! They stopped doing that when i
  started to do it to them at funerals.
 
  I speak for myself, not my employer. Contents may be hot. Slippery when
  wet. Reading disclaimers makes you go blind. Writing them is worse. You
 have
  been Warned.
 




-- 
Alexander Ramos Jardim


Re: localsolr and dataimport problems

2008-08-20 Thread TomWilliamson

Hi Shalin, 

I've compiled a new apache-solr-dataimporthandler-1.3-dev.jar from the trunk
without issues and dropped this into my existing WAR, but I'm now receiving

java.lang.NoSuchMethodError:
org.apache.solr.request.SolrQueryResponse.setHttpCaching(Z)V

when I try and access http://localhost:8983/solr/dataimport

Many thanks,
Tom





Shalin Shekhar Mangar wrote:
 
 Hi Tom,
 
 This should be fixed in the trunk code.
 
 On Thu, Aug 7, 2008 at 12:13 AM, Shalin Shekhar Mangar 
 [EMAIL PROTECTED] wrote:
 
 My mistake, it is SOLR-676.

 https://issues.apache.org/jira/browse/SOLR-676

 On Thu, Aug 7, 2008 at 12:13 AM, Shalin Shekhar Mangar 
 [EMAIL PROTECTED] wrote:

 I've opened SOLR-647 to fix this.

 https://issues.apache.org/jira/browse/SOLR-676


 On Wed, Aug 6, 2008 at 9:56 PM, Shalin Shekhar Mangar 
 [EMAIL PROTECTED] wrote:

 On Wed, Aug 6, 2008 at 5:50 PM, TomWilliamson 
 [EMAIL PROTECTED] wrote:


 I'm trying to use localsolr (1.5) with the dataimport handler but not
 having
 much luck. The issue seems to be that the _localtier dynamic fields
 dont
 get
 generated when adding docs via the dataimport, although they do if I
 add
 docs via post.jar (xml document). Am i missing something simple here?
 --
 View this message in context:
 http://www.nabble.com/localsolr-and-dataimport-problems-tp18849983p18849983.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 Hmm... LocalSolr uses the UpdateRequestProcessor API in Solr to add
 it's
 dynamic fields. However, DataImportHandler bypasses that API and adds
 documents directly to the UpdateHandler.

 Sounds like an improvement (bug?) in DataImportHandler which should be
 addressed before release. I'll open an issue and work on it.

 --
 Regards,
 Shalin Shekhar Mangar.




 --
 Regards,
 Shalin Shekhar Mangar.




 --
 Regards,
 Shalin Shekhar Mangar.

 
 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/localsolr-and-dataimport-problems-tp18849983p19069927.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: shards and performance

2008-08-20 Thread Alexander Ramos Jardim
2008/8/20 Ian Connor [EMAIL PROTECTED]

 So, because the OS is doing the caching in RAM, it means I could have
 6 jetty servers per machine all pointing to the same data. Once the
 index is built, I can load up some more servers on different ports and
 it will boost performance.

 That does sound promising - thanks for the tip. What made you pick 6?


Each weblogic instance sits on top of a 2GB heap size JVM. Each cluster node
has 16GB RAM.



 On Wed, Aug 20, 2008 at 9:49 AM, Alexander Ramos Jardim
 [EMAIL PROTECTED] wrote:
  Another thing to consider on your sharding is the access rate you want to
  guarantee.
 
  In the project I am working, I need to guarantee at least 200hits/second
  with various facets in all queries.
 
  I am not using sharding, but I have 6 Solr instances per cluster node,
 and I
  have 3 nodes, to a total of 18 solr instances. Each node has only one
 index,
  so I keep the 6 instances pointing to the same index in a given node.
  What made a huge difference in my performance was the removal of the lock.
 
  I expect that helps you out.
 
  2008/8/20 Ian Connor [EMAIL PROTECTED]
 
  I have based my machines on bare bones servers (I call them ghetto
  servers). I essentially have motherboards in a rack sitting on
  catering trays (heat resistance is key).
 
  http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html
 
  Motherboards: GIGABYTE GA-G33M-S2L (these are small mATX with 4 RAM
  slots - allows as much cheap RAM as possible)
  CPU: Intel Q6600 (quad core 2.4GHz - but I might try AMD next to see
  if the different RAM approach works better and they are greener)
  Memory: 8GB (4 x 2GB DDR2 - best price per GB)
  HDD: SATA Disk (between 200 to 500GB - I had these from another project)
 
  I have HAProxy between the App servers and Solr so that I get failover
  if one of these goes down (expect failure).
 
  Having only 1M documents but more data per document will mean your
  situation is different. I am having particular performance issues with
  facets and trying to get my head around all the issues involved there.
 
  I see Mike has only 2 shards per box as he was squeezing
  performance. I didn't see any significant gain in performance but that
  is not to say there isn't one. Just for me, I had a level of
  performance in mind and stopped when that was met. It took almost a
  month of testing to get to that point so I was ready to move on to
  other problems - I might revisit it later.
 
  Also, my ghetto servers are getting similar reliability to the Dell
   Servers I have - but I have built the system with the expectation
  they will fail often although that has not happened yet.
 
  On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim
  [EMAIL PROTECTED] wrote:
   As long as Solr/Lucene make smart use of memory (and they do, from my
   experience), it is really easy to calculate how long a huge
 query/update
   will take when you know how much the smaller ones will take. Just keep
 in
   mind that the resource consumption of memory and disk space is almost
  always
   proportional.
  
   2008/8/19 Mike Klaas [EMAIL PROTECTED]
  
  
   On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
  
  
    So your experience differs from Mike's.  Obviously it's an important
   decision as to whether to buy more machines.  Can you (or Mike)
 weigh
  in on
   what factors led to your different take on local shards vs. shards
   distributed across machines?
  
  
   I do both; the only reason I have two shards on each machine is to
  squeeze
   maximum performance out of an equipment budget.  Err on the side of
  multiple
   machines.
  
At least for building the index, the number of shards really does
   help. To index Medline (1.6e7 docs which is 60Gb in XML text) on a
   single machine starts at about 100doc/s but slows down to 10doc/s
 when
   the index grows. It seems as though the limit is reached once you
 run
   out of RAM and it gets slower and slower in a linear fashion the
   larger the index you get.
   My sweet spot was 5 machines with 8GB RAM for indexing about 60GB
 of
   data.
  
  
   Can you say what the specs were for these machines? Given that I
 have
  more
   like 1TB of data over 1M docs how do you think my machine
 requirements
  might
   be affected as compared to yours?
  
  
   You are in a much better position to determine this than we are.  See
  how
   big an index you can put on a single machine while maintaining
   acceptable
   performance using a typical query load.  It's relatively safe to
  extrapolate
   linearly from that.
  
   -Mike
  
  
  
  
   --
   Alexander Ramos Jardim
  
 
 
 
  --
  Regards,
 
  Ian Connor
  1 Leighton St #605
  Cambridge, MA 02141
  Direct Line: +1 (978) 672
  Call Center Phone: +1 (714) 239 3875 (24 hrs)
  Mobile Phone: +1 (312) 218 3209
  Fax: +1(770) 818 5697
  Suisse Phone: +41 (0) 22 548 1664
  Skype: ian.connor
 
 
 
 
  --
  Alexander Ramos Jardim
 



 --
 Regards,

 Ian Connor
 1 Leighton St #605
 

Re: localsolr and dataimport problems

2008-08-20 Thread Shalin Shekhar Mangar
Hi Tom,

Solr internals have seen some major overhauls recently and this method was
added a month back. I suggest that you try with the trunk solr war file. You
may be using an older nightly build.

I'm wondering if LocalSolr has kept up with the changes in the
UpdateProcessor API.

On Wed, Aug 20, 2008 at 7:50 PM, TomWilliamson [EMAIL PROTECTED]
 wrote:


 Hi Shalin,

 I've compiled a new apache-solr-dataimporthandler-1.3-dev.jar from the
 trunk
 without issues and dropped this into my existing WAR, but I'm now receiving

 java.lang.NoSuchMethodError:
 org.apache.solr.request.SolrQueryResponse.setHttpCaching(Z)V

 when I try and access http://localhost:8983/solr/dataimport

 Many thanks,
 Tom
 --
 View this message in context:
 http://www.nabble.com/localsolr-and-dataimport-problems-tp18849983p19069927.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Testing query response time

2008-08-20 Thread Phillip Farber



I would like to test query response time for a set of queries.  I'm not 
interested in capacity Q/sec, just response time.  My queries will be 
against an index of OCR'd books so in the real world every query is 
probably unique and impossible to predict so I don't see a way to 
prewarm any of the caches.  I'm not sorting.  I'm not faceting. I'm 
querying on a few fields like title, author, subject and date in a range.


Regarding initial conditions, it seems that there's no useful state into 
which I can put the caches.  Would the best approach be to run the 
queries from a cold solr startup?


What about OS disk caches?  I can see two arguments.  One, just to test 
solr the disk caches should be empty. On the other hand, realistically, 
the disk caches would be full so that argues for executing enough 
queries to load those and then redo the query set (with empty solr caches).


Speaking of empty solr caches, is there a way to flush those while solr 
is running?


What other system states do I need to control for to get a handle on 
response time?
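One way to keep measurements comparable across those cache states is a small harness that times each query individually and reports percentiles rather than a single average, since a few cold-cache queries can dominate a mean. This is a generic sketch, not a Solr tool; the query callable is a placeholder you would point at your Solr /select URL:

```python
import time

def time_queries(run_query, queries):
    """Run each query once and return per-query wall-clock times in seconds.

    run_query is any callable that executes one query against Solr
    (e.g. an HTTP GET to /select); it is a placeholder in this sketch.
    """
    times = []
    for q in queries:
        start = time.time()
        run_query(q)
        times.append(time.time() - start)
    return times

def summarize(times):
    """Median and 95th percentile are more robust than the mean when a
    few cold-cache queries dominate."""
    s = sorted(times)
    median = s[len(s) // 2]
    p95 = s[min(len(s) - 1, int(len(s) * 0.95))]
    return {"median": median, "p95": p95, "max": s[-1]}
```

Running the same query set twice, once from a cold Solr start and once after the OS disk cache is warm, and comparing the two summaries gives both ends of the realistic range.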


Thanks and regards,

Phil
--
Phillip Farber - http://www.umdl.umich.edu


Re: Solr Logo thought

2008-08-20 Thread Mike Klaas
Nice job Lukas; the professionalism and quality of work is evident.  I  
like aspects of the logo, but I too am having trouble getting past the  
eye-looking O.  Is it intentional (eye:look:search, etc)?


-Mike

On 20-Aug-08, at 5:25 AM, Mark Miller wrote:

I went through the same thought process - it took a couple minutes  
for the whole thing to grow on me. Perhaps a tweak to the O if you're  
looking for some constructive criticism?


Again though, I really think it's an awesome multipurpose logo. Works  
well in color, b/w, large, small, and just the sun part as a favicon/ 
other.


Grant Ingersoll wrote:
It's pretty good, for me.  My first thought is it is an eye (the  
orange reminds me of eyelashes), and then the second thought is it  
is the Sun. Take that w/ a grain of salt, though, there's a reason  
why I do server-side code and not user interfaces and graphic  
design. :-)


-Grant

On Aug 20, 2008, at 3:48 AM, Lukáš Vlček wrote:


Hi,

Only a few responded so far. How can we get more feedback? Do you  
think I
should work on the proposal a little bit more and then attach it  
to SOLR-84?


Regards,
Lukas

On Mon, Aug 18, 2008 at 6:14 PM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:


I like it, even its asymmetry. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Lukáš Vlček [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Sunday, August 17, 2008 7:02:25 PM
Subject: Re: Solr Logo thought

Hi,

My initial draft of Solr logo can be found here:
http://picasaweb.google.com/lukas.vlcek/Solr
The reason why I haven't attached it to SOLR-84 for now is that  
this is

just
draft and not final design (there are a lot of unfinished  
details). I

would

like to get some feedback before I spend more time on it.

I had several ideas but in the end I found that the simplicity  
works

best.
Simple font, sun motive, just two colors. Should look fine in  
both the

large
and small formats. As for the favicon I would use the sun motive  
only -

it
means the O letter with the beams. The logo font still needs a  
lot of

small
(but important) touches. For now I would like to get feedback  
mostly

about

the basic idea.

Regards,
Lukas

On Sat, Aug 9, 2008 at 8:21 PM, Mark Miller wrote:


Plenty left, but here is a template to get things started:
http://wiki.apache.org/solr/LogoContest

Speaking of which, if we want to maintain the momentum of  
interest in

this
topic, someone (ie: not me) should setup a LogoContest wiki  
page

with some

of the goals discussed in the various threads on solr-user and

solr-dev
recently, as well as draft up some good guidelines for how we  
should

run the

contest









Re: Solr Logo thought

2008-08-20 Thread Lukáš Vlček
Hi,

Well, the eye-looking O is not intentional. It is more a result of the
technique I used when doing the initial sketch. Believe it or not, this design
started on a magnetic drawing board (http://www.reggies.co.za/nov/nov274.jpg)
which I use now when playing with my 2 year old daughter. It is an excellent
piece of hardware, and though it lacks in terms of output resolution, is
not pressure sensitive, and its undo capabilities are very limited, it
outperforms my A4+ Wacom tablet in terms of boot-up time and is absolutely
*green energy* equipment. But as I mentioned, its resolution is quite low
and thus the vectorized version is not perfect yet.

Anyway, I think that eye-looking O is an interesting observation. I will work
on this because I see that it can be confusing.

Regards,
Lukas

On Wed, Aug 20, 2008 at 9:01 PM, Mike Klaas [EMAIL PROTECTED] wrote:

 Nice job Lukas; the professionalism and quality of work is evident.  I like
 aspects of the logo, but I too am having trouble getting past the eye-looking
 O.  Is it intentional (eye:look:search, etc)?

 -Mike


 On 20-Aug-08, at 5:25 AM, Mark Miller wrote:

  I went through the same thought process - it took a couple minutes for the
 whole thing to grow on me. Perhaps a tweak to the O if you're looking for some
 constructive criticism?

 Again though, I really think it's an awesome multipurpose logo. Works well
 in color, b/w, large, small, and just the sun part as a favicon/other.

 Grant Ingersoll wrote:

 It's pretty good, for me.  My first thought is it is an eye (the orange
 reminds me of eyelashes), and then the second thought is it is the Sun. Take
 that w/ a grain of salt, though, there's a reason why I do server-side code
 and not user interfaces and graphic design. :-)

 -Grant

 On Aug 20, 2008, at 3:48 AM, Lukáš Vlček wrote:

  Hi,

 Only a few responded so far. How can we get more feedback? Do you think I
 should work on the proposal a little bit more and then attach it to
 SOLR-84?

 Regards,
 Lukas

 On Mon, Aug 18, 2008 at 6:14 PM, Otis Gospodnetic 
 [EMAIL PROTECTED] wrote:

  I like it, even its asymmetry. :)


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 

 From: Lukáš Vlček [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Sunday, August 17, 2008 7:02:25 PM
 Subject: Re: Solr Logo thought

 Hi,

 My initial draft of Solr logo can be found here:
 http://picasaweb.google.com/lukas.vlcek/Solr
 The reason why I haven't attached it to SOLR-84 for now is that this
 is

 just

 draft and not final design (there are a lot of unfinished details). I

 would

 like to get some feedback before I spend more time on it.

 I had several ideas but in the end I found that the simplicity works

 best.

 Simple font, sun motive, just two colors. Should look fine in both the

 large

 and small formats. As for the favicon I would use the sun motive only
 -

 it

 means the O letter with the beams. The logo font still needs a lot of

 small

 (but important) touches. For now I would like to get feedback mostly

 about

 the basic idea.

 Regards,
 Lukas

 On Sat, Aug 9, 2008 at 8:21 PM, Mark Miller wrote:

  Plenty left, but here is a template to get things started:
 http://wiki.apache.org/solr/LogoContest

 Speaking of which, if we want to maintain the momentum of interest in

 this

 topic, someone (ie: not me) should setup a LogoContest wiki page

 with some

 of the goals discussed in the various threads on solr-user and

 solr-dev

 recently, as well as draft up some good guidelines for how we should

 run the

 contest







-- 
http://blog.lukas-vlcek.com/


DIH - Document missing required field error

2008-08-20 Thread Todd Breiholz
I am testing out DataImportHandler (Nightly 08-18-2008) in our environment
with a simple schema. One database view with 10 columns mapped to an  index
in Solr. When running the full-import command, I get the error below for
each row. I'm sure I'm missing something, but I can't figure it out.

WARNING: Error creating document : SolrInputDocumnt[{}]
org.apache.solr.common.SolrException: Document [null] missing required
field: id
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
at
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:176)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:134)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:332)
at
org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:384)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:190)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1188)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

Here is my data-config.xml:

dataConfig
dataSource type=JdbcDataSource
driver=oracle.jdbc.driver.OracleDriver
url=jdbc:oracle:oci:@orcl user=... password=.../
document name=sharemy
entity name=gallery pk=mygallery_id
query=select * from mdp_share_my_search_view
field column=mygallery_id name=id/
field column=mygallrychanl_id name=channelId/
field column=mygallrycat_id name=categoryId/
field column=user_id name=userId/
field column=mygallry_name name=galleryName/
field column=photo_names name=photoNames/
field column=photo_descs name=photoDescriptions/
field column=hit_count name=hitCount/
field column=vote_count name=voteCount/
field column=rating name=rating/
field column=photo_tags name=photoTags/
/entity
/document
/dataConfig

and from my schema.xml:

 fields
   field name=id type=string indexed=true stored=true
required=true /
   field name=channelId type=string indexed=true stored=true
omitNorms=true/
   field name=categoryId type=string indexed=true stored=true
omitNorms=true/
   field name=userId type=string indexed=true stored=true
omitNorms=true/
   field name=galleryName type=text indexed=true stored=true/
   field name=photoNames type=text indexed=true stored=true/
   field name=photoDescriptions type=text indexed=true
stored=true/
   field name=hitCount type=sint indexed=true stored=true/
   field name=voteCount type=sint indexed=true stored=true/
   field name=rating type=sfloat indexed=true stored=true/
   field name=photoTags type=text indexed=true stored=true/
   field name=allText type=text indexed=true stored=false
multiValued=true/
/fields
uniqueKeyid/uniqueKey


Recognizing date inputs

2008-08-20 Thread Jon Baer

Hi,

(I'm sure this was asked before but I found nothing on MarkMail) ...  
Wondering if Solr can handle this on its own or if something needs to  
be written ... I would like to handle recognizing date inputs to a  
search box for news articles, items such as August 1, August 1st or  
08/01/2008 ... it's a bit different than synonym-like handling in  
that I have to transform the query to a specific field.  Any thoughts?
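For what it's worth, one front-end option (a sketch, not a built-in Solr feature; the format list, field name, and default year below are all assumptions) is to try a few date formats on the raw input and, on success, rewrite it into a query on the date field, falling back to a plain text search otherwise:

```python
from datetime import datetime

# Candidate input formats; extend as needed. "August 1" has no year, so a
# default year is substituted after parsing (strptime defaults to 1900).
FORMATS = ["%m/%d/%Y", "%B %d, %Y", "%B %d"]

def to_date_query(text, field="pubdate", year=2008):
    """Try to parse the user input as a date; on success return a
    field-query string, otherwise None so the input is searched as text."""
    cleaned = text.strip().replace("1st", "1").replace("2nd", "2")
    for fmt in FORMATS:
        try:
            d = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
        if "%Y" not in fmt:            # no year in the input
            d = d.replace(year=year)
        return '%s:"%s"' % (field, d.strftime("%Y-%m-%dT00:00:00Z"))
    return None

print(to_date_query("08/01/2008"))   # pubdate:"2008-08-01T00:00:00Z"
print(to_date_query("August 1"))     # pubdate:"2008-08-01T00:00:00Z"
```

The value is quoted so the colons in the ISO timestamp don't confuse the query parser; a range query variant would cover "all articles on that day".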


Thanks.

- Jon


How to boost the score higher in case user query matches entire field value than just some words within a field

2008-08-20 Thread Simon Hu

Hi

I have a text field named prodname in the Solr index. Let's say there are 3
documents in the index, and here are the values of the prodname field:

Doc1: cordless drill
Doc2: cordless drill battery
Doc3: cordless drill charger 

Searching for prodname:cordless drill will hit all three documents.  So
how can I make Doc1 score higher than the other two? 

BTW, I am using solr1.2. 

thanks! 

-Simon 
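One common recipe, offered here as a hedged sketch rather than the thread's answer (the field and type names are assumptions), is to copy prodname into an untokenized field and add a boosted clause on it, so a document whose whole value matches the query scores higher:

```xml
<!-- schema.xml (sketch): keep the tokenized field, add an exact-match copy.
     A "string" field indexes the raw value as one token; a lowercased
     KeywordTokenizer type would make the exact match case-insensitive. -->
<field name="prodname" type="text" indexed="true" stored="true"/>
<field name="prodname_exact" type="string" indexed="true" stored="false"/>
<copyField source="prodname" dest="prodname_exact"/>
```

Then query as prodname:(cordless drill) OR prodname_exact:"cordless drill"^10 -- only Doc1 matches the exact-match clause and picks up the extra boost, so it sorts first.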

-- 
View this message in context: 
http://www.nabble.com/How-to-boost-the-score-higher-in-case-user-query-matches-entire-field-value-than-just-some-words-within-a-field-tp19079221p19079221.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Auto commit error and java.io.FileNotFoundException

2008-08-20 Thread Chris Harris
Ok, I did what you suggested, giving each SolrIndexWriter its own
infoStream log file, created in the init() method. The thing is, I
now have like 3400 infostream log files, I guess reflecting how solr
created like 3400 SolrIndexWriters over the course of the run.
(Hopefully this is plausible.) Could you explain what I should be
looking for in these files? (Posting the whole bunch of it doesn't
sound very useful.)

Thanks,
Chris

On Mon, Aug 18, 2008 at 10:12 AM, Michael McCandless
[EMAIL PROTECTED] wrote:

 Alas, I think this won't actually turn on IndexWriter's infoStream.

 I think you may need to modify the SolrIndexWriter.java sources, in the init
 method, to add a call to setInfoStream(...).

 Can any Solr developers confirm this?

 Mike

 Chris Harris wrote:

 I'm assuming that one way to do this would be to set the logging level
 to FINEST in the logging page in the solr admin tool, and then to
 make sure my logging.properties file is also set to record the FINEST
 logging level. Let me know if that won't enable to sort of debugging
 info you are talking about. (I do understand that the logging page in
 the admin tool makes temporary changes that will get reverted when you
 restart Solr.)

 On Mon, Aug 18, 2008 at 3:05 AM, Michael McCandless
 [EMAIL PROTECTED] wrote:

 Since it seems reproducible, could you turn on debugging output
 (IndexWriter.setInfoStream(...)), get the FileNotFoundException to happen
 again, and post the resulting output?

 Mike




Re: Auto commit error and java.io.FileNotFoundException

2008-08-20 Thread Michael McCandless


Did the same FileNotFoundException / massive deletion of files occur?

Actually if you could zip them all up and post them, I'll dig through  
them to see if they give any clues...


Mike

Chris Harris wrote:


Ok, I did what you suggested, giving each SolrIndexWriter its own
infoStream log file, created in the init() method. The thing is, I
now have like 3400 infostream log files, I guess reflecting how solr
created like 3400 SolrIndexWriters over the course of the run.
(Hopefully this is plausible.) Could you explain what I should be
looking for in these files? (Posting the whole bunch of it doesn't
sound very useful.)

Thanks,
Chris

On Mon, Aug 18, 2008 at 10:12 AM, Michael McCandless
[EMAIL PROTECTED] wrote:


Alas, I think this won't actually turn on IndexWriter's infoStream.

I think you may need to modify the SolrIndexWriter.java sources, in  
the init

method, to add a call to setInfoStream(...).

Can any Solr developers confirm this?

Mike

Chris Harris wrote:

I'm assuming that one way to do this would be to set the logging  
level
to FINEST in the logging page in the solr admin tool, and then  
to
make sure my logging.properties file is also set to record the  
FINEST

logging level. Let me know if that won't enable to sort of debugging
info you are talking about. (I do understand that the logging page  
in
the admin tool makes temporary changes that will get reverted when  
you

restart Solr.)

On Mon, Aug 18, 2008 at 3:05 AM, Michael McCandless
[EMAIL PROTECTED] wrote:


Since it seems reproducible, could you turn on debugging output
(IndexWriter.setInfoStream(...)), get the FileNotFoundException  
to happen

again, and post the resulting output?

Mike







Re: hello, a question about solr.

2008-08-20 Thread Norberto Meijome
On Wed, 20 Aug 2008 10:58:50 -0300
Alexander Ramos Jardim [EMAIL PROTECTED] wrote:

 A tiny but really good explanation can be found here
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters


thanks Alexander - indeed, quite short, and focused on shingles ... which, if
I understand correctly, are groups of n terms ... the n-gram tokenizer
creates tokens of n characters from your input.

Searching for ngram or n-gram in the archives should bring up more relevant
information, which isn't in the wiki yet.
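As a rough illustration of the idea (plain Python, not Solr's actual tokenizer implementation), character n-grams are just overlapping n-character slices of the input; with n=2 (bi-grams) every adjacent pair of characters becomes a token:

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a string.

    Illustrative sketch only -- in Solr the equivalent tokenizer is
    configured in schema.xml rather than called like this.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Bi-grams are the common choice for Chinese text: each two-character
# block becomes a token, so a query can match any adjacent pair.
print(char_ngrams("solr"))   # ['so', 'ol', 'lr']
```

Because every adjacent pair is indexed, a two-character query term matches wherever it occurs, without needing word segmentation.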

B

_
{Beto|Norberto|Numard} Meijome

All that is necessary for the triumph of evil is that good men do nothing.
  Edmund Burke

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: DIH - Document missing required field error

2008-08-20 Thread Shalin Shekhar Mangar
Does the mygallery_id column in your database allow nulls? It gets copied
to the id field in Solr which is required, hence the error.
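If nulls are indeed possible, one defensive tweak (an untested sketch against the data-config posted above) is to filter them out in the entity query, so DIH never tries to build a document without its key:

```xml
<!-- data-config.xml (sketch): exclude rows whose key column is null -->
<entity name="gallery" pk="mygallery_id"
        query="select * from mdp_share_my_search_view
               where mygallery_id is not null">
  <!-- field mappings unchanged -->
</entity>
```

That hides the symptom, though; if the view is not supposed to contain null keys, the view itself is worth checking too.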

On Thu, Aug 21, 2008 at 2:12 AM, Todd Breiholz [EMAIL PROTECTED] wrote:

 I am testing out DataImportHandler (Nightly 08-18-2008) in our environment
 with a simple schema. One database view with 10 columns mapped to an  index
 in Solr. When running the full-import command, I get the error below for
 each row. I'm sure I'm missing something, but I can't figure it out.

 WARNING: Error creating document : SolrInputDocumnt[{}]
 org.apache.solr.common.SolrException: Document [null] missing required field: id
     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289)
     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
     at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
     at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:176)
     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:134)
     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:332)
     at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:384)
     at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:190)
     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1188)
     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
     at java.lang.Thread.run(Thread.java:619)
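
 [Editor's note: the empty document in the log (SolrInputDocumnt[{}]) means no
 database column matched any field mapping at all. One common cause with Oracle
 -- an assumption on my part, not something confirmed in this thread -- is that
 the Oracle JDBC driver reports column labels in upper case, while
 DataImportHandler's column-to-field lookup is case-sensitive. A minimal Python
 sketch of that mismatch:]

```python
# Illustration only (not actual Solr code): DataImportHandler matches each
# configured column name against the column labels the JDBC driver reports,
# and that match is case-sensitive. If Oracle reports "MYGALLERY_ID" but the
# config says "mygallery_id", nothing matches, the document stays empty, and
# the required "id" field is reported missing.

def build_document(row, mappings):
    """Mimic DIH's exact-match column-to-field lookup."""
    doc = {}
    for column, solr_field in mappings.items():
        if column in row:  # case-sensitive dict lookup, like DIH's map
            doc[solr_field] = row[column]
    return doc

# Column labels as the Oracle driver would report them (hypothetical row):
row = {"MYGALLERY_ID": 42, "USER_ID": 7}

# Mapping with lower-case column names, as in the data-config -- no matches:
as_configured = build_document(row, {"mygallery_id": "id", "user_id": "userId"})

# Mapping with upper-cased column names -- every field maps:
upper_cased = build_document(row, {"MYGALLERY_ID": "id", "USER_ID": "userId"})
```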

 Here is my data-config.xml:

 <dataConfig>
   <dataSource type="JdbcDataSource"
               driver="oracle.jdbc.driver.OracleDriver"
               url="jdbc:oracle:oci:@orcl" user="..." password="..."/>
   <document name="sharemy">
     <entity name="gallery" pk="mygallery_id"
             query="select * from mdp_share_my_search_view">
       <field column="mygallery_id" name="id"/>
       <field column="mygallrychanl_id" name="channelId"/>
       <field column="mygallrycat_id" name="categoryId"/>
       <field column="user_id" name="userId"/>
       <field column="mygallry_name" name="galleryName"/>
       <field column="photo_names" name="photoNames"/>
       <field column="photo_descs" name="photoDescriptions"/>
       <field column="hit_count" name="hitCount"/>
       <field column="vote_count" name="voteCount"/>
       <field column="rating" name="rating"/>
       <field column="photo_tags" name="photoTags"/>
     </entity>
   </document>
 </dataConfig>

 and from my schema.xml:

  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="channelId" type="string" indexed="true" stored="true" omitNorms="true"/>
    <field name="categoryId" type="string" indexed="true" stored="true" omitNorms="true"/>
    <field name="userId" type="string" indexed="true" stored="true" omitNorms="true"/>
    <field name="galleryName" type="text" indexed="true" stored="true"/>
    <field name="photoNames" type="text" indexed="true" stored="true"/>
    <field name="photoDescriptions" type="text" indexed="true" stored="true"/>
    <field name="hitCount" type="sint" indexed="true" stored="true"/>
    <field name="voteCount" type="sint" indexed="true" stored="true"/>
    <field name="rating" type="sfloat" indexed="true" stored="true"/>
    <field name="photoTags" type="text" indexed="true" stored="true"/>
    <field name="allText" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
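
 [Editor's note: a frequent cause of "missing required field: id" when importing
 from Oracle -- an assumption here, not confirmed in this thread -- is that the
 Oracle JDBC driver reports column labels in upper case, while the column names
 in data-config.xml are matched case-sensitively. If that is the problem, two
 hedged sketches of a possible fix:]

```xml
<!-- Hypothetical fix 1: match the upper-case labels the driver reports -->
<field column="MYGALLERY_ID" name="id"/>
<field column="USER_ID" name="userId"/>

<!-- Hypothetical fix 2: force known-case labels by aliasing in the query
     (double-quoted aliases preserve case in Oracle) -->
<entity name="gallery" pk="mygallery_id"
        query='select mygallery_id as "mygallery_id", user_id as "user_id"
               from mdp_share_my_search_view'>
  <field column="mygallery_id" name="id"/>
  <field column="user_id" name="userId"/>
</entity>
```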




-- 
Regards,
Shalin Shekhar Mangar.