Re: Solr Logo thought
Hi, Only a few have responded so far. How can we get more feedback? Do you think I should work on the proposal a little bit more and then attach it to SOLR-84? Regards, Lukas

On Mon, Aug 18, 2008 at 6:14 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: I like it, even its asymmetry. :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Lukáš Vlček [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Sunday, August 17, 2008 7:02:25 PM Subject: Re: Solr Logo thought

Hi, My initial draft of the Solr logo can be found here: http://picasaweb.google.com/lukas.vlcek/Solr The reason why I haven't attached it to SOLR-84 yet is that this is just a draft and not the final design (there are a lot of unfinished details). I would like to get some feedback before I spend more time on it. I had several ideas, but in the end I found that simplicity works best: a simple font, a sun motif, just two colors. It should look fine in both large and small formats. For the favicon I would use the sun motif only - that is, the O letter with the beams. The logo font still needs a lot of small (but important) touches. For now I would like to get feedback mostly on the basic idea. Regards, Lukas

On Sat, Aug 9, 2008 at 8:21 PM, Mark Miller wrote: Plenty left, but here is a template to get things started: http://wiki.apache.org/solr/LogoContest Speaking of which, if we want to maintain the momentum of interest in this topic, someone (ie: not me) should set up a LogoContest wiki page with some of the goals discussed in the various threads on solr-user and solr-dev recently, as well as draft up some good guidelines for how we should run the contest.

-- http://blog.lukas-vlcek.com/
Re: shards and performance
I have based my machines on bare-bones servers (I call them ghetto servers). I essentially have motherboards in a rack sitting on catering trays (heat resistance is key). http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html

Motherboards: GIGABYTE GA-G33M-S2L (these are small mATX with 4 RAM slots - allows as much cheap RAM as possible)
CPU: Intel Q6600 (quad core 2.4GHz - but I might try AMD next to see if the different RAM approach works better and they are greener)
Memory: 8GB (4 x 2GB DDR2 - best price per GB)
HDD: SATA disk (between 200 and 500GB - I had these from another project)

I have HAProxy between the app servers and Solr so that I get failover if one of these goes down (expect failure). Having only 1M documents but more data per document will mean your situation is different. I am having particular performance issues with facets and am trying to get my head around all the issues involved there.

I see Mike has only 2 shards per box as he was squeezing performance. I didn't see any significant gain in performance, but that is not to say there isn't one. Just for me, I had a level of performance in mind and stopped when that was met. It took almost a month of testing to get to that point, so I was ready to move on to other problems - I might revisit it later. Also, my ghetto servers are getting similar reliability to the Dell servers I have - but I have built the system with the expectation that they will fail often, although that has not happened yet.

On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim [EMAIL PROTECTED] wrote: As long as Solr/Lucene makes smart use of memory (and they do, from my experience), it is really easy to calculate how long a huge query/update will take when you know how long the smaller ones take. Just keep in mind that the resource consumption of memory and disk space is almost always proportional.

2008/8/19 Mike Klaas [EMAIL PROTECTED] On 19-Aug-08, at 12:58 PM, Phillip Farber wrote: So your experience differs from Mike's. Obviously it's an important decision as to whether to buy more machines. Can you (or Mike) weigh in on what factors led to your different take on local shards vs. shards distributed across machines?

I do both; the only reason I have two shards on each machine is to squeeze maximum performance out of an equipment budget. Err on the side of multiple machines. At least for building the index, the number of shards really does help. To index Medline (1.6e7 docs, which is 60GB in XML text) on a single machine starts at about 100 doc/s but slows down to 10 doc/s when the index grows. It seems as though the limit is reached once you run out of RAM, and it gets slower and slower in a linear fashion the larger the index gets. My sweet spot was 5 machines with 8GB RAM for indexing about 60GB of data.

Can you say what the specs were for these machines? Given that I have more like 1TB of data over 1M docs, how do you think my machine requirements might be affected as compared to yours?

You are in a much better position to determine this than we are. See how big an index you can put on a single machine while maintaining acceptable performance using a typical query load. It's relatively safe to extrapolate linearly from that. -Mike

-- Alexander Ramos Jardim

-- Regards, Ian Connor 1 Leighton St #605 Cambridge, MA 02141 Direct Line: +1 (978) 672 Call Center Phone: +1 (714) 239 3875 (24 hrs) Mobile Phone: +1 (312) 218 3209 Fax: +1 (770) 818 5697 Suisse Phone: +41 (0) 22 548 1664 Skype: ian.connor
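For readers sizing a similar setup: once the index is split across boxes like this, Solr 1.3's distributed search just lists the shards on the query. A minimal sketch (host names here are made up):

    curl 'http://ghetto1:8983/solr/select?shards=ghetto1:8983/solr,ghetto2:8983/solr&q=cancer&rows=10'

Each shard holds a disjoint slice of the documents; the node that receives the request merges the per-shard results before responding, which is also where a load balancer like HAProxy can sit.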
Re: IndexOutOfBoundsException
It looks like it was just RAM. I purchased a PHD PCI2 from Ultra-X to test all my RAM, and some modules were just plain bad (some were bad right away and others needed to warm up before failing - I will be testing all my RAM from now on). I have re-indexed many times since then and never seen the problem again. So, it looks like it was just bad hardware - sorry about the confusion.

On Mon, Aug 18, 2008 at 8:29 AM, Michael McCandless [EMAIL PROTECTED] wrote: OK, gotcha. Please keep us posted one way or another... Mike

Ian Connor wrote: Hi Mike, I am currently ruling out some bad memory modules. Knowing that this is an index corruption makes memory corruption more likely. If replacing RAM does not fix the problem (which I need to do anyway due to segmentation faults), I will package up the crash into a reproducible scenario.

On Mon, Aug 18, 2008 at 5:56 AM, Michael McCandless [EMAIL PROTECTED] wrote: Hi Ian, I sent this to java-user, but maybe you didn't see it, so let's try again on solr-user: It looks like your stored fields file (_X.fdt) is corrupt. Are you using multiple threads to add docs? Can you try switching to SerialMergeScheduler to verify it's reproducible? When you hit this exception, can you stop Solr and then run Lucene's CheckIndex tool (org.apache.lucene.index.CheckIndex) to verify the index is corrupt and see which segment it is? Then post back the exception and an ls -l of your index directory? If you could post the client-side code you're using to build and submit docs to Solr, and if I can get access to the Medline content and repro the bug, then I'll track it down... Mike

On Aug 14, 2008, at 10:18 PM, Ian Connor wrote: I seem to be able to reproduce this very easily, and the data is Medline (so I am sure I can share it if needed, with a quick email to check).
- I am using Fedora: %uname -a Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30 13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux %java -version java version 1.7.0 IcedTea Runtime Environment (build 1.7.0-b21) IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
- single core (will use shards, but each machine has just one HDD so I didn't see how cores would help - but I am new at this)
- next run I will keep the output to check for earlier errors
- very, and I can share code + data if that will help

On Thu, Aug 14, 2008 at 4:23 PM, Yonik Seeley [EMAIL PROTECTED] wrote: Yikes... not good. This shouldn't be due to anything you did wrong, Ian... it looks like a Lucene bug. Some questions:
- what platform are you running on, and what JVM?
- are you using multicore? (I fixed some index locking bugs recently)
- are there any exceptions in the log before this?
- how reproducible is this? -Yonik

On Thu, Aug 14, 2008 at 2:47 PM, Ian Connor [EMAIL PROTECTED] wrote: Hi, I have rebuilt my index a few times (it should get up to about 4 million, but around 1 million it starts to fall apart).
Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:323)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:300)
Caused by: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
    at java.util.ArrayList.rangeCheck(ArrayList.java:572)
    at java.util.ArrayList.get(ArrayList.java:350)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:188)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:349)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3998)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3650)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:214)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:269)

When this happens, the disk usage goes right up and the indexing really starts to slow down. I am using a Solr build from about a week ago - so my Lucene is at 2.4 according to the war files. Has anyone seen this error before? Is it possible to tell which array is too large? Would it be an array I am sending in or another internal one? Regards, Ian Connor

-- Regards, Ian Connor 1 Leighton St #605 Cambridge, MA 02141 Direct Line: +1 (978) 672 Call Center Phone: +1 (714) 239 3875 (24 hrs) Mobile Phone: +1 (312) 218 3209 Fax: +1 (770) 818 5697 Suisse Phone: +41 (0) 22 548 1664 Skype: ian.connor
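For reference, the CheckIndex tool Michael mentions in this thread is run from the command line against a stopped index. A sketch, with an assumed jar name and index path:

    java -cp lucene-core-2.4-dev.jar org.apache.lucene.index.CheckIndex /var/solr/data/index

It walks each segment and reports which ones are broken; with Solr stopped, that output plus an ls -l of the index directory is usually what is needed to diagnose this kind of corruption.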
Re: Solr Logo thought
It's pretty good, for me. My first thought is that it is an eye (the orange reminds me of eyelashes), and then the second thought is that it is the sun. Take that w/ a grain of salt, though; there's a reason why I do server-side code and not user interfaces and graphic design. :-) -Grant

On Aug 20, 2008, at 3:48 AM, Lukáš Vlček wrote: Hi, Only a few have responded so far. How can we get more feedback? Do you think I should work on the proposal a little bit more and then attach it to SOLR-84? Regards, Lukas ...
Re: Solr Logo thought
I went through the same thought process - it took a couple minutes for the whole thing to grow on me. Perhaps a tweak to the O, if you're looking for some constructive criticism? Again though, I really think it's an awesome multipurpose logo. Works well in color, b/w, large, small, and just the sun part as a favicon/other.

Grant Ingersoll wrote: It's pretty good, for me. My first thought is that it is an eye (the orange reminds me of eyelashes), and then the second thought is that it is the sun. Take that w/ a grain of salt, though; there's a reason why I do server-side code and not user interfaces and graphic design. :-) -Grant ...
Re: IndexOutOfBoundsException
OK, glad to hear that ;) Thanks for bringing closure, Ian! Mike

Ian Connor wrote: It looks like it was just RAM. I purchased a PHD PCI2 from Ultra-X to test all my RAM, and some modules were just plain bad (some were bad right away and others needed to warm up before failing - I will be testing all my RAM from now on). I have re-indexed many times since then and never seen the problem again. So, it looks like it was just bad hardware - sorry about the confusion. ...
Re: Solr Logo thought
Hi, One lesson learned from the Mahout logo is that we should also check whether this design is unique enough. For example, one of the early design concepts for the Mahout logo was found to be very close to the existing logo of some machine-learning-related project (though such a thing is not very probable, it happened). As for the sun motif, there are tons of logos with a sun in them, so we should be careful... Anyway, I will try to work with the O and will keep you posted once I have some results. In the meantime we can still collect constructive criticism here. Regards, Lukas

On Wed, Aug 20, 2008 at 2:25 PM, Mark Miller [EMAIL PROTECTED] wrote: I went through the same thought process - it took a couple minutes for the whole thing to grow on me. Perhaps a tweak to the O, if you're looking for some constructive criticism? Again though, I really think it's an awesome multipurpose logo. Works well in color, b/w, large, small, and just the sun part as a favicon/other. ...

-- http://blog.lukas-vlcek.com/
Re: solr-ruby version management
Otis, Thanks for the comment. I used a dot instead of a dash because my associate (a Rubyist) said that if I use a 1.3-0.0.7 style version, I'll get an error when making a gem (I've never tried this): ERROR: While executing gem (ArgumentError) /usr/local/lib/ruby/site_ruby/1.8/rubygems/version.rb:56:in `initialize': Malformed version number string 1-3-0.0.7

But on second thought, I want to stick with the current style of version, that is, solr-ruby-0.0.6.gem. :) Suppose we release solr-ruby-1.3.0.1.gem and after that Solr 1.4 is released. At that time, solr-ruby would not need a new release, because solr-ruby-1.3 should work fine with Solr 1.4. But once we have released solr-ruby-1.3.0.x.gem, we would have to release solr-ruby-1.4.0.1.gem just for the checkpoint purpose. I don't want to do that. Koji

Otis Gospodnetic wrote: I like this idea. Perhaps separate the Solr version and the solr-ruby version with a dash instead of a dot -- solr-ruby-1.3.0-0.0.6 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Koji Sekiguchi [EMAIL PROTECTED] To: solr-user@lucene.apache.org; [EMAIL PROTECTED] Sent: Tuesday, August 19, 2008 4:24:31 AM Subject: solr-ruby version management From: http://www.nabble.com/CHANGES.txt-td18901774.html The latest version of solr-ruby is 0.0.6: solr-ruby-0.0.6.gem http://rubyforge.org/frs/?group_id=2875&release_id=23885 I think it isn't clear which Solr version it corresponds to. I'd like to change this to solr-ruby-{solrVersion}.{solr-rubyVersion}.gem when Solr 1.3 is released, where solr-rubyVersion is two digits. That is, the first official release of solr-ruby will be solr-ruby-1.3.0.01.gem. Any objections to changing to this new version format? Or if anyone has suggestions, please let me know. Koji
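For anyone curious, the restriction Koji ran into comes from RubyGems' own version parser. A quick irb sketch (exact message and behavior vary by RubyGems version):

    require 'rubygems'
    Gem::Version.new('1.3.0.1')    # digits and dots parse fine
    Gem::Version.new('1.3-0.0.7')  # raises ArgumentError: Malformed version number string

So any scheme that embeds the Solr version has to stay within digits-and-dots, which is why the dash proposal fails at gem build time.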
Re: shards and performance
Another thing to consider in your sharding is the access rate you want to guarantee. In the project I am working on, I need to guarantee at least 200 hits/second with various facets in all queries. I am not using sharding, but I have 6 Solr instances per cluster node, and I have 3 nodes, for a total of 18 Solr instances. Each node has only one index, so I keep the 6 instances pointing to the same index on a given node. What made a huge difference in my performance was the removal of the lock. I expect that helps you out.

2008/8/20 Ian Connor [EMAIL PROTECTED] I have based my machines on bare-bones servers (I call them ghetto servers). I essentially have motherboards in a rack sitting on catering trays (heat resistance is key). http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html ...

-- Alexander Ramos Jardim
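On "removal of the lock": in Solr 1.3 the index lock type is configurable in solrconfig.xml, and read-only instances that share an index written by another process can turn it off. A sketch (an assumption about the exact config location, and only safe for instances that never write):

    <!-- solrconfig.xml, inside <indexDefaults> -->
    <lockType>none</lockType>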
Re: shards and performance
So, because the OS is doing the caching in RAM, I could have 6 jetty servers per machine all pointing to the same data. Once the index is built, I can load up some more servers on different ports and it will boost performance. That does sound promising - thanks for the tip. What made you pick 6?

On Wed, Aug 20, 2008 at 9:49 AM, Alexander Ramos Jardim [EMAIL PROTECTED] wrote: Another thing to consider in your sharding is the access rate you want to guarantee. In the project I am working on, I need to guarantee at least 200 hits/second with various facets in all queries. I am not using sharding, but I have 6 Solr instances per cluster node, and I have 3 nodes, for a total of 18 Solr instances. Each node has only one index, so I keep the 6 instances pointing to the same index on a given node. What made a huge difference in my performance was the removal of the lock. I expect that helps you out. ...

-- Regards, Ian Connor 1 Leighton St #605 Cambridge, MA 02141 Direct Line: +1 (978) 672 Call Center Phone: +1 (714) 239 3875 (24 hrs) Mobile Phone: +1 (312) 218 3209 Fax: +1 (770) 818 5697 Suisse Phone: +41 (0) 22 548 1664 Skype: ian.connor
Re: hello, a question about solr.
A tiny but useful explanation can be found here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

2008/8/18 finy finy [EMAIL PROTECTED] thanks for your help. could you give me your gmail talk address or msn?

2008/8/19, Norberto Meijome [EMAIL PROTECTED]: On Mon, 18 Aug 2008 23:07:19 +0800 finy finy [EMAIL PROTECTED] wrote: because i use chinese characters, for example ibm___ solr will parse it into a term ibm and a phrase _ __ can i use solr to query with a term ibm and a term _ and a term __?

Hi finy, you should look into n-gram tokenizers. Not sure if it is documented in the wiki, but it has been discussed on the mailing list quite a few times. In short, an n-gram tokenizer breaks your input into blocks of characters of size n, which are then used for comparison in the index. I think for Chinese, bi-gram is the favoured approach. good luck, B

_ {Beto|Norberto|Numard} Meijome I used to hate weddings; all the Grandmas would poke me and say, You're next sonny! They stopped doing that when i started to do it to them at funerals. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.

-- Alexander Ramos Jardim
Re: localsolr and dataimport problems
Hi Shalin, I've compiled a new apache-solr-dataimporthandler-1.3-dev.jar from the trunk without issues and dropped this into my existing WAR, but I'm now receiving java.lang.NoSuchMethodError: org.apache.solr.request.SolrQueryResponse.setHttpCaching(Z)V when I try and access http://localhost:8983/solr/dataimport Many thanks, Tom

Shalin Shekhar Mangar wrote: Hi Tom, This should be fixed in the trunk code.

On Thu, Aug 7, 2008 at 12:13 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: My mistake, it is SOLR-676. https://issues.apache.org/jira/browse/SOLR-676

On Thu, Aug 7, 2008 at 12:13 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: I've opened SOLR-647 to fix this. https://issues.apache.org/jira/browse/SOLR-676

On Wed, Aug 6, 2008 at 9:56 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: On Wed, Aug 6, 2008 at 5:50 PM, TomWilliamson [EMAIL PROTECTED] wrote: I'm trying to use localsolr (1.5) with the dataimport handler but not having much luck. The issue seems to be that the _localtier dynamic fields don't get generated when adding docs via the dataimport, although they do if I add docs via post.jar (xml document). Am I missing something simple here? -- View this message in context: http://www.nabble.com/localsolr-and-dataimport-problems-tp18849983p18849983.html Sent from the Solr - User mailing list archive at Nabble.com.

Hmm... LocalSolr uses the UpdateRequestProcessor API in Solr to add its dynamic fields. However, DataImportHandler bypasses that API and adds documents directly to the UpdateHandler. Sounds like an improvement (bug?) in DataImportHandler which should be addressed before release. I'll open an issue and work on it. -- Regards, Shalin Shekhar Mangar.

-- View this message in context: http://www.nabble.com/localsolr-and-dataimport-problems-tp18849983p19069927.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: shards and performance
2008/8/20 Ian Connor [EMAIL PROTECTED] So, because the OS is doing the caching in RAM. It means I could have 6 jetty servers per machine all pointing to the same data. Once the index is built, I can load up some more servers on different ports and it will boost performance. That does sound promising - thanks for the tip. What made you pick 6? Each weblogic instance sits on top of a 2GB heap size JVM. Each cluster node has 16GB RAM. On Wed, Aug 20, 2008 at 9:49 AM, Alexander Ramos Jardim [EMAIL PROTECTED] wrote: Another thing to consider on your sharding is the access rate you want to guarantee. In the project I am working, I need to guarantee at least 200hits/second with various facets in all queries. I am not using sharding, but I have 6 Solr instances per cluster node, and I have 3 nodes, to a total of 18 solr instances. Each node has only one index, so I keep the 6 instance pointing to the same the index in a given node. What made a huge diference in my performance was the removal of the lock. I expect that helps you out. 2008/8/20 Ian Connor [EMAIL PROTECTED] I have based my machines on bare bones servers (I call them ghetto servers). I essentially have motherboards in a rack sitting on catering trays (heat resistance is key). http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html Motherboards: GIGABYTE GA-G33M-S2L (these are small mATX with 4 RAM slots - allows as much cheap RAM as possible) CPU: Intel Q6600 (quad core 2.4GHz - but I might try AMD next to see if the different RAM approach works better and they are greener) Memory: 8GB (4 x 2GB DDR2 - best price per GB) HDD: SATA Disk (between 200 to 500GB - I had these from another project) I have HAProxy between the App servers and Solr so that I get failover if one of these goes down (expect failure). Having only 1M documents but more data per document will mean your situation is different. I am having particular performance issues with facets and trying to get my head around all the issues involved there. I see Mike has only 2 shards per box as he was squeezing performance. I didn't see any significant gain in performance but that is not to say there isn't one. Just for me, I had a level of performance in mind and stopped when that was met. It took almost a month of testing to get to that point so I was ready to move on to other problems - I might revisit it later. Also, my ghetto servers are getting similar reliability to the Dell Servers I have - but I have built the system with the expectations they will fail often although that has not happened yet. On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim [EMAIL PROTECTED] wrote: As long as Solr/Lucene makes smart use from memory (and they from my experiences), it is really easy to calculate how long a huge query/update will take when you know how much the smaller ones will take. Just keep in mind that the resource consumption of memory and disk space is almost always proportional. 2008/8/19 Mike Klaas [EMAIL PROTECTED] On 19-Aug-08, at 12:58 PM, Phillip Farber wrote: So you experience differs from Mike's. Obviously it's an important decision as to whether to buy more machines. Can you (or Mike) weigh in on what factors led to your different take on local shards vs. shards distributed across machines? I do both; the only reason I have two shards on each machine is to squeeze maximum performance out of an equipment budget. Err on the side of multiple machines. At least for building the index, the number of shards really does help. 
To index Medline (1.6e7 docs which is 60Gb in XML text) on a single machine starts at about 100doc/s but slows down to 10doc/s when the index grows. It seems as though the limit is reached once you run out of RAM and it gets slower and slower in a linear fashion the larger the index you get. My sweet spot was 5 machines with 8GB RAM for indexing about 60GB of data. Can you say what the specs were for these machines? Given that I have more like 1TB of data over 1M docs how do you think my machine requirements might be affected as compared to yours? You are in a much better position to determine this than we are. See how big an index you can put on a single machine while maintaining acceptible performance using a typical query load. It's relatively safe to extrapolate linearly from that. -Mike -- Alexander Ramos Jardim -- Regards, Ian Connor 1 Leighton St #605 Cambridge, MA 02141 Direct Line: +1 (978) 672 Call Center Phone: +1 (714) 239 3875 (24 hrs) Mobile Phone: +1 (312) 218 3209 Fax: +1(770) 818 5697 Suisse Phone: +41 (0) 22 548 1664 Skype: ian.connor -- Alexander Ramos Jardim -- Regards, Ian Connor 1 Leighton St #605
Re: localsolr and dataimport problems
Hi Tom, Solr internals have seen some major overhauls recently and this method was added a month back. I suggest that you try with the trunk solr war file. You may be using an older nightly build. I'm wondering if LocalSolr has kept up with the changes in the UpdateProcessor API. On Wed, Aug 20, 2008 at 7:50 PM, TomWilliamson [EMAIL PROTECTED] wrote: Hi Shalin, I've compiled a new apache-solr-dataimporthandler-1.3-dev.jar from the trunk without issues and dropped this into my existing WAR, but I'm now receiving java.lang.NoSuchMethodError: org.apache.solr.request.SolrQueryResponse.setHttpCaching(Z)V when I try and access http://localhost:8983/solr/dataimport Many thanks, Tom -- View this message in context: http://www.nabble.com/localsolr-and-dataimport-problems-tp18849983p19069927.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
Testing query response time
I would like to test query response time for a set of queries. I'm not interested in capacity (queries/sec), just response time. My queries will be against an index of OCR'd books, so in the real world every query is probably unique and impossible to predict, and I don't see a way to prewarm any of the caches. I'm not sorting. I'm not faceting. I'm querying on a few fields like title, author, subject, and date in a range.

Regarding initial conditions, it seems that there's no useful state into which I can put the caches. Would the best approach be to run the queries from a cold Solr startup? What about OS disk caches? I can see two arguments. One, just to test Solr, the disk caches should be empty. On the other hand, realistically, the disk caches would be full, so that argues for executing enough queries to load those and then redoing the query set (with empty Solr caches). Speaking of empty Solr caches, is there a way to flush those while Solr is running? What other system states do I need to control for to get a handle on response time? Thanks and regards, Phil -- Phillip Farber - http://www.umdl.umich.edu
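For the measurement itself, a small harness keeps runs comparable: run the query set once to warm the OS disk cache, then a second time for the numbers. A sketch, assuming one URL-encoded query per line in a hypothetical queries.txt:

    #!/bin/sh
    # print total response time (seconds) for each query
    while read q; do
      curl -s -o /dev/null -w "%{time_total}\n" "http://localhost:8983/solr/select?q=$q"
    done < queries.txt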
Re: Solr Logo thought
Nice job, Lukas; the professionalism and quality of work are evident. I like aspects of the logo, but I too am having trouble getting past the eye-looking O. Is it intentional (eye:look:search, etc.)? -Mike

On 20-Aug-08, at 5:25 AM, Mark Miller wrote: I went through the same thought process - it took a couple minutes for the whole thing to grow on me. Perhaps a tweak to the O, if you're looking for some constructive criticism? Again though, I really think it's an awesome multipurpose logo. Works well in color, b/w, large, small, and just the sun part as a favicon/other. ...
Re: Solr Logo thought
Hi, Well, the eye-looking O is not intentional. It is more a result of the technique I used for the initial sketch. Believe it or not, this design started on a magnetic drawing board (http://www.reggies.co.za/nov/nov274.jpg), which I use now when playing with my 2-year-old daughter. It is an excellent piece of hardware, and though it lacks in terms of output resolution, is not pressure sensitive, and its undo capabilities are very limited, it outperforms my A4+ Wacom tablet in terms of boot-up time and is absolutely *green energy* equipment. But as I mentioned, its resolution is quite low, and thus the vectorized version is not perfect yet. Anyway, I think the eye-looking O is an interesting observation; I will work on this because I see that it can be confusing. Regards, Lukas

On Wed, Aug 20, 2008 at 9:01 PM, Mike Klaas [EMAIL PROTECTED] wrote: Nice job, Lukas; the professionalism and quality of work are evident. I like aspects of the logo, but I too am having trouble getting past the eye-looking O. Is it intentional (eye:look:search, etc.)? -Mike ...

-- http://blog.lukas-vlcek.com/
DIH - Document missing required field error
I am testing out DataImportHandler (nightly 08-18-2008) in our environment with a simple schema: one database view with 10 columns mapped to an index in Solr. When running the full-import command, I get the error below for each row. I'm sure I'm missing something, but I can't figure it out.

WARNING: Error creating document : SolrInputDocumnt[{}]
org.apache.solr.common.SolrException: Document [null] missing required field: id
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:69)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:288)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:317)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:176)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:134)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:332)
    at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:384)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:190)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1188)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

Here is my data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:oci:@orcl" user="..." password="..."/>
  <document name="sharemy">
    <entity name="gallery" pk="mygallery_id" query="select * from mdp_share_my_search_view">
      <field column="mygallery_id" name="id"/>
      <field column="mygallrychanl_id" name="channelId"/>
      <field column="mygallrycat_id" name="categoryId"/>
      <field column="user_id" name="userId"/>
      <field column="mygallry_name" name="galleryName"/>
      <field column="photo_names" name="photoNames"/>
      <field column="photo_descs" name="photoDescriptions"/>
      <field column="hit_count" name="hitCount"/>
      <field column="vote_count" name="voteCount"/>
      <field column="rating" name="rating"/>
      <field column="photo_tags" name="photoTags"/>
    </entity>
  </document>
</dataConfig>

and from my schema.xml:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="channelId" type="string" indexed="true" stored="true" omitNorms="true"/>
  <field name="categoryId" type="string" indexed="true" stored="true" omitNorms="true"/>
  <field name="userId" type="string" indexed="true" stored="true" omitNorms="true"/>
  <field name="galleryName" type="text" indexed="true" stored="true"/>
  <field name="photoNames" type="text" indexed="true" stored="true"/>
  <field name="photoDescriptions" type="text" indexed="true" stored="true"/>
  <field name="hitCount" type="sint" indexed="true" stored="true"/>
  <field name="voteCount" type="sint" indexed="true" stored="true"/>
  <field name="rating" type="sfloat" indexed="true" stored="true"/>
  <field name="photoTags" type="text" indexed="true" stored="true"/>
  <field name="allText" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>
<uniqueKey>id</uniqueKey>
Recognizing date inputs
Hi, (I'm sure this was asked before, but I found nothing on MarkMail) ... Wondering if Solr can handle this on its own or if something needs to be written ... I would like to handle recognizing date inputs to a search box for news articles - items such as August 1, August 1st, or 08/01/2008. It's a bit different from synonym-like handling in that I have to transform the query to a specific field. Any thoughts? Thanks. - Jon
How to boost the score higher in case user query matches entire field value than just some words within a field
Hi, I have a text field named prodname in the Solr index. Let's say there are 3 documents in the index, and here are the values of the prodname field:

Doc1: cordless drill
Doc2: cordless drill battery
Doc3: cordless drill charger

Searching for prodname:cordless drill will hit all three documents. So how can I make Doc1 score higher than the other two? BTW, I am using Solr 1.2. thanks! -Simon -- View this message in context: http://www.nabble.com/How-to-boost-the-score-higher-in-case-user-query-matches-entire-field-value-than-just-some-words-within-a-field-tp19079221p19079221.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Auto commit error and java.io.FileNotFoundException
Ok, I did what you suggested, giving each SolrIndexWriter its own infoStream log file, created in the init() method. The thing is, I now have something like 3400 infoStream log files, I guess reflecting how Solr created some 3400 SolrIndexWriters over the course of the run. (Hopefully this is plausible.) Could you explain what I should be looking for in these files? (Posting the whole bunch of it doesn't sound very useful.) Thanks, Chris

On Mon, Aug 18, 2008 at 10:12 AM, Michael McCandless [EMAIL PROTECTED] wrote: Alas, I think this won't actually turn on IndexWriter's infoStream. I think you may need to modify the SolrIndexWriter.java sources, in the init method, to add a call to setInfoStream(...). Can any Solr developers confirm this? Mike

Chris Harris wrote: I'm assuming that one way to do this would be to set the logging level to FINEST in the logging page in the solr admin tool, and then to make sure my logging.properties file is also set to record the FINEST logging level. Let me know if that won't enable the sort of debugging info you are talking about. (I do understand that the logging page in the admin tool makes temporary changes that will get reverted when you restart Solr.)

On Mon, Aug 18, 2008 at 3:05 AM, Michael McCandless [EMAIL PROTECTED] wrote: Since it seems reproducible, could you turn on debugging output (IndexWriter.setInfoStream(...)), get the FileNotFoundException to happen again, and post the resulting output? Mike
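For anyone following along, the change Michael describes is a one-liner in SolrIndexWriter's init(). A sketch (the file-naming scheme here is an assumption, not Chris's actual patch):

    // in SolrIndexWriter.init(), after the writer is configured;
    // IndexWriter.setInfoStream takes a java.io.PrintStream, and
    // FileOutputStream's constructor can throw an IOException
    setInfoStream(new PrintStream(
        new FileOutputStream("infostream." + System.currentTimeMillis() + ".log")));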
Re: Auto commit error and java.io.FileNotFoundException
Did the same FileNotFoundException / massive deletion of files occur? Actually if you could zip them all up and post them, I'll dig through them to see if they give any clues... Mike

Chris Harris wrote: Ok, I did what you suggested, giving each SolrIndexWriter its own infoStream log file, created in the init() method. The thing is, I now have something like 3400 infoStream log files, I guess reflecting how Solr created some 3400 SolrIndexWriters over the course of the run. (Hopefully this is plausible.) Could you explain what I should be looking for in these files? Thanks, Chris ...
Re: hello, a question about solr.
On Wed, 20 Aug 2008 10:58:50 -0300 Alexander Ramos Jardim [EMAIL PROTECTED] wrote: A tiny but useful explanation can be found here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

thanks Alexander - indeed, quite short, and focused on shingles... which, if I understand correctly, are groups of terms of size n... the n-gram tokenizer creates tokens of n characters from your input. Searching for ngram or n-gram in the archives should bring up more relevant information, which isn't in the wiki yet. B

_ {Beto|Norberto|Numard} Meijome All that is necessary for the triumph of evil is that good men do nothing. Edmund Burke I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
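To make the n-gram suggestion concrete, a bigram field type in schema.xml might look roughly like this (a sketch; the factory name assumes a recent Solr 1.3 build, and bigrams are the usual choice for Chinese):

    <fieldType name="text_cjk" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
      </analyzer>
    </fieldType>

The same analyzer must apply at both index and query time so the bigrams line up.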
Re: DIH - Document missing required field error
Does the mygallery_id column in your database allow nulls? It gets copied to the id field in Solr which is required, hence the error.

On Thu, Aug 21, 2008 at 2:12 AM, Todd Breiholz [EMAIL PROTECTED] wrote: I am testing out DataImportHandler (nightly 08-18-2008) in our environment with a simple schema: one database view with 10 columns mapped to an index in Solr. When running the full-import command, I get the error below for each row. I'm sure I'm missing something, but I can't figure it out. ...

-- Regards, Shalin Shekhar Mangar.
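A quick way to test that hypothesis directly against the view (name taken from the config above):

    select count(*) from mdp_share_my_search_view where mygallery_id is null;

If the count is zero, another thing worth checking is whether the JDBC driver returns the column under a different name - Oracle tends to upper-case unquoted column names, and DIH matches the column attribute against the result-set metadata.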