Re: 99.9% uptime requirement

2009-08-06 Thread Shalin Shekhar Mangar
On Thu, Aug 6, 2009 at 4:10 AM, Robert Petersen rober...@buy.com wrote: Maintenance Questions: In a two slave one master setup where the two slaves are behind load balancers what happens if I have to restart solr? If I have to restart solr say for a schema update where I have added a new

Transfer of Index Vs HTTP GET Vs Embedded Solr -- Urgent Help

2009-08-06 Thread Ninad Raut
Hi, I have a search engine on Solr. Also I have a remote web application which will be using the Solr Indexes for search. I have three scenarios: 1) Transfer the Indexes to the Remote Application. - This will reduce load on the actual solr server and make searches faster. - Need to write

Re: Transfer of Index Vs HTTP GET Vs Embedded Solr -- Urgent Help

2009-08-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Aug 6, 2009 at 12:24 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I have a search engine on Solr. Also I have a remote web application which will be using the Solr Indexes for search. I have three scenarios: 1) Transfer the Indexes to the Remote Application. - This will

How to set solr/home in linux OS?

2009-08-06 Thread huenzhao
Hi all, I know how to configure solr.home by using tomcat6, but I don't know how to set solr.home by using Glassfish(V2.1). I have tried to set the solr.home in .profile as follows: export solr.home=/home/huenzhao/search/solr export solr/home=/home/huenzhao/search/solr export

wildcard search is not working

2009-08-06 Thread Radha C.
Hi, I have documents that contain the words healthcare articles. I need to match the healthcare articles documents for the query strings helath, articles... I tried q=health*, q=helath*, q=heath*articles but everything returns an empty result. When I try q=healthcare artilces, the search returns proper

Re: wildcard search is not working

2009-08-06 Thread Avlesh Singh
Go through this thread first - http://markmail.org/message/bannl2fpblt5sqlw If it still does not help, post back your field type definition in schema.xml Cheers Avlesh On Thu, Aug 6, 2009 at 3:46 PM, Radha C. cra...@ceiindia.com wrote: Hi, I have documents contain word healthcare articles.

Re: How to set solr/home in linux OS?

2009-08-06 Thread Chantal Ackermann
You have to quote values that include whitespace: export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/home/huenzhao/search/solr" or to make it accessible for other paths as well: export SOLR_HOME=/home/huenzhao/search/solr export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME" Cheers, Chantal

Importing Existing Non-SOLR Lucene Indexes into Solr

2009-08-06 Thread Ninad Raut
Hi, Is there a way to import existing Lucene Indexes to SOLR?? I have a huge lucene index which I want to import into SOLR server. Regards, Ninad Raut.

Re: Importing Existing Non-SOLR Lucene Indexes into Solr

2009-08-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
just copy the whole index into data_dir/index and start Solr. That should work just fine On Thu, Aug 6, 2009 at 5:17 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, Is there a way to import existing Lucene Indexes to SOLR?? I have a huge lucene index which I want to import into SOLR server.

Re: Importing Existing Non-SOLR Lucene Indexes into Solr

2009-08-06 Thread Mark Miller
You're kidding, right? :) Noble Paul നോബിള്‍ नोब्ळ् wrote: just copy the whole index into data_dir/index and start Solr. That should work just fine On Thu, Aug 6, 2009 at 5:17 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, Is there a way to import existing Lucene Indexes to SOLR?? I have a

Re: Importing Existing Non-SOLR Lucene Indexes into Solr

2009-08-06 Thread Ninad Raut
What about the schema and querying?? There should be some changes to the solr schema I think. Correct me if I am wrong. 2009/8/6 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com just copy the whole index into data_dir/index and start Solr. That should work just fine On Thu, Aug 6, 2009 at 5:17

Re: Importing Existing Non-SOLR Lucene Indexes into Solr

2009-08-06 Thread Avlesh Singh
I am also interested in knowing! Does it work? Cheers Avlesh On Thu, Aug 6, 2009 at 5:23 PM, Mark Miller markrmil...@gmail.com wrote: You're kidding, right? :) Noble Paul നോബിള്‍ नोब्ळ् wrote: just copy the whole index into data_dir/index and start Solr. That should work just fine On Thu, Aug 6,

Re: Importing Existing Non-SOLR Lucene Indexes into Solr

2009-08-06 Thread Avlesh Singh
what about the schema and querying?? there should be some changes to the solr schema I think. Correct me if I am wrong. Of course! You have to create your own schema inside the schema.xml and adjust values inside solrconfig.xml at the bare minimum to get started. Cheers Avlesh On Thu, Aug 6,

Re: Importing Existing Non-SOLR Lucene Indexes into Solr

2009-08-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
Yeah, the big part was missed. You need to set up a schema.xml matching the field names and types, and you would need a solrconfig.xml. But getting the schema right would be the challenge. On Thu, Aug 6, 2009 at 5:23 PM, Mark Miller markrmil...@gmail.com wrote: You're kidding, right? :) Noble Paul

Re: Importing Existing Non-SOLR Lucene Indexes into Solr

2009-08-06 Thread Ninad Raut
But getting the schema right would be the challenge. If I know my fields and there are not many in the lucene index, I should not face any problem creating a schema, or are there any pitfalls which I should be aware of? Thanks for such quick replies guys. 2009/8/6 Noble Paul നോബിള്‍ नोब्ळ्

Re: Importing Existing Non-SOLR Lucene Indexes into Solr

2009-08-06 Thread Avlesh Singh
If I know my fields and there are not many in the lucene index, I should not face any problem creating a schema, or are there any pitfalls which I should be aware of? Nothing specific. The creation of schema should be very straightforward. Just make sure you use the right field types. Cheers

Re: Limit of Index size per machine..

2009-08-06 Thread Ian Connor
Hi, Solr is fine out of RAM if you don't change it (build and then let it cache what it needs). The RAM is needed when you constantly pepper it with updates and commits. If you can have the logs update certain shards and then merge those indexes periodically to machines you can leave alone - this

Re: DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)

2009-08-06 Thread Chantal Ackermann
Hi again, 1.4 runs fine for me, now, but I'm still struggling for the correct delete query. There is little to no documentation at all for the new special commands, and I have problems guessing the correct setup from reading through the code. SOLR-1060 is not enough help. I've come up with a

Issue with special charcters in solar search

2009-08-06 Thread Deepak VSVK
Hi, in my application I am trying to search with some special characters like , $ # and solr returns all the search results available. Some of the characters like _ . are not being encoded in the search URL. Does anyone have any idea what the root cause of this would be? I am using jetty

Re: DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)

2009-08-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Aug 6, 2009 at 6:41 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Hi again, 1.4 runs fine for me, now, but I'm still struggling for the correct delete query. There is little to no documentation at all for the new special commands, and I have problems guessing the correct

Re: DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)

2009-08-06 Thread Chantal Ackermann
Great! *bow* Thanks, Chantal entity name=delete_from_index pk=GROUPID transformer=TemplateTransformer query=select GROUPID from DEFINITION where LANGUAGE='de' and CHANGED_DATE '${dataimporter.last_index_time}' field column=$deleteDocByQuery

Re: help getting started with spell check dictionary

2009-08-06 Thread Grant Ingersoll
I'm guessing it is because you have your Spell checker mapped to the spellchecker request handler, but you are asking the standard request handler to build the spell checker. Unless you've modified the Standard Req Handler, it is not spell check aware. Try

Preserving C++ and other weird tokens

2009-08-06 Thread Michael _
Hi everyone, I'm indexing several documents that contain words that the StandardTokenizer cannot detect as tokens. These are words like C# .NET C++ which are important for users to be able to search for, but get treated as C, NET, and C. How can I create a list of words that should be
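The idea of a protected word list raised in the question above can be illustrated outside Solr with a small sketch. This is not a Solr tokenizer or filter configuration; it is a plain-Python illustration, and the `PROTECTED` list and the `tokenize` helper are hypothetical names chosen for this example. Protected terms are tried before the generic word pattern, so they survive as whole tokens.

```python
import re

# Hypothetical list of terms that must survive tokenization intact.
PROTECTED = ["C++", "C#", ".NET"]


def tokenize(text, protected=PROTECTED):
    """Split text into word tokens, keeping protected terms whole."""
    # Try longer protected terms first so a shorter one cannot shadow them.
    alternatives = sorted(protected, key=len, reverse=True)
    pattern = "|".join(re.escape(term) for term in alternatives) + r"|\w+"
    return re.findall(pattern, text)


if __name__ == "__main__":
    print(tokenize("I know C++, C# and .NET well"))
```

Matching is case-sensitive in this sketch; whether that is desirable depends on how the field is analyzed at query time.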

Re: enablereplication does not work

2009-08-06 Thread solr jay
You are right. Replication was disabled after the server was restarted, and then I saw the behavior. After I added some data, command indexversion returns the right values. So it seems Solr behaved correctly. Thanks, 2009/8/5 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com how is the

RE: 99.9% uptime requirement

2009-08-06 Thread Brian Klippel
You could create a new working core, then call the swap command once it is ready. Then remove the work core and delete the appropriate index folder at your convenience. -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Wednesday, August 05, 2009 6:41 PM To:
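The swap described above goes through Solr's CoreAdmin handler, which accepts `action=SWAP` with `core` and `other` parameters. A minimal sketch of building that request URL follows; the base URL and the core names `live` and `work` are assumptions for illustration, not values from the thread.

```python
from urllib.parse import urlencode


def swap_url(base_url, live_core, work_core):
    """Build a CoreAdmin SWAP request URL that exchanges two cores."""
    query = urlencode({"action": "SWAP", "core": live_core, "other": work_core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


if __name__ == "__main__":
    # Assumed host and core names; adjust to the actual deployment.
    print(swap_url("http://localhost:8983/solr", "live", "work"))
```

Issuing a GET against the resulting URL once the work core is fully built would perform the swap; the old index can then be removed at leisure, as the message suggests.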

Re: Issue with special charcters in solar search

2009-08-06 Thread Avlesh Singh
Something similar has been discussed earlier. Go through this thread - http://www.lucidimagination.com/search/document/b5977650557f50cb/problem_with_query_parser PS: Solr is pronounced as Solar but written without the a. Cheers Avlesh On Thu, Aug 6, 2009 at 7:18 PM, Deepak VSVK

RE: 99.9% uptime requirement

2009-08-06 Thread Robert Petersen
Here is another idea. With solr multicore you can dynamically spin up extra cores and bring them online. I'm not sure how well this would work for us since we have hard coded the names of the cores we are hitting in our config files. -Original Message- From: Brian Klippel

Re: 99.9% uptime requirement

2009-08-06 Thread Walter Underwood
Design so that you can handle the load with one server down (N+1 sizing), then take one server out for any maintenance. Simple and works fine. wunder On Aug 6, 2009, at 9:25 AM, Robert Petersen wrote: Here is another idea. With solr multicore you can dynamically spin up extra cores and
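The N+1 sizing rule above is simple arithmetic: size for peak load, then add one server that can be taken out for maintenance. A rough sketch, where the load figures are made-up inputs rather than numbers from the thread:

```python
import math


def servers_needed(peak_qps, qps_per_server, spares=1):
    """Servers required to carry peak load with `spares` servers out of rotation."""
    if peak_qps <= 0 or qps_per_server <= 0:
        raise ValueError("load figures must be positive")
    return math.ceil(peak_qps / qps_per_server) + spares


if __name__ == "__main__":
    # Hypothetical figures: 450 queries/s at peak, 200 queries/s per server.
    print(servers_needed(450, 200))
```

With these hypothetical figures, three servers carry the peak and the fourth is the one that can be restarted for a schema change.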

Re: mergeFactor / indexing speed

2009-08-06 Thread Chantal Ackermann
Hi all, to keep this thread up to date... ;-) d) jdbc batch size changed to 10. (Was default: 500, then 1000) The problem with my dih setup is that the root entity query returns a huge set (all ids that shall be indexed). A larger fetchsize would be good for that query. The nested entity,

Re: mergeFactor / indexing speed

2009-08-06 Thread Yonik Seeley
On Mon, Aug 3, 2009 at 12:32 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: avg-cpu: %user %nice %sys %iowait %idle 1.23 0.00 0.03 0.03 98.71 Basically, it is doing very little? *scratch* How often is commit being called? (a Lucene commit sync's all

Summing sub categories in faceting

2009-08-06 Thread Jón Helgi Jónsson
Hi, would really appreciate some help on this. I'm doing a category browser for companies. Kind of like a yellow pages. For each company I store each category the company is in like this: Example for Boeing would be 03.03.02 which is a fictional id for 'Jets' The beginning point I display all
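One common way to get summed counts for parent categories with dotted ids like 03.03.02 is to index every ancestor prefix alongside the leaf id, so that a facet on the field counts a company under 03, 03.03 and 03.03.02. The sketch below only illustrates that idea in plain Python; it is an assumption about a workable approach, not the method from the thread, and the company "Acme" and its categories are invented for the example.

```python
from collections import Counter


def category_prefixes(category_id, sep="."):
    """Expand '03.03.02' into ['03', '03.03', '03.03.02']."""
    parts = category_id.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


def facet_counts(companies):
    """Count companies per category, including all ancestor categories."""
    counts = Counter()
    for categories in companies.values():
        seen = set()
        for category_id in categories:
            seen.update(category_prefixes(category_id))
        counts.update(seen)  # each company counts once per category
    return counts


if __name__ == "__main__":
    # "Acme" and its category ids are hypothetical sample data.
    sample = {"Boeing": ["03.03.02"], "Acme": ["03.03.01", "03.04"]}
    print(facet_counts(sample))
```

In Solr terms this would correspond to a multi-valued field holding the expanded prefixes, with an ordinary field facet on it.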

Re: mergeFactor / indexing speed

2009-08-06 Thread Avlesh Singh
does DIH call commit periodically, or are things done in one big batch? AFAIK, one big batch. Cheers Avlesh On Thu, Aug 6, 2009 at 11:23 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Aug 3, 2009 at 12:32 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: avg-cpu:

Revisiting IDF Problems and Index Slices

2009-08-06 Thread Mark Bennett
I'm investigating a problem I bet some of you have hit before, and exploring several options to address it. I suspect that this specific IDF scenario is common enough that it even has a name, though I'm not sure what it would be called. The scenario: Suppose you have a search application focused on

Re: Revisiting IDF Problems and Index Slices

2009-08-06 Thread Otis Gospodnetic
As soon as I started reading your message I started thinking common grams, so that is what I would try first, esp. since somebody already did the work of porting that from Nutch to Solr (see Solr JIRA). Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch,

Language Detection for Analysis?

2009-08-06 Thread Bradford Stephens
Hey there, We're trying to add foreign language support into our new search engine -- languages like Arabic, Farsi, and Urdu (that don't work with standard analyzers). But our data source doesn't tell us which languages we're actually collecting -- we just get blocks of text. Has anyone here

Re: Language Detection for Analysis?

2009-08-06 Thread Robert Muir
Bradford, there is an arabic analyzer in trunk. for farsi there is currently a patch available: http://issues.apache.org/jira/browse/LUCENE-1628 one option is not to detect languages at all. it could be hard for short queries due to the languages you mentioned borrowing from each other. but you

Item Facet

2009-08-06 Thread David Lojudice Sobrinho
Hi... Is there any way to group values like shopping.yahoo.com or shopper.cnet.com do? For instance, I have documents like: doc1 - product_name1 - value1 doc2 - product_name1 - value2 doc3 - product_name1 - value3 doc4 - product_name2 - value4 doc5 - product_name2 - value5 doc6 - product_name2

Re: Limit of Index size per machine..

2009-08-06 Thread Tom Burton-West
Hello, I think you are confusing the size of the data you want to index with the size of the index. For our indexes (large full text documents) the Solr index is about 1/3 of the size of the documents being indexed. For 3 TB of data you might have an index of 1 TB or less. This depends on

concurrent csv loading

2009-08-06 Thread Joe Calderon
for first time loads i currently post to /update/csv?commit=false&separator=%09&escape=\&stream.file=workfile.txt&map=NULL:&keepEmpty=false, this works well and finishes in about 20 minutes for my work load. this is mostly cpu bound, i have an 8 core box and it seems one takes the brunt of the work.

RE: Item Facet

2009-08-06 Thread Ge, Yao (Y.)
If you can reindex, simply rebuild the index with fields replaced by combining existing fields. -Yao -Original Message- From: David Lojudice Sobrinho [mailto:dalss...@gmail.com] Sent: Thursday, August 06, 2009 4:17 PM To: solr-user@lucene.apache.org Subject: Item Facet Hi... Is there

RE: concurrent csv loading

2009-08-06 Thread Smiley, David W.
You should stand to benefit from concurrent loading. Certainly the text analysis would end up being done concurrently; I'm not sure what else benefits from it but I think there are other things. Ideally you could try a configurable number of concurrent loads and pick the one that gets the job
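The suggestion above, trying a configurable number of concurrent loads, could be sketched roughly as follows. The sketch assumes the work file has already been split into chunk files that the Solr server can read via `stream.file` (the chunk names and base URL are hypothetical), and it takes the HTTP call as an injected `send` callable so the concurrency level is the only thing being varied.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode


def csv_update_url(base_url, stream_file):
    """Build an /update/csv URL with the same options as the single-file load."""
    params = {
        "commit": "false",
        "separator": "\t",
        "escape": "\\",
        "stream.file": stream_file,
        "map": "NULL:",
        "keepEmpty": "false",
    }
    return f"{base_url.rstrip('/')}/update/csv?{urlencode(params)}"


def load_concurrently(base_url, chunk_files, send, workers=4):
    """Request each chunk with up to `workers` loads in flight at once."""
    urls = [csv_update_url(base_url, name) for name in chunk_files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, urls))


if __name__ == "__main__":
    # Dry run: `send` just echoes the URL instead of calling Solr.
    for url in load_concurrently(
        "http://localhost:8983/solr", ["chunk0.txt", "chunk1.txt"], send=lambda u: u
    ):
        print(url)
```

In a real run `send` would wrap something like `urllib.request.urlopen`, and a single commit would be issued after all chunks finish, since each request uses `commit=false`.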

Re: Summing sub categories in faceting

2009-08-06 Thread Jón Helgi Jónsson
Did a bit more creative searching for a solution and came up with this: http://www.mail-archive.com/solr-user@lucene.apache.org/msg15027.html I'm using a nightly build that is a couple of days old, so unless there is something new I should know about I'm going with that method :) 2009/8/6 Jón Helgi Jónsson

Re: enablereplication does not work

2009-08-06 Thread solr jay
By the way, I was using command=indexversion to verify whether replication is on or off. Since it seems unreliable, is there a better way to do it? Thanks, On Thu, Aug 6, 2009 at 8:43 AM, solr jay solr...@gmail.com wrote: You are right. Replication was disabled after the server was restarted, and then

Re: Item Facet

2009-08-06 Thread David Lojudice Sobrinho
I can't reindex because the aggregated/grouped result should change as the query changes... in other words, the result must be dynamic. We've been thinking about a new handler for it, something like: /select?q=laptop&rows=0&itemfacet=on&itemfacet.field=product_name,min(price),max(price) Does it

Re: Language Detection for Analysis?

2009-08-06 Thread Cheolgoo Kang
Is that 'blocks of text' a (unicode) Java string? I don't think this is the case, but then, use Character.UnicodeBlock to identify the language of the text. And, is that just text files with unknown character encoding? Then ICU has a 'charset detector' that you can use. This feature 'suggests'

Re: Language Detection for Analysis?

2009-08-06 Thread Robert Muir
fyi, you can use the block property, but I think even better is to use the unicode script property: http://unicode.org/reports/tr24/ . This is easier because some characters are common across different scripts. Also, some scripts span multiple unicode blocks. This is the direction I was heading
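As a rough illustration of script-based detection: Python's standard library does not expose the Unicode script property directly, so the sketch below approximates it by taking the first word of each character's Unicode name (for example "ARABIC" from "ARABIC LETTER SEEN"). That shortcut is an assumption of this example, not the tr24 script property itself, and the function name is hypothetical.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Guess the dominant script of `text` from Unicode character names."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation and whitespace
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


if __name__ == "__main__":
    print(dominant_script("hello world"))
    print(dominant_script("\u0633\u0644\u0627\u0645"))
```

Note that Arabic, Farsi and Urdu all use the Arabic script, so a script check alone only narrows the choice of analyzer; it cannot tell those three languages apart.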

Re: Language Detection for Analysis?

2009-08-06 Thread Lucas F. A. Teixeira
Google Translate just released (last week) its language API with translation and LANGUAGE DETECTION. :) It's very simple to use, and you can query it with some text to determine which language it is. Here is a simple example using groovy, but all you need is the url to query:

Re: Summing sub categories in faceting

2009-08-06 Thread Koji Sekiguchi
There is a patch for it: https://issues.apache.org/jira/browse/SOLR-64 Koji Jón Helgi Jónsson wrote: Did a bit more creative searching for a solution and came up with this: http://www.mail-archive.com/solr-user@lucene.apache.org/msg15027.html I'm using couple of days old nightly build, so

Attempt to query for max id failing with exception

2009-08-06 Thread Reuben Firmin
I'm using SolrJ. When I attempt to set up a query to retrieve the maximum id in the index, I'm getting an exception. My setup code is: final SolrQuery params = new SolrQuery(); params.addSortField("id", ORDER.desc); params.setRows(1); params.setQuery(queryString);

Re: How to set solr/home in linux OS?

2009-08-06 Thread huenzhao
I have tried, but it also did not work! The goal of setting solr.home in tomcat6 is to start solr when tomcat6 is starting. So I think the problem is that solr cannot be started by setting solr.home when glassfish is starting. Chantal Ackermann wrote: You have to quote values that

Re: Item Facet

2009-08-06 Thread Avlesh Singh
Dynamic fields might be an answer. If you had a field called product_* and these were populated with the corresponding values during indexing then faceting on these fields will give you the desired behavior. The only catch here is that the product names have to be known upfront. A wildcard

Re: How to set solr/home in linux OS?

2009-08-06 Thread Otis Gospodnetic
It looks like you export JAVA_OPTS in your .profile, but I bet Tomcat also sets and thus overrides this same JAVA_OPTS in its own startup script. So that is what you should edit and modify. I'm a Jetty user, so I don't have a Tomcat startup script to check for you. Otis -- Sematext is

Re: Multi tokenizer

2009-08-06 Thread Koji Sekiguchi
Chris Hostetter wrote: : I need to tokenize my field on whitespaces, html, punctuation, apostrophe : but if I use HTMLStripStandardTokenizerFactory it strips only html : but no apostrophes you might consider using one of the HTML Tokenizers, and then use a PatternReplaceFilterFilter ...

Re: Transfer of Index Vs HTTP GET Vs Embedded Solr -- Urgent Help

2009-08-06 Thread Ninad Raut
Hi Noble, Can you explain a bit more on how to use Solr out of the box? I am looking at ways to design the UI for the remote application quickly and with fewer problems. Also could you elaborate more on what can go wrong with the first option? Thanks. 2009/8/6 Noble Paul നോബിള്‍ नोब्ळ्

Re: Language Detection for Analysis?

2009-08-06 Thread Otis Gospodnetic
Bradford, If I may: Have a look at http://www.sematext.com/products/language-identifier/index.html And/or http://www.sematext.com/products/multilingual-indexer/index.html Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP,

Re: Attempt to query for max id failing with exception

2009-08-06 Thread Avlesh Singh
params.setQuery(queryString); The query string is *:*, right? Your id field is sortable, right? Cheers Avlesh On Fri, Aug 7, 2009 at 5:58 AM, Reuben Firmin reub...@benetech.org wrote: I'm using SolrJ. When I attempt to set up a query to retrieve the maximum id in the index, I'm getting
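The request being discussed (match everything, sort by id descending, return one row) can be written out as plain HTTP parameters, which makes it easy to try in a browser independently of SolrJ. A minimal sketch, assuming a field named `id` that is sortable and a Solr instance at a hypothetical local URL:

```python
from urllib.parse import urlencode


def max_id_query_url(base_url, id_field="id"):
    """Build a /select URL that returns only the document with the highest id."""
    params = {
        "q": "*:*",                  # match all documents
        "sort": f"{id_field} desc",  # highest id first
        "rows": 1,                   # only the top document is needed
        "fl": id_field,              # return just the id field
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(max_id_query_url("http://localhost:8983/solr"))
```

If the id field is a string type, the sort is lexicographic, so "9" would rank above "10"; a numeric sortable field type avoids that.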

Re: Transfer of Index Vs HTTP GET Vs Embedded Solr -- Urgent Help

2009-08-06 Thread Walter Underwood
About the first option, caches are more effective with more traffic, so ten front end servers using three Solr servers will have better caching and probably better overall performance than having separate search on all ten servers. You can even put an HTTP cache in there and get better

Re: Transfer of Index Vs HTTP GET Vs Embedded Solr -- Urgent Help

2009-08-06 Thread Ninad Raut
The remote web app will be accessing the Solr server via internet. It's not an intranet setup. On Fri, Aug 7, 2009 at 10:19 AM, Walter Underwood wun...@wunderwood.org wrote: About the first option, caches are more effective with more traffic, so ten front end servers using three Solr servers will

Re: Transfer of Index Vs HTTP GET Vs Embedded Solr -- Urgent Help

2009-08-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
Then you should consider replicating the index to the local intranet and still run it as a separate app. On Fri, Aug 7, 2009 at 10:53 AM, Ninad Raut hbase.user.ni...@gmail.com wrote: The remote web app will be accessing the Solr server via internet. It's not an intranet setup. On Fri, Aug 7,

Re: Transfer of Index Vs HTTP GET Vs Embedded Solr -- Urgent Help

2009-08-06 Thread Ninad Raut
Then you should consider replicating the index to the local intranet and still run it as a separate app. Will it be the same master-slave replication?? If the master is multicore, can I specifically replicate an index of a certain core? Thanks for the help. 2009/8/7 Noble Paul നോബിള്‍

Re: How to set solr/home in linux OS?

2009-08-06 Thread Amit Nithian
Have you tried setting solr home via the JNDI? I think you can set it via solr/home but that would require adding this to your servlet context configuration. Another option is to trace the startup scripts for Glassfish and see what environment variables are passed in. JAVA_OPTS would make sense

Data loading from DB - data sizes and obstacles

2009-08-06 Thread Amit Nithian
All, An off and on project of mine has been to work on refactoring the way we load data from MySQL into Solr. Our current approach is fairly hard coded and not as configurable as I would like. I was curious of people who have used the DIH and/or LuSQL to load data into Solr, how much data you