Re: a bug of solr distributed search

2010-07-26 Thread MitchK
Good morning, https://issues.apache.org/jira/browse/SOLR-1632 - Mitch Li Li wrote: where is the link to this patch? 2010/7/24 Yonik Seeley yo...@lucidimagination.com: On Fri, Jul 23, 2010 at 2:23 PM, MitchK mitc...@web.de wrote: why do we not send the output of TermsComponent of

question about relevance

2010-07-26 Thread Bharat Jain
Hello All, I have an index which stores multiple objects belonging to a user, e.g. schema: <field name="objType" type="..."/> - identifies the user object type, e.g. userBasic or userAdv. <!-- obj 1 --> <field name="first_name" type="..."/> maps to userBasicInfoObject <field

Integration Problem

2010-07-26 Thread Jörg Wißmeier
Hi everybody, I have been working with Solr for a while and have integrated it with Liferay 6.0.3, so every search request from Liferay is processed by Solr and its index. But I have to integrate another system, which offers me a webservice. The results of this webservice should be in the

Re: help with a schema design problem

2010-07-26 Thread Chantal Ackermann
Hi, I haven't read everything thoroughly, but have you considered creating fields for each of your party values (I think that is what you call them)? Then you could query like client:Pramod. You would then be able to facet on client and supplier. Cheers, Chantal On Fri, 2010-07-23 at 23:23 +0200,

Re: how to Protect data

2010-07-26 Thread Peter Karich
Hi Girish, I am not aware of such a thing. But you could use a middleware layer to prevent certain fields from being retrieved, via the 'fl' parameter: http://wiki.apache.org/solr/CommonQueryParameters#fl E.g. for your customers the query looks like q=hello&fl=title and for your admin the query looks like
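A minimal SolrJ sketch of the middleware idea described above, restricting the returned fields per role. The class name, role flag, field lists and URL are illustrative assumptions, not code from the thread:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class RoleAwareSearch {
        private final CommonsHttpSolrServer solr;   // SolrJ 1.4-era HTTP client

        public RoleAwareSearch(String url) throws Exception {
            this.solr = new CommonsHttpSolrServer(url);
        }

        public QueryResponse search(String q, boolean isAdmin) throws Exception {
            SolrQuery query = new SolrQuery(q);
            // Customers only ever see the title; admins get everything plus score.
            query.setFields(isAdmin ? "*,score" : "title");
            return solr.query(query);
        }
    }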

Re: schema.xml

2010-07-26 Thread Grijesh.singh
Hi, there are no required fields unless you mark a field as required. You can remove or add as many fields as you want. That is an example schema which shows how fields are configured.
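For concreteness, a hedged sketch of how a field is marked as required in schema.xml; the field names and types are illustrative:

    <!-- schema.xml (sketch): only fields declared required="true" are mandatory -->
    <field name="id"    type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text"   indexed="true" stored="true"/> <!-- optional -->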

Solr Doc Lucene Doc !?

2010-07-26 Thread stockii
Hello. I am writing a short text about Solr and Lucene and their use of the DIH. What kind of documents does the DIH create and insert? The wiki talks about Solr documents, but I thought that Solr uses Lucene to do this, so the DIH would create Lucene documents, not Solr documents!? What do the

Re: Solr Doc Lucene Doc !?

2010-07-26 Thread MitchK
Stockii, Solr's index is a Lucene Index. Therefore, Solr documents are Lucene documents. Kind regards, - Mitch

DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-26 Thread Chantal Ackermann
Hi, my use case is the following: in a sub-entity I request rows from a database for an input list of strings: <entity name="prog" ...> <field name="vip" ... /> (multivalued, not required) <entity name="ssc_entry" dataSource="ssc" onError="continue" query="select SSC_VALUE from

Re: Solr Doc Lucene Doc !?

2010-07-26 Thread stockii
... but the code talks about SolrDocuments: these are higher-level docs, used to construct the Lucene doc to index ... !? And the wiki says "Build Solr documents by aggregating data from multiple columns and tables according to configuration"

Can't find org.apache.solr.client.solrj.embedded

2010-07-26 Thread Uwe Reh
Hello experts, where is a jar containing org.apache.solr.client.solrj.embedded? This package is missing from 'apache-solr-solrj-1.4.[01].jar'. Also, I can't find any sources other than http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/webapp/src/org/apache/solr/client/solrj/embedded/ , which

Problem with parsing date

2010-07-26 Thread Rafal Bluszcz Zawadzki
Hi, I am using the Data Import Handler from Solr 1.4. Parts of my data-config.xml are: <entity name="page" processor="XPathEntityProcessor" stream="false" forEach="/multistatus/response" url="/tmp/file.xml"

Re: Problem with parsing date

2010-07-26 Thread Li Li
I use a format like yyyy-MM-ddThh:mm:ssZ; it works. 2010/7/26 Rafal Bluszcz Zawadzki ra...@headnet.dk: Hi, I am using the Data Import Handler from Solr 1.4. Parts of my data-config.xml are: <entity name="page" processor="XPathEntityProcessor" stream="false"

Re: Problem with Pdf, Sol 1.4.1 Cell

2010-07-26 Thread Tommaso Teofili
Hi, I think there is an open bug for it at: https://issues.apache.org/jira/browse/SOLR-1902 Using Solr 1.4.1 and upgrading the Tika libraries to a 0.8 snapshot, I also had to upgrade pdfbox, fontbox and jempbox to 1.2.1; I got no errors and it seems it's able to index PDFs without problems (I can query

Re: Problem with parsing date

2010-07-26 Thread Rafal Bluszcz Zawadzki
I am also using other dateFormat strings, in the same data handler, and they work. But not this one. And this data is fetched from an external source, so I don't have the possibility to modify it (well, theoretically I could save it, edit it, etc., but that is not the way). Why is this not working with

Re: 2 solr dataImport requests on a single core at the same time

2010-07-26 Thread kishan
Thank you very much.

Re: 2 solr dataImport requests on a single core at the same time

2010-07-26 Thread kishan
By the way, I want to put all the requestHandlers (more than one) in one XML file and use this in my solrConfig.xml. I have used XInclude but it didn't work. Please suggest anything. Thanks, Prasad
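For anyone trying the same thing, a hedged sketch of the XInclude syntax that solrconfig.xml accepts; the file name and contents are illustrative. A common pitfall is that the included file must itself be a well-formed XML document with a single root element, which usually means one requestHandler per included file:

    <!-- solrconfig.xml (sketch): pull a request handler definition in via XInclude -->
    <config xmlns:xi="http://www.w3.org/2001/XInclude">
      ...
      <!-- myRequestHandler.xml must contain exactly one root element,
           e.g. a single <requestHandler> definition -->
      <xi:include href="myRequestHandler.xml"/>
      ...
    </config>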

Similar search regarding a result document

2010-07-26 Thread scrapy
Hi, I would like to implement a similar-search feature... but not relative to the initial search query; rather, relative to each result document. The structure of each doc is: id, title, content, price, etc... Then we have a database of global search queries; I'm thinking of integrating this in

Re: Problem with parsing date

2010-07-26 Thread Chantal Ackermann
On Mon, 2010-07-26 at 14:46 +0200, Rafal Bluszcz Zawadzki wrote: EEE, d MMM HH:mm:ss z Not sure, but you might want to try an uppercase 'Z' for the timezone (or, alternatively, surround it with single quotes). The rest of your pattern looks fine. But if you still run into problems, try
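For context, a sketch of how such a pattern is typically wired up in data-config.xml with the DateFormatTransformer. The column, xpath and the year token in the pattern are illustrative assumptions (the WebDAV-style getlastmodified path is only a guess based on the forEach shown earlier in the thread):

    <!-- data-config.xml (sketch) -->
    <entity name="page"
            processor="XPathEntityProcessor"
            forEach="/multistatus/response"
            url="/tmp/file.xml"
            transformer="DateFormatTransformer">
      <!-- parse an RFC-1123-style date into a Solr date -->
      <field column="lastmodified"
             xpath="/multistatus/response/propstat/prop/getlastmodified"
             dateTimeFormat="EEE, d MMM yyyy HH:mm:ss z"/>
    </entity>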

slave index is bigger than master index

2010-07-26 Thread Muneeb Ali
Hi, I am using Solr version 1.4 with a master-slave setup. We have one master and two slave servers. It was all working fine, but lately the Solr slaves are behaving strangely. In particular, during index replication the slave nodes die and always need a restart. Also, the index size of the slave

Re: slave index is bigger than master index

2010-07-26 Thread Tommaso Teofili
Hi, I think that you may be using a Lucene/Solr IndexDeletionPolicy that does not remove old commits (and you aren't propagating solrconfig via replication). You can configure this feature in solrconfig.xml inside the deletionPolicy tag: <deletionPolicy class="solr.SolrDeletionPolicy"
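For reference, a sketch of the deletionPolicy block as it commonly appears in a Solr 1.4 solrconfig.xml; the values shown are the usual example-config defaults, so verify them against your own file:

    <!-- solrconfig.xml (sketch): keep only the latest commit point -->
    <deletionPolicy class="solr.SolrDeletionPolicy">
      <!-- number of commit points to keep -->
      <str name="maxCommitsToKeep">1</str>
      <!-- number of optimized commit points to keep -->
      <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>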

AW: slave index is bigger than master index

2010-07-26 Thread Bastian Spitzer
Hi, are you calling <optimize/> on the master to finally remove deleted documents and merge the index files? Once a day is recommended: http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations cheers -Original Message- From: Muneeb Ali

spell checking....

2010-07-26 Thread satya swaroop
Hi all, I am new to Solr and was able to implement indexing of documents by following the Solr wiki. Now I am trying to add spellchecking. I followed the spellcheck component page in the wiki but am not getting the suggested spellings. I first build it with spellcheck.build=true,... here I give
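A hedged SolrJ sketch of the kind of request that exercises the spellcheck component once it is configured. The URL, the deliberately misspelled query and the parameter choices are illustrative, and the component must be attached to the request handler you query:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.response.SpellCheckResponse;

    public class SpellcheckDemo {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("solrr");      // misspelled on purpose
            q.set("spellcheck", "true");
            q.set("spellcheck.build", "true");         // build the dictionary once
            q.set("spellcheck.collate", "true");
            QueryResponse rsp = solr.query(q);
            SpellCheckResponse spell = rsp.getSpellCheckResponse();
            if (spell != null) {
                System.out.println("Suggestions: " + spell.getSuggestions());
            }
        }
    }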

Re: AW: slave index is bigger than master index

2010-07-26 Thread Muneeb Ali
Yes I always run an optimize whenever I index on master. In fact I just ran an optimize command an hour ago, but it didn't make any difference.

Re: slave index is bigger than master index

2010-07-26 Thread Muneeb Ali
I just checked my config file, and I have exactly the same values for the deletionPolicy tag as you attached in your email, so I don't really think it could be this.

Re: Problem with parsing date

2010-07-26 Thread Rafal Bluszcz Zawadzki
I have just fixed it. The problem was related to an operating-system setting: the values were different from what Solr expected in the incoming datastream. Regards, Rafal Zawadzki On Mon, Jul 26, 2010 at 3:20 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: On Mon, 2010-07-26 at 14:46 +0200, Rafal

Re: Solr Doc Lucene Doc !?

2010-07-26 Thread kenf_nc
DataImportHandler (DIH) is an add-on to Solr. It lets you import documents from a number of sources in a flexible way. The only connection DIH has to Lucene is that Solr uses Lucene as the index engine. When you work with Solr you naturally talk about Solr documents; if you were working with

2 type of docs in same schema?

2010-07-26 Thread scrapy
I need your expertise on this one... We would like to index every search query that is passed through our Solr engine (same core). Our docs' format is like this (already in our schema): title, content, price, category, etc... Now how do we add search queries as a field in our schema? Note that the

Re: DIH : SQL query (sub-entity) is executed although variable is not set (null or empty list)

2010-07-26 Thread MitchK
Hi Chantal, did you try to write a custom DIH function (http://wiki.apache.org/solr/DIHCustomFunctions)? If not, I think this would be a solution. Just check whether ${prog.vip} is an empty string or null; if so, you need to replace it with a value that can never return anything. So the
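For readers unfamiliar with the wiki page linked above, a minimal sketch of such a custom function. The class name, sentinel value and resolver call are illustrative assumptions following the 1.4-era DIHCustomFunctions description, not code from this thread:

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Evaluator;

    // Maps a null/empty DIH variable to a sentinel that can never match a row.
    public class NonNullEvaluator extends Evaluator {
        @Override
        public String evaluate(String expression, Context context) {
            // 'expression' is the raw argument, e.g. "prog.vip"
            Object value = context.getVariableResolver().resolve(expression.trim());
            if (value == null || value.toString().trim().length() == 0) {
                return "__NO_SUCH_VALUE__";   // sentinel that matches nothing
            }
            return value.toString();
        }
    }

Per that wiki page, such a class is registered in data-config.xml with a <function name="nonNull" class="..."/> element and invoked as ${dataimporter.functions.nonNull(prog.vip)}; verify the exact API against your Solr version.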

Re: slave index is bigger than master index

2010-07-26 Thread Peter Karich
did you try an optimize on the slave too? Yes I always run an optimize whenever I index on master. In fact I just ran an optimize command an hour ago, but it didn't make any difference.

Re: 2 type of docs in same schema?

2010-07-26 Thread Geert-Jan Brits
You can easily have different types of documents in 1 core: 1. define searchquery as a field (just as the others in your schema) 2. define type as a field (this allows you to decide which type of documents to search for, e.g. type_normal or type_search). Now searching on regular docs becomes:
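A minimal sketch of what those two fields and the corresponding queries could look like; field and type names are illustrative:

    <!-- schema.xml (sketch): one core holding two kinds of documents -->
    <field name="type"        type="string" indexed="true" stored="true"/>
    <field name="searchquery" type="text"   indexed="true" stored="true"/>

    <!--
      Searching regular documents:  q=foo&fq=type:type_normal
      Searching logged queries:     q=foo&fq=type:type_search
    -->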

Re: slave index is bigger than master index

2010-07-26 Thread Muneeb Ali
No I didn't. I thought you aren't supposed to run optimize on slaves. Well, it doesn't matter now, as I think it's fixed. I just added a dummy document on master, ran a commit call and then, once that executed, ran an optimize call. This triggered snapshooter to replicate the index, which

RE: slave index is bigger than master index

2010-07-26 Thread Bastian Spitzer
As far as I know this is not needed; the optimized index is automatically replicated to the slaves. Therefore something seems to be really wrong with your setup. Maybe the slave index got corrupted for some reason? Did you try deleting the data dir plus a slave restart for a fresh replicated index?

Re: Solr Doc Lucene Doc !?

2010-07-26 Thread stockii
I want to learn more about the technology. Is a SolrDoc ever really created? Or is it in the code only for a better understanding of the Lucene/Solr boundary?

Updating fields in Solr

2010-07-26 Thread Pramod Goyal
Hi, I have a requirement where I need to keep updating certain fields in the schema. My requirement is to change some of the fields or add some values to a field (a multi-valued field). I understand that I can use Solr update for this. If I am using Solr update, do I need to publish the entire

Re: 2 type of docs in same schema?

2010-07-26 Thread scrapy
Thanks for your answer! That's great. Now, to index the search-query data, is there something special to do, or does it stay as usual? -Original Message- From: Geert-Jan Brits gbr...@gmail.com To: solr-user@lucene.apache.org Sent: Mon, Jul 26, 2010 4:57 pm Subject: Re: 2 type of docs

Re: how to Protect data

2010-07-26 Thread Dennis Gearon
If it's not the data that's being searched, you can always encode it before inserting it. You either have to further encode it to base64 to make it printable before storing it, OR use a binary field. You probably could also set up an external process that cycles through every document in
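A hedged sketch of the encode-before-indexing idea using commons-codec, which ships with Solr. The field names are illustrative, and note that Base64 only obscures stored values; it is not access control:

    import org.apache.commons.codec.binary.Base64;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ObfuscatedIndexer {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

            String secret = "internal pricing notes";
            // Base64 keeps the stored value printable; it is encoding, not encryption.
            String encoded = new String(Base64.encodeBase64(secret.getBytes("UTF-8")), "UTF-8");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("private_notes", encoded);   // not searchable as plain text
            solr.add(doc);
            solr.commit();
        }
    }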

Re:Re: How to speed up solr search speed

2010-07-26 Thread Dennis Gearon
Isn't it always one of these (from most likely to least likely, generally): memory, disk speed, the web server and its code, CPU? Memory and disk are related, as swapping occurs between them. As long as memory is high enough, it becomes: disk speed, the web server and its code, CPU. If the WebServer

Re: 2 type of docs in same schema?

2010-07-26 Thread Geert-Jan Brits
I still assume that what you mean by search-queries data is just some other form of document (in this case containing one search request per document). I'm not sure what you intend to do by that actually, but yes, indexing stays the same (you probably want to mark the field type as required so you don't

Extracting PDF text/comment/callout/typewriter boxes with Solr CELL/Tika/PDFBox

2010-07-26 Thread Sharp, Jonathan
Every so often I need to index new batches of scanned PDFs and occasionally Adobe's OCR can't recognize the text in a couple of these documents. In these situations I would like to type in a small amount of text onto the document and have it be extracted by Solr CELL. Adobe Pro 9 has a

Re: Solr Doc Lucene Doc !?

2010-07-26 Thread Chris Hostetter
: i want to learn more about the technology. : : exists an issue to create really an solrDoc ? Or its in the code only for a : better understanding of the lucene and solr border ? There is a real and actual class named SolrDocument. It is a simpler object than Lucene's Document class because
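For readers following the thread, a small SolrJ sketch of where SolrDocument shows up in practice: as the simple field-to-value objects returned by a query. The URL and field name are illustrative:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class SolrDocumentDemo {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrDocumentList results = solr.query(new SolrQuery("*:*")).getResults();
            for (SolrDocument doc : results) {
                // a SolrDocument is essentially a map of field names to stored values
                System.out.println(doc.getFieldValue("id") + " -> " + doc.getFieldNames());
            }
        }
    }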

Re: Can't find org.apache.solr.client.solrj.embedded

2010-07-26 Thread Chris Hostetter
: where is a Jar, containing org.apache.solr.client.solrj.embedded? Classes in the embedded package are useless w/o the rest of the Solr internal core classes, so they are included directly in the apache-solr-core-1.4.1.jar. (i know .. the directory structure doesn't make a lot of sense) :

Re: slave index is bigger than master index

2010-07-26 Thread Chris Hostetter
: No I didn't. I thought you aren't supposed to run optimize on slaves. Well correct, you should make all changes to the master. : but it doesn't matter now, as I think it's fixed now. I just added a dummy : document on master, ran a commit call and then once that executed ran an : optimize

Re: Solr Doc Lucene Doc !?

2010-07-26 Thread stockii
Ah okay, thanks =) So the class SolrInputDocument is only for indexing a document and SolrDocument is for search? When Solr indexes a document, the first step is to create a SolrInputDocument; then, in the class DocumentBuilder, the function Document toDocument(SolrInputDoc, Schema) creates a Lucene

Solr 3.1 and ExtractingRequestHandler resulting in blank content

2010-07-26 Thread David Thibault
Hello all, I’m working on a project with Solr. I had 1.4.1 working OK using ExtractingRequestHandler except that it was crashing on some PDFs. I noticed that Tika bundled with 1.4.1 was 0.4, which was kind of old. I decided to try updating to 0.7 as per the directions here:

How to Combine Drupal solrconfig.xml with Nutch solrconfig.xml?

2010-07-26 Thread Savannah Beckett
I am using the Drupal ApacheSolr module to integrate Solr with Drupal. I already integrated Solr with Nutch. I already moved Nutch's solrconfig.xml and schema.xml to Solr's example directory, and it works. I tried to append Drupal's ApacheSolr module's own solrconfig.xml and schema.xml into the

Re: How to Combine Drupal solrconfig.xml with Nutch solrconfig.xml?

2010-07-26 Thread David Stuart
Hi Savannah, I have just answered this question over on drupal.org. http://drupal.org/node/811062 Response number 5 and 11 will help you. On the solrconfig.xml side of things you will only really need Drupal's version. Although still in alpha my Nutch module will help you out with integration

NullPointerException with CURL, but not in browser

2010-07-26 Thread Rene Rath
Hi *, I'd like to see how many documents I have in my index with a certain ListId, in this example ListId 881: http://localhost:8983/solr/select?indent=on&version=2.2&q=*&fq=ListId%3A881&start=0&rows=0&fl=*%2Cscore&qt=standard&wt=standard In the browser, the output looks perfect; I indeed have 3

Total number of terms in an index?

2010-07-26 Thread Jason Rutherglen
What's the fastest way to obtain the total number of docs from the index? (The Luke request handler takes a long time to load so I'm looking for something else).

Re: Total number of terms in an index?

2010-07-26 Thread Jason Rutherglen
Sorry, like the subject, I mean the total number of terms. On Mon, Jul 26, 2010 at 4:03 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: What's the fastest way to obtain the total number of docs from the index? (The Luke request handler takes a long time to load so I'm looking for

java GC overhead limit exceeded

2010-07-26 Thread Jonathan Rochkind
I am now occasionally getting a Java GC overhead limit exceeded error in my Solr. This may or may not be related to recently adding much better (and more) warming queries. I can get it when trying a 'commit', after deleting all documents in my index, or in other cases. Anyone run into

Re: NullPointerException with CURL, but not in browser

2010-07-26 Thread Chris Hostetter
: However, when I'm trying this very URL with curl within my (perl) script, I : receive a NullPointerException: : CURL-COMMAND: curl -sL : http://localhost:8983/solr/select?indent=on&version=2.2&q=*&fq=ListId%3A881&start=0&rows=0&fl=*%2Cscore&qt=standard&wt=standard it appears you aren't quoting the

Design questions/Schema Help

2010-07-26 Thread Mark
We are thinking about using Cassandra to store our search logs. Can someone point me in the right direction/lend some guidance on design? I am new to Cassandra and I am having trouble wrapping my head around some of these new concepts. My brain keeps wanting to go back to a RDBMS design. We

Re: Design questions/Schema Help

2010-07-26 Thread Mark
On 7/26/10 4:43 PM, Mark wrote: We are thinking about using Cassandra to store our search logs. Can someone point me in the right direction/lend some guidance on design? I am new to Cassandra and I am having trouble wrapping my head around some of these new concepts. My brain keeps wanting to

Re: Design questions/Schema Help

2010-07-26 Thread Tommy Chheng
Alternatively, have you considered storing (or I should say indexing) the search logs with Solr? This lets you text-search across your search queries. You can perform time-range queries with Solr as well. @tommychheng Programmer and UC Irvine Graduate Student Find a great grad school based

Solr crawls during replication

2010-07-26 Thread Mark
We have an index around 25-30G w/ 1 master and 5 slaves. We perform replication every 30 mins. During replication the disk I/O obviously shoots up on the slaves to the point where all requests routed to that slave take a really long time... sometimes to the point of timing out. Is there any

Re: question about relevance

2010-07-26 Thread Erick Erickson
I'm having trouble getting my head around what you're trying to accomplish, so if this is off base you know why <G>. But what it smells like is that you're trying to do database-ish things in a SOLR index, which is almost always the wrong approach. Is there a way to index redundant data with each

Re: Similar search regarding a result document

2010-07-26 Thread Erick Erickson
I need much more detailed information before I can make sense of your use case. Could you provide some sample? MoreLikeThis sounds in the right neighborhood, but I'm guessing. Best Erick On Mon, Jul 26, 2010 at 9:02 AM, scr...@asia.com wrote: Hi, I would like to implement a similar search

Re: Total number of terms in an index?

2010-07-26 Thread Chris Hostetter
: Sorry, like the subject, I mean the total number of terms. It's not stored anywhere, so the only way to fetch it is to actually iterate all of the terms and count them (that's why LukeRequestHandler is so slow to compute this particular value). If I remember right, someone mentioned at one
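A sketch of what that iteration looks like against a Lucene 2.9-era (Solr 1.4) index; the index path is illustrative, and on a large index this is exactly the slow walk described above:

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.store.FSDirectory;

    public class TermCounter {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/solr/data/index")));
            try {
                long count = 0;
                TermEnum terms = reader.terms();   // walks every term in every field
                while (terms.next()) {
                    count++;
                }
                terms.close();
                System.out.println("total terms: " + count);
            } finally {
                reader.close();
            }
        }
    }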

Re: Updating fields in Solr

2010-07-26 Thread Erick Erickson
See below: On Mon, Jul 26, 2010 at 11:49 AM, Pramod Goyal pramod.go...@gmail.com wrote: Hi, I have a requirement where I need to keep updating certain fields in the schema. My requirement is to change some of the fields or add some values to a field (a multi-valued field). I understand
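Context for readers: Solr 1.4 has no partial field update, so 'updating' a field means re-sending the complete document with the same unique key. A hedged SolrJ sketch with illustrative field names:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class Reindexer {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // Re-send the complete document; a doc with the same uniqueKey replaces the old one.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-123");                // uniqueKey
            doc.addField("title", "unchanged title");     // unchanged fields must be repeated
            doc.addField("tags", "existing-tag");         // multi-valued field: repeat all
            doc.addField("tags", "newly-added-tag");      //   existing values, plus the new one

            solr.add(doc);
            solr.commit();
        }
    }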

Re: java GC overhead limit exceeded

2010-07-26 Thread Yonik Seeley
On Mon, Jul 26, 2010 at 7:17 PM, Jonathan Rochkind rochk...@jhu.edu wrote: I am now occasionally getting a Java GC overhead limit exceeded error in my Solr. This may or may not be related to recently adding much better (and more) warming queries. When memory gets tight, the JVM kicks off a

Is there a cache for a query?

2010-07-26 Thread Li Li
I want a cache that caches the whole result of a query (all steps, including collapse, highlight and facet). I read http://wiki.apache.org/solr/SolrCaching, but can't find a global cache. Maybe I can use an external cache to store key-value pairs. Is there one in Solr?

RE: java GC overhead limit exceeded

2010-07-26 Thread Jonathan Rochkind
Short answer: GC overhead limit exceeded means out of memory. Aha, thanks. So the answer is just to raise your Xmx/heap size; you need more memory to do what you're doing, yeah? Jonathan

StatsComponent and sint?

2010-07-26 Thread Jonathan Rochkind
Man, what types of fields is StatsComponent actually known to work with? With an sint, it seems to have trouble if there are any documents with null values for the field. It appears to decide that a null/empty/blank value is -1325166535, and is thus the minimum value. At least if I'm

Re: Design questions/Schema Help

2010-07-26 Thread Kiwi de coder
I think the search log will require a lot of storage, which may make the index size unreasonably large if stored in Solr. And the aggregation results may not really fit in the Lucene index structure. :) kiwi happy hacking ! On Tue, Jul 27, 2010 at 7:47 AM, Tommy Chheng tommy.chh...@gmail.com wrote:

Querying throws java.util.ArrayList.RangeCheck

2010-07-26 Thread Manepalli, Kalyan
Hi, I am stuck at this weird problem during querying. While querying the solr index I am getting the following error. Index: 52, Size: 16 java.lang.IndexOutOfBoundsException: Index: 52, Size: 16 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at

Re: Querying throws java.util.ArrayList.RangeCheck

2010-07-26 Thread Yonik Seeley
Do you have any custom code, or is this stock Solr (and which version, and what is the request)? -Yonik http://www.lucidimagination.com On Tue, Jul 27, 2010 at 12:30 AM, Manepalli, Kalyan kalyan.manepa...@orbitz.com wrote: Hi, I am stuck at this weird problem during querying. While querying

Re: spell checking....

2010-07-26 Thread satya swaroop
This is in solrconfig.xml: <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <lst name="spellchecker"> <str name="name">default</str> <str name="classname">solr.IndexBasedSpellChecker</str> <str name="field">spell</str> <str