Re: weird behaviour when setting negative boost with bq using dismax

2010-02-04 Thread Marc Sturlese
Generally speaking, by convention boosts in Lucene have unity at 1.0, not 0.0. So, a negative boost is usually done with boosts between 0 and 1. For this case, maybe a boost of 0.1 is what you want? I forgot to say I tried what you say as well, but it didn't work. In the standard query parser,

How to send web pages(urls) to solr cell via solrj?

2010-02-04 Thread dhamu
Hi, I am a newbie to Solr and have been exploring it for the last few days. I am using Solr Cell with Tika for parsing, indexing and searching, posting the rich text documents via SolrJ. My actual requirement is, instead of using local documents (pdf, doc, docx), I want to use web pages (urls for

Re: HTTP caching and distributed search

2010-02-04 Thread Shalin Shekhar Mangar
On Wed, Feb 3, 2010 at 12:21 AM, Charlie Jackson charlie.jack...@cision.com wrote: Currently, I've got a Solr setup in which we're distributing searches across two cores on a machine, say core1 and core2. I'm toying with the notion of enabling Solr's HTTP caching on our system, but I noticed

RE: query all filled field?

2010-02-04 Thread Frederico Azeiteiro
Thanks, but still no luck with that: *:* AND -fieldX:[* TO *] - returns 0 docs fieldX:(a*) - returns docs, so I'm sure that there are docs with this field filled. Any other ideas what could be wrong? Frederico -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent:

RE: query all filled field?

2010-02-04 Thread Frederico Azeiteiro
I tried another one: fieldX:[ TO *] and it returns articles with the field filled :), so I guess I'm getting there. But I also tried fieldX:[ TO *] and got a few more results than the first one... Is there a real difference between these, and also if the results are really all docs with

RE: Solr response extremely slow

2010-02-04 Thread Fuad Efendi
'!' :))) Plus, FastLRUCache (previous one was synchronized) (and of course warming-up time) := start complains after ensuring there are no complains :) (and of course OS needs time to cache filesystem blocks, and Java HotSpot, ... - few minutes at least...) On Feb 3, 2010, at 1:38 PM, Rajat

Re: ClassCastException setting date.formats in ExtractingRequestHandler

2010-02-04 Thread Mark Miller
Before you file a JIRA issue: I don't believe this is a bug, so there is likely no need for JIRA. Try putting the date.formats snippet in the defaults section rather than simply within the RequestHandler tags. Then you should be good to go. -- - Mark http://www.lucidimagination.com Lance

Re: query all filled field?

2010-02-04 Thread Mark Miller
Not entirely true - that's the case in Lucene, but in Solr, top level queries *can* start with minus or not. They cannot if they are nested. Both *:* AND -fieldX:[* TO *] and -fieldX:[* TO *] are the same in Solr. -- - Mark http://www.lucidimagination.com Lance Norskog wrote: Queries
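
A minimal sketch of the two equivalent top-level forms mentioned above, built as request URLs. The base URL and the `select_url` helper are assumptions for illustration only; the query strings themselves are the ones from the message.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical Solr base URL, used only for illustration.
BASE = "http://localhost:8983/solr"

def select_url(base, query, rows=10):
    """Build a /select URL with the query string properly URL-encoded."""
    return base + "/select?" + urlencode({"q": query, "rows": rows})

# Explicit form: match everything, then subtract docs that have fieldX.
explicit = select_url(BASE, "*:* AND -fieldX:[* TO *]")

# Pure negative form: per the message above, Solr accepts this at the top level.
implicit = select_url(BASE, "-fieldX:[* TO *]")

# Decode the q parameter again to confirm nothing was lost in encoding.
decoded = parse_qs(urlsplit(implicit).query)["q"][0]
```

Both URLs ask for documents in which fieldX has no value; only the top-level clause can be purely negative, so a nested clause would still need the `*:* AND` prefix.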

RE: query all filled field?

2010-02-04 Thread Ahmet Arslan
*:* AND -fieldX:[* TO *] - returns 0 docs fieldX:(a*) - returns docs, so I'm sure that there are docs with this field filled. Any other ideas what could be wrong? There is nothing wrong in this scenario. If -fieldX:[* TO *] returns 0 docs, it means that all of your documents have that fieldX

Solr Index size : Java out of memory

2010-02-04 Thread Smith G
Hello All, I am trying to start the Solr server using Jetty (same as in the Solr tutorial on their website). As the index size is around 3.5gb it's returning OutOfMemoryError. Is it mandatory to satisfy the condition java heap size > index size? If yes, is there any solution to run Solr

solr multicore and nfs

2010-02-04 Thread Valérie TAESCH
Hello, We are using Solr(v 1.3.0 694707 with Lucene version 2.4-dev 691741) in multicore mode with an average of 400 indexes (all indexes have the same structure). These indexes are stored on a nfs disk. A java process writes continuously in these indexes while solr is only used to read

RE: query all filled field?

2010-02-04 Thread Ankit Bhatnagar
That's correct. If you want to find missing values, i.e. fields for which no value is present, then you will use - Ankit -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, February 04, 2010 9:41 AM To: solr-user@lucene.apache.org Subject: RE: query all filled

Re: Best OCR API for solr

2010-02-04 Thread mike anderson
There might be an OCR plugin for Apache Tika (which does exactly this out of the box except for OCR capability, I believe). http://lucene.apache.org/tika/ -mike 2010/2/4 Kranti™ K K Parisa kranti.par...@gmail.com Hi, Can anyone list the best OCR APIs available to use in combination with

ExtractingRequestHandler multiple values encountered for non multiValued field last_modified

2010-02-04 Thread Christoph Brill
Hi list, I'm using the ExtractingRequestHandler to extract content from documents. It's extracting the last_modified field quite fine, but of course only for documents where this field is set. If this field is not set I want to pass the file system timestamp of the file. I'm doing: final

Solr not starting JMX

2010-02-04 Thread Jan-Simon Winkelmann
Hi everyone, I am currently trying to set up JMX support for Solr, but somehow the listening socket is not even created on my specified port. My parameters look like this (running the Solr example): java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=6060

Re: ClassCastException setting date.formats in ExtractingRequestHandler

2010-02-04 Thread Christoph Brill
Cool, this way it's no longer crashing. Thanks and Regards, Chris Am 04.02.2010 14:29, schrieb Mark Miller: Before you file a JIRA issue: I don't believe this is a bug, so there is likely no need for JIRA. Try putting the date.formats snippet in the defaults section rather than simply

Storing values in addition to last_index_time

2010-02-04 Thread cjkadakia
I understand that upon performing an index (full-import or delta-import), the dataimport.properties file is written to with a last_index_time which can then be accessed by the data-config.xml for delta-import queries with ${dataimporter.last_index_time}. I was curious if another key could be

Some questions on solr replication backup feature

2010-02-04 Thread Licinio Fernández Maurelo
Hi folks, as we're moving to Solr 1.4 replication, I want to know about backups. Questions - 1. Properties that can be set to configure this feature (I only know backupAfter) 2. Is it an incremental backup or a full index snapshot? Thx -- Lici ~Java Developer~

Re: Best OCR API for solr

2010-02-04 Thread Kranti™ K K Parisa
Yes, Tika indexes all formats, but I am specifically looking for OCR (through Java), at least for PDF or JPEG images. Any clues? Best Regards, Kranti K K Parisa On Thu, Feb 4, 2010 at 8:29 PM, mike anderson saidthero...@gmail.com wrote: There might be an OCR plugin for Apache Tika (which does

Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource

2010-02-04 Thread Jorg Heymans
Hi, I'm having some trouble getting this to work on a snapshot from 3rd Feb. My config looks as follows: dataSource name=ora driver=oracle.jdbc.OracleDriver url= / datasource name=orablob type=FieldStreamDataSource / document name=mydoc entity dataSource=ora name=meta

Re: ClassCastException setting date.formats in ExtractingRequestHandler

2010-02-04 Thread Mark Miller
Christoph Brill wrote: Cool, this way it's no longer crashing. Thanks and Regards, Chris Am 04.02.2010 14:29, schrieb Mark Miller: Before you file a JIRA issue: I don't believe this is a bug, so there is likely no need for JIRA. Try putting the date.formats snippet in the defaults

Re: ContentStreamUpdateRequest addFile fails to close Stream

2010-02-04 Thread Christoph Brill
Good job Mark, works fine and does not keep my files open. Thanks, Chris Am 03.02.2010 15:24, schrieb Mark Miller: Hey Christoph, Could you give the patch at https://issues.apache.org/jira/browse/SOLR-1744 a try and let me know how it works out for you?

Re: ClassCastException setting date.formats in ExtractingRequestHandler

2010-02-04 Thread Mark Miller
Mark Miller wrote: Christoph Brill wrote: Cool, this way it's no longer crashing. Thanks and Regards, Chris Am 04.02.2010 14:29, schrieb Mark Miller: Before you file a JIRA issue: I don't believe this is a bug, so there is likely no need for JIRA. Try putting the

RE: query all filled field?

2010-02-04 Thread Frederico Azeiteiro
Theoretically yes, it's correct, but I have about 1/10 of the docs with this field not empty and the rest is empty. Most of the articles have the field empty, as I can see when I query *:*. So the queries don't make sense... -Original Message- From: Ankit Bhatnagar

Re: ClassCastException setting date.formats in ExtractingRequestHandler

2010-02-04 Thread Christoph Brill
Looks like it works. No crashes and the log states it was added. I didn't test against actual data, though. 04.02.2010 17:14:13 org.apache.solr.handler.extraction.ExtractingRequestHandler inform INFO: Adding Date Format: -MM-dd HH:mm:ss 04.02.2010 17:14:13

Using + with Stopwords

2010-02-04 Thread Asim Rahman
Hi, I have some common stopwords defined like [a,the,of] etc. Our users need the ability to include stopwords in their search. I tried using the + sign, like [Bank +of America], to get accurate results, but it does not work. Does anybody know how to provide this ability to search for stopwords - we

Is it possible to exclude results from other languages?

2010-02-04 Thread Raimon Bosch
Hi, In our indexes, we sometimes have documents written in languages other than the index's most common language. Is there any way to give less boost to these documents? Thanks in advance, Raimon Bosch. -- View this message in context:

Re: Using + with Stopwords

2010-02-04 Thread Ahmet Arslan
Hi, I have some common stopwords defined like [a,the,of] etc. Our users need the ability to include stopwords in their search. I tried using the + sign, like [Bank +of America], to get accurate results, but it does not work. Does anybody know how to provide this ability to search for

Gathering metrics on 1.4 (was Re: Solr 1.4 - stats page slow)

2010-02-04 Thread john allspaw
Heya - So we just upgraded our Solr install to 1.4, and there's a great CPU drop and query response time drop. Good! But we're seeing the slowdown in the collection of statistics (stats.jsp) mentioned here: http://www.mail-archive.com/solr-user@lucene.apache.org/msg30224.html to the tune of

Re: Some questions on solr replication backup feature

2010-02-04 Thread Licinio Fernández Maurelo
I've made a backup request to my local Solr server, it works but .. can I set the snapshots dir path? El 4 de febrero de 2010 16:54, Licinio Fernández Maurelo licinio.fernan...@gmail.com escribió: Hi folks, as we're moving to Solr 1.4 replication, I want to know about backups. Questions

Re: Gathering metrics on 1.4 (was Re: Solr 1.4 - stats page slow)

2010-02-04 Thread Mark Miller
john allspaw wrote: Heya - So we just upgraded our Solr install to 1.4, and there's a great CPU drop and query response time drop. Good! But we're seeing the slowdown in the collection of statistics (stats.jsp) mentioned here:

Re: Is it possible to exclude results from other languages?

2010-02-04 Thread Ahmet Arslan
In our indexes, we sometimes have documents written in languages other than the index's most common language. Is there any way to give less boost to these documents? If you are aware of those documents, at index time you can boost those documents with a value less than 1.0:

Re: Using + with Stopwords

2010-02-04 Thread Ahmet Arslan
Does anybody know how to provide this ability to search for stopwords CommonGramsFilterFactory [1] may help. Sorry, Solr 1.4 has this filter.

RE: query all filled field?

2010-02-04 Thread Frederico Azeiteiro
XML update. I'm serializing the doc in .NET, and then using solsharp to insert/update the doc to SOLR. The result is: doc str name=fieldX/ /doc Does this mean I'm adding whitespace on XML Update? Frederico -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com]

RE: query all filled field?

2010-02-04 Thread Ahmet Arslan
XML update. I'm serializing the doc in .NET, and then using solsharp to insert/update the doc to SOLR. The result is: doc str name=fieldX/ /doc Does this mean I'm adding whitespace on XML Update? Yes exactly. You can remove field name=fieldX /field from your add doc ...
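
One way to follow the advice above is to strip empty field elements from the add document before it is posted. This is a minimal sketch, assuming the update message uses the usual add/doc/field layout; `drop_empty_fields` and the sample document are hypothetical, and the original poster's client was .NET rather than Python.

```python
import xml.etree.ElementTree as ET

def drop_empty_fields(add_xml):
    """Return the update XML with empty or whitespace-only <field> elements removed."""
    root = ET.fromstring(add_xml)
    for doc in root.iter("doc"):
        # Copy the list first so removal does not disturb iteration.
        for field in list(doc.findall("field")):
            if field.text is None or not field.text.strip():
                doc.remove(field)
    return ET.tostring(root, encoding="unicode")

# Hypothetical document: fieldX is empty, fieldY holds only whitespace.
sample = (
    '<add><doc>'
    '<field name="id">1</field>'
    '<field name="fieldX" />'
    '<field name="fieldY">  </field>'
    '</doc></add>'
)
cleaned = drop_empty_fields(sample)
```

With the empty fields gone, a range query such as fieldX:[* TO *] should no longer match documents that never had a real value for that field.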

RE: Guidance on Solr errors

2010-02-04 Thread Vauthrin, Laurent
Thank you for the responses! -Original Message- From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll Sent: Wednesday, February 03, 2010 1:56 PM To: solr-user@lucene.apache.org Subject: Re: Guidance on Solr errors Inline below. On Feb 2, 2010, at 8:40 PM,

Re: query all filled field?

2010-02-04 Thread Erik Hatcher
On Feb 4, 2010, at 12:38 AM, Lance Norskog wrote: Queries that start with minus or NOT don't work. You have to do this: *:* AND -fieldX:[* TO *] That's only true for subqueries. A purely negative single top-level clause works fine with Solr. Erik On Wed, Feb 3, 2010 at

RE: query all filled field?

2010-02-04 Thread Frederico Azeiteiro
I've analyzed my index application and checked the XML before executing the http request, and the field is empty: field name=fieldX / It should be empty on SOLR. Probably something in the way between my application (.NET) and the SOLR (Jetty on Ubuntu) adds the whitespace. Anyway, I'll try

Re: Is it possible to exclude results from other languages?

2010-02-04 Thread Raimon Bosch
Yes, it's true that we could do it at index time if we had a way to know. I was thinking of some solution at search time, maybe measuring the % of stopwords in each document. Normally, a document in another language won't have any stopwords of the index's main language. If you know some external
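
The stopword-percentage idea above can be sketched as follows. This is a rough illustration rather than a Solr feature: the stopword list is a tiny hypothetical sample, and any threshold for deciding that a document is "foreign" would have to be tuned on real data.

```python
import re

# Small illustrative stopword list; a real one would be much longer.
ENGLISH_STOPWORDS = frozenset(
    {"a", "an", "the", "of", "and", "to", "in", "is", "it", "that"}
)

def stopword_ratio(text, stopwords=ENGLISH_STOPWORDS):
    """Return the fraction of tokens in text that are stopwords (0.0 for empty text)."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in stopwords)
    return hits / len(tokens)

english_score = stopword_ratio("The cat is in the house")
spanish_score = stopword_ratio("El gato está en la casa")
```

A document whose ratio falls well below that of typical documents in the main language could be given a lower boost when it is indexed.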

fuzzy matching / configurable distance function?

2010-02-04 Thread Joe Calderon
Is it possible to configure the distance formula used by fuzzy matching? I see there are others on the function query page under strdist, but I'm wondering if they are applicable to fuzzy matching. thx much --joe

RE: query all filled field?

2010-02-04 Thread Ahmet Arslan
I've analyzed my index application and checked the XML before executing the http request, and the field is empty: field name=fieldX / It should be empty on SOLR. Probably something in the way between my application (.NET) and the SOLR (Jetty on Ubuntu) adds the whitespace.

Running Solr (LucidWorks) as a Windows Server

2010-02-04 Thread Roland Villemoes
Hi, I need to have Solr/Jetty running as a Windows Service. I am using the Lucid distribution. Does anyone have a running example and tool for this? med venlig hilsen/best regards Roland Villemoes Tel: (+45) 22 69 59 62 E-Mail: mailto:r...@alpha-solutions.dk Alpha Solutions A/S Borgergade 2,

RE: fuzzy matching / configurable distance function?

2010-02-04 Thread Fuad Efendi
The Levenshtein algorithm is currently hardcoded (FuzzyTermEnum class) in Lucene 2.9.1 and 3.0... There are samples of other distances in the contrib folder. If you want to play with distance, check http://issues.apache.org/jira/browse/LUCENE-2230 It works if distance is integer and follows metric space axioms:
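
For reference, a plain Levenshtein edit distance can be sketched as below. This is a textbook dynamic-programming version, not the code in Lucene's FuzzyTermEnum; it returns an integer and behaves as a metric (identity, symmetry, triangle inequality), which are the properties the message refers to.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]

distance = levenshtein("kitten", "sitting")
```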

Indexing CSV without HTTP

2010-02-04 Thread Rohit Gandhe
Hi Everyone, We are indexing quite a lot of data using update/csv handler. For reasons I can't get into right now, I can't implement a DIH since I can only access the DB using Stored Procs and stored proc support in DIH is not yet available. Indexing takes about 3 hours and I don't want to tax

Re: HTTP caching and distributed search

2010-02-04 Thread Chris Hostetter
: http://localhost:8080/solr/core1/select/?q=google&start=0&rows=10&shards=localhost:8080/solr/core1,localhost:8080/solr/core2 : You are right, etag is calculated using the searcher on core1 only and it does not take other shards into account. Can you open a Jira issue? ...as a possible

Re: Indexing CSV without HTTP

2010-02-04 Thread Yonik Seeley
On Thu, Feb 4, 2010 at 3:03 PM, Rohit Gandhe rohit.gan...@gmail.com wrote: We are indexing quite a lot of data using update/csv handler. For reasons I can't get into right now, I can't implement a DIH since I can only access the DB using Stored Procs and stored proc support in DIH is not yet

Re: Indexing CSV without HTTP

2010-02-04 Thread Rohit Gandhe
Thanks Yonik! We want to go to Index replication soon (couple of months), which will also help with incremental updates. But for now we want a quick and dirty solution without running two servers. Does the utility look ok to index a CSV file? Is it safe to do in production environment? I know

Re: Running Solr (LucidWorks) as a Windows Server

2010-02-04 Thread Erik Hatcher
What about using Tomcat instead? Tomcat has Windows service capability already, right? Erik On Feb 4, 2010, at 2:18 PM, Roland Villemoes wrote: Hi, I need to have Solr/Jetty running as a Windows Service. I am using the Lucid distribution. Does anyone have a running example and

Re: Running Solr (LucidWorks) as a Windows Server

2010-02-04 Thread Yonik Seeley
On Thu, Feb 4, 2010 at 4:42 PM, Erik Hatcher erik.hatc...@gmail.com wrote: What about using Tomcat instead? Tomcat has Windows service capability already, right? Another part of the problem is telling the solr webapp where its solr home is. Options: - use a tomcat context fragment

Re: Analyzer for stripping non alpha-numeric characters?

2010-02-04 Thread Jason Rutherglen
Transferred partially to solr-user... Steven, thanks for the reply! I wonder if PatternReplaceFilter can output multiple tokens? I'd like to progressively strip the non-alphanums, for example output: apple!* apple! apple! apple On Thu, Feb 4, 2010 at 12:18 PM, Steven A Rowe sar...@syr.edu

Re: Solr Index size : Java out of memory

2010-02-04 Thread Lance Norskog
Solr needs memory allocation for different operations, not for the index size. It needs X amount of memory for a query, Y amount of memory for each document found by a query, and other things. Sorting needs memory for the number of documents. Faceting needs memory for the number of unique values in a

Re: Solr not starting JMX

2010-02-04 Thread Chris Hostetter
: My parameters look like this (running the Solr example): : : java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=6060 : -Dcom.sun.management.jmxremote.authenticate=false : -Dcom.sun.management.jmxremote.ssl=false -jar start.jar What implementation/version of java are you

Re: Solr not starting JMX

2010-02-04 Thread Walter Underwood
I remember that I had to have a JMX password file with the right permissions, or it wouldn't start. --wunder On Feb 4, 2010, at 2:27 PM, Chris Hostetter wrote: : My parameters look like this (running the Solr example): : : java -Dcom.sun.management.jmxremote

Thanks Robert!

2010-02-04 Thread Jason Rutherglen
Robert, thanks for redoing all the Solr analyzers to the new API! It helps to have many examples to work from, best practices so to speak.

Re: Analyzer for stripping non alpha-numeric characters?

2010-02-04 Thread Jason Rutherglen
Answering my own question... PatternReplaceFilter doesn't output multiple tokens... Which means messing with capture state... On Thu, Feb 4, 2010 at 2:16 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Transferred partially to solr-user... Steven, thanks for the reply! I wonder if
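
The progressive stripping asked about in this thread can be illustrated outside Lucene with a few lines of plain Python. This only shows the intended token sequence; it is not a TokenFilter, and it assumes that only trailing non-alphanumeric characters are removed, one at a time. The name `progressive_strip` is hypothetical.

```python
def progressive_strip(token):
    """Return the token followed by each shorter variant with one more trailing
    non-alphanumeric character removed."""
    variants = [token]
    while token and not token[-1].isalnum():
        token = token[:-1]
        if token:
            variants.append(token)
    return variants

variants = progressive_strip("apple!*")
```

Inside an analyzer, each variant would presumably be emitted as an extra token at the same position as the original, which is where the capture-state handling mentioned above comes in.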

Re: Using solr to store data

2010-02-04 Thread Tim Underwood
We just switched over to storing our data directly in Solr as compressed JSON fields at http://frugalmechanic.com. So far it's working out great. Our detail pages (e.g.: http://frugalmechanic.com/auto-part/817453-33-2084-kn-high-performance-air-filter) now make a single Solr request to grab the

Re: ExtractingRequestHandler multiple values encountered for non multiValued field last_modified

2010-02-04 Thread Lance Norskog
The Tika integration with the DataImportHandler allows you to control many aspects of what goes into the index, including solving this problem: http://wiki.apache.org/solr/TikaEntityProcessor (Tika is the extraction library, and ExtractingRequestHandler and the TikaEntityProcessor both use it.)

Re: weird behaviour when setting negative boost with bq using dismax

2010-02-04 Thread Marc Sturlese
: bq=(*:* -field_a:54^1) I think what you want there is bq=(*:* -field_a:54)^1 ...you are boosting things that don't match field_a:54 Thanks Hoss. I've updated the Wiki, the content of the bq param was wrong:
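
The corrected form puts the boost on the whole parenthesised clause rather than on the negated term. A small sketch of building such a clause follows; the helper name `negative_boost_clause` is hypothetical, and the field and value are the ones from the message.

```python
def negative_boost_clause(field, value, boost):
    """Return a bq clause that boosts every document NOT matching field:value.

    The boost is applied to the whole parenthesised clause, not to the
    negated term inside it.
    """
    return f"(*:* -{field}:{value})^{boost}"

bq = negative_boost_clause("field_a", 54, 1)
```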

source tree for lucene

2010-02-04 Thread Joe Calderon
I want to recompile Lucene with http://issues.apache.org/jira/browse/LUCENE-2230, but I'm not sure which source tree to use. I tried using the implied trunk revision from the admin/system page, but Solr fails to build with the generated jars, even if I exclude the patches from 2230... I'm wondering

How to return filtered tokens as query results?

2010-02-04 Thread Gregg Horan
Is there a way to return Solr's analyzed/filtered tokens from a query, rather than the original indexed data? (Ideally at a fairly high level like solrj). Thanks

Filtering results

2010-02-04 Thread Abin Mathew
Hi, I want to add a filter to my query which takes documents whose city field has either Bangalore or Cochin or Bombay. How do I do this? fq=city:bangalore&fq=city:bombay&fq=city:cochin will take the intersection. I need the union. Please help Thanks

Re: Filtering results

2010-02-04 Thread Ahmet Arslan
Hi, I want to add a filter to my query which takes documents whose city field has either Bangalore or Cochin or Bombay. How do I do this? fq=city:bangalore&fq=city:bombay&fq=city:cochin will take the intersection. I need the union. fq=city:(bangalore OR cochin OR bombay) same syntax as
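
A short sketch of building the single union filter suggested above from a list of values. The helper name `union_filter` is hypothetical, and it assumes simple single-word values; values containing spaces or query syntax characters would need quoting or escaping first.

```python
from urllib.parse import urlencode

def union_filter(field, values):
    """Build one fq clause matching documents whose field has any of the values."""
    if not values:
        raise ValueError("at least one value is required")
    return f"{field}:({' OR '.join(values)})"

fq = union_filter("city", ["bangalore", "cochin", "bombay"])

# One fq parameter gives the union; separate fq parameters are intersected.
params = urlencode({"q": "*:*", "fq": fq})
```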