Re: Solr Text Tagger | All tags in desc order

2019-10-04 Thread Simon Rosenthal
-Simon On Fri, Oct 4, 2019 at 5:41 AM Vipul Sharma wrote: > Hi All, > > After putting all the master data in Solr Text Tagger, I want to parse > resume text to fetch the top five skills based on their score. Is there any > way to fetch the result in descending order? >

Re: SolrClient from inside processAdd function

2019-09-04 Thread Simon Rosenthal
Similarly, I had considered a URP which would call the Solr Tagger to add new metadata fields for indexing to incoming documents (and recall discussing this with David Smiley), but eventually decided against this approach on the grounds of complexity. -Simon On Wed, Sep 4, 2019 at 2:10 PM

Re: checksum failed (hardware problem?)

2018-09-26 Thread simon
the problem. Eventually I cloned our environment to a new AWS instance, which proved to be the solution. Why, I have no idea... -Simon On Mon, Sep 24, 2018 at 1:13 PM, Susheel Kumar wrote: > Got it. I'll have first hardware folks check and if they don't see/find > anything suspicious then i'll retur

Solr edismax multi-word match issue

2018-09-20 Thread Simon Bloch
nam)^10.0 | (ancestor_name:viet nam)^1.25 | (name:viet nam)^1.0) #### I would really appreciate any support or debugging advice in this matter! -Simon Bloch

Re: Sorting and pagination in Solr json range facet

2018-07-11 Thread simon
Looking carefully at the documentation for JSON facets, it looks as though the offset parameter is not supported for range facets, only for term facets. You'd have to do pagination in your application. -Simon On Tue, Jul 10, 2018 at 11:45 AM, Anil wrote: > HI Eric, > > i mean p
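
Since the offset parameter appears to apply only to terms facets, paging through range buckets has to happen on the client. A minimal Python sketch, assuming the response has already been parsed and that each bucket is a dict with `val` and `count` keys; the function name and the sample values are illustrative only, not part of Solr:

```python
def paginate_buckets(buckets, offset=0, limit=10, sort_by_count=False):
    """Return one page of range-facet buckets, sliced in the application."""
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    ordered = list(buckets)
    if sort_by_count:
        # Highest counts first; ties keep their original (range) order.
        ordered.sort(key=lambda b: b["count"], reverse=True)
    return ordered[offset:offset + limit]


# Hypothetical buckets, as they might appear under facets.<name>.buckets
sample = [
    {"val": 0, "count": 4},
    {"val": 100, "count": 9},
    {"val": 200, "count": 1},
    {"val": 300, "count": 7},
]
page = paginate_buckets(sample, offset=1, limit=2)
print([b["val"] for b in page])
```

Slicing after the full bucket list is returned is cheap for range facets, because the number of buckets is fixed by start, end and gap rather than by the data.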

Re: CURL command problem on Solr

2018-05-29 Thread simon
Could it be that the header should be 'Content-Type' (which is what I see in the relevant RFC) rather than 'Content-type' as shown in your email ? I don't know if headers are case-sensitive, but it's worth checking. -Simon On Tue, May 29, 2018 at 11:02 AM, Roee Tarab wrote: > Hi , >
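
On the case question: HTTP header field names are defined as case-insensitive (RFC 7230, section 3.2), so 'Content-type' and 'Content-Type' should be treated the same by a compliant server. The sketch below only illustrates that convention with Python's standard header container (the base class that `http.client` uses for response headers); it says nothing about how a particular Solr or servlet-container version behaves:

```python
from email.message import Message

# Store the header with one casing, then look it up with others.
headers = Message()
headers["Content-type"] = "application/json"

print(headers["Content-Type"])
print(headers["CONTENT-TYPE"])
```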

Re: Defining Document Transformers in Solr Configuration

2018-02-28 Thread simon
Thanks Mikhail: I considered that, but not all queries would request that field, and there are in fact a couple more similar DocTransformer-generated aliased fields which we can optionally request, so it's not a general enough solution. -Simon On Wed, Feb 28, 2018 at 1:18 AM, Mikhail Khludnev

Re: Defining Document Transformers in Solr Configuration

2018-02-27 Thread simon
...' in the request and have Solr do the expansion. > > Is there some way to do this that I've overlooked ? if not, I think it > would be a useful new feature. > > > -Simon > > >

Defining Document Transformers in Solr Configuration

2018-02-27 Thread simon
could supply 'fl=a,b,c,%numcites%,...' in the request and have Solr do the expansion. Is there some way to do this that I've overlooked? If not, I think it would be a useful new feature. -Simon
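
Until something like this exists server-side, the expansion can be done in the client before the request is sent. A rough Python sketch, assuming the `%name%` placeholder convention from the message; the `FL_MACROS` table and the transformer string in it are hypothetical placeholders, not real configuration:

```python
import re

# Hypothetical macro table: placeholder name -> text to put in fl.
# The right-hand side stands in for whatever aliased DocTransformer
# expression the application actually uses.
FL_MACROS = {
    "numcites": "numcites:[mytransformer arg=value]",
}

_MACRO = re.compile(r"%(\w+)%")


def expand_fl(fl, macros=FL_MACROS):
    """Replace %name% placeholders in an fl value; unknown names are left as-is."""
    return _MACRO.sub(lambda m: macros.get(m.group(1), m.group(0)), fl)


print(expand_fl("a,b,c,%numcites%"))
```

Leaving unknown placeholders untouched makes a missing table entry easy to spot in the outgoing request.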

Re: Solr search word NOT followed by another word

2018-02-12 Thread simon
Tim: How up to date is the SOLR-5410 patch/zip in JIRA? Looking to use the Span Query parser in 6.5.1, migrating to 7.x sometime soon. Would love to see these committed! -Simon On Mon, Feb 12, 2018 at 10:41 AM, Allison, Timothy B. <talli...@mitre.org> wrote: > That

Re: use mutiple ssd in solr cloud

2017-11-07 Thread simon
. best -Simon On Tue, Nov 7, 2017 at 1:44 AM, Amin Raeiszadeh <amin24march1...@gmail.com> wrote: > Hi > i want to use more than one ssd in each server of solr cluster but i don't > know how to set multiple hdd in solr.xml configurations. > i set on hdd path in solr.xml by: > /me

Re: Upgrade path from 5.4.1

2017-11-02 Thread simon
though see SOLR-11078, which is reporting significant query slowdowns after converting *Trie to *Point fields in 7.1, compared with 6.4.2. On Wed, Nov 1, 2017 at 9:06 PM, Yonik Seeley wrote: > On Wed, Nov 1, 2017 at 2:36 PM, Erick Erickson > wrote:

Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread simon
, and you could live with dropping the offending document(s) then you might want to investigate the TolerantUpdateProcessorFactory (Solr 6.1 or later) -Simon On Thu, Sep 14, 2017 at 3:56 PM, arnoldbronley <arnold.bron...@gmail.com> wrote: > Thanks for information. Here is the full stack trace.

Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread simon
@Arnold: are these non UTF-8 control characters (which is what the Nutch issue was about) or otherwise legal UTF-8 characters which Solr for some reason is choking on ? If you could provide a full stack trace it would be really helpful. On Thu, Sep 14, 2017 at 2:55 PM, Markus Jelsma

Re: How to remove control characters in stored value at Solr side

2017-09-14 Thread simon
might work for this. best -Simon On Thu, Sep 14, 2017 at 1:46 PM, Arnold Bronley <arnoldbron...@gmail.com> wrote: > I know I can apply PatternReplaceFilterFactory to remove control characters > from indexed value. However, is it possible to do similar thing for stored > value?

Re: How Solr knows the Cores it has on startup?

2017-09-12 Thread simon
is deleted in current versions of Solr - so you'll have to find a way (outside Solr) to copy it or re-create it. What is the use case here ? best -Simon On Tue, Sep 12, 2017 at 1:27 PM, Shashank Pedamallu <spedama...@vmware.com> wrote: > Hi, > > I wanted to know how does Solr pick up c

Re: Phrase Exact Match with Margin of Error

2017-06-15 Thread simon
with multiple tokens. Then construct a query which searches both field1 for an exact match, and field2 using ComplexPhraseQueryParser (use the localparams syntax) to combine them. Boost the field1 (exact match). HTH -Simon On Thu, Jun 15, 2017 at 1:20 PM, Max Bridgewater <max.bridgewa...@gmail.com>

Re: Solr 6.6 UNLOAD core broken?

2017-06-09 Thread simon
like a bug. -Simon On Fri, Jun 9, 2017 at 5:14 AM, Andreas Hubold <andreas.hub...@coremedia.com > wrote: > Hi, > > I just tried to update from Solr 6.5.1 to Solr 6.6.0 and observed a > changed behaviour with regard to unloading cores in Solr standalone mode. > > Afte

Re: SOLR | De-Duplication | Remove duplicate records based on their status

2017-05-31 Thread simon
Your updateRequestProcessorChain config snippet specifies the "id" field to generate a signature, but the sample data doesn't contain an "id" field ... check that out first. -Simon On Wed, May 31, 2017 at 12:06 PM, Lebin Sebastian <le...@codetheory.io> wrote: >

Re: Indexing I/O errors and CorruptIndex messages

2017-05-04 Thread simon
scripts running concurrently, but the duration goes up proportionately. -Simon On Thu, Apr 27, 2017 at 9:26 AM, simon <mtnes...@gmail.com> wrote: > Nope ... huge file system (600gb) only 50% full, and a complete index > would be 80gb max. > > On Wed, Apr 26, 2017 at 4:04

Re: Reload an unloaded core

2017-05-02 Thread simon
-Simon On Tue, May 2, 2017 at 4:04 PM, Erick Erickson <erickerick...@gmail.com> wrote: > IIRC, the core.properties file _is_ renamed to > core.properties.unloaded or something like that. > > Yeah, this is something of a pain. The inverse of "unload" is "create"

Re: Reload an unloaded core

2017-05-02 Thread simon
I ran into the exact same situation recently. I unloaded from the browser GUI which does not delete the data or instance dirs, but does delete core.properties. I couldn't find any API either so I eventually manually recreated core.properties and restarted Solr. Would be nice if the

Re: Indexing I/O errors and CorruptIndex messages

2017-04-27 Thread simon
k full issue will be transient, IOW > if you look now and have free space it still may have been all used up > but had some space reclaimed. > > Best, > Erick > > On Wed, Apr 26, 2017 at 12:02 PM, simon <mtnes...@gmail.com> wrote: > > reposting this as the proble

Indexing I/O errors and CorruptIndex messages

2017-04-26 Thread simon
reposting this as the problem described is happening again and there were no responses to the original email. Anyone ? I'm seeing an odd error during indexing for which I can't find any reason. The relevant solr log entry: 2017-03-24 19:09:35.363 ERROR

Re: keywords not found - google like feature

2017-04-13 Thread simon
will return a boolean if the term is in a specific field. I've used this for simple cases where it worked well, though I wouldn't like to speculate on how well this scales if you have an edismax query where you might need to generate multiple term/field combinations. HTH -Simon On Thu, Apr 13, 2017

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-28 Thread simon
with no need for actual highlighting. The patch is pretty old - I applied it to Solr 4.10 I think, so will probably need some work for later releases. HTH -Simon On Tue, Mar 28, 2017 at 4:59 AM, forest_soup <tanglin0...@gmail.com> wrote: > Thanks Eric. > > Actually solr highlightin

Unexplainable indexing i/o errors

2017-03-27 Thread simon
t see any evidence of hardware errors. I'm puzzled as to why this would start happening out of the blue, and I can't find any particularly relevant posts on this forum or Stack Exchange. Anyone have an idea what's going on? -Simon

Re: Highlighting, offsets -- external doc store

2016-11-29 Thread simon
You might want to take a look at https://issues.apache.org/jira/browse/SOLR-4722 ('highlighter which generates a list of query term positions'). We used it a while back, and it doesn't appear to have been used in any Solr > 4.10. -Simon On Tue, Nov 29, 2016 at 11:43 AM, John Bickerstaff

Re: Can Solr find related terms in a document

2016-10-17 Thread simon
Do you already have a set of terms for which you would want to find out their co-occurrence, or are you trying to do data mining, looking in a collection for terms which occur together more often than by chance? On Sun, Oct 16, 2016 at 3:45 AM, Yangrui Guo wrote: > Hello

Solr suddenly starts creating .cfs (compound) segments during indexing

2016-09-27 Thread simon
logs --module=http solrconfig.xml: basically the default with some minor tweaks in the indexConfig section 5.0 200 1 20 60 20 ... everything else is default Insights as to why this is happening would be welcome. -Simon

Re: Metadata and HTML ending up in searchable text

2016-06-02 Thread Simon Blandford
ems some Javascript creeps into the text version. (See below) Regards, Simon HTML mode sample: 051?xml version="1.0" encoding="UTF-8"? html xmlns="http://www.w3.org/1999/xhtml" head link rel="stylesheet" type="text/css" char

Re: Metadata and HTML ending up in searchable text

2016-06-01 Thread Simon Blandford
Thanks Timothy, Will give the DIH a try. I have submitted a bug report. Regards, Simon On 31/05/16 13:22, Allison, Timothy B. wrote: From the same page, extractFormat=text only applies when extractOnly is true, which just shows the output from tika without indexing the document. Y, sorry

Re: Metadata and HTML ending up in searchable text

2016-05-31 Thread Simon Blandford
. Regards, Simon On 27/05/16 20:22, Alexandre Rafalovitch wrote: I think Solr's layer above Tika was merging in metadata and text all together without a way (that I could see) to separate them. That's all I remember of my examination of this issue when I run into something similar. Not very helpful, I

Re: Metadata and HTML ending up in searchable text

2016-05-27 Thread Simon Blandford
"extractOnly" mode resulting in an XML output. The difference between selecting "text" or "xml" format is that the escaped document in the tag is either the original HTML (xml mode) or stripped HTML (text mode). It seems some Javascript creeps into the text version.

Metadata and HTML ending up in searchable text

2016-05-26 Thread Simon Blandford
Hi, I am using Solr 6.0 on Ubuntu 14.04. I am ending up with loads of junk in the text body. It starts like, The JSON entry output of a search result shows the indexed text starting with... body_txt_en: " stream_size 36499 X-Parsed-By org.apache.tika.parser.DefaultParser X-Parsed-By"

Re: fl=value equals?

2015-11-13 Thread simon
Please do push your script to GitHub - I (re)compile custom code infrequently and never remember how to set up the environment. On Thu, Nov 12, 2015 at 5:14 AM, Upayavira wrote: > Okay, makes sense. As to your question - making a new ValueSourceParser > that handles 'equals'

Re: OpenNLP plugin or similar NER software for Solr ??? !!!

2015-11-09 Thread simon
https://github.com/OpenSextant/SolrTextTagger/ We're using it for country tagging successfully. On Wed, Nov 4, 2015 at 3:10 PM, Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > David Smiley had a place name and general tagging engine that for the life > of me I can't find. > > It

Re: Detect term occurrences

2015-09-11 Thread simon
it is ingested into our main Solr collection. How many documents/product leaflets do you have ? The tagger is very fast at the Solr level but I'm seeing quite a bit of HTTP overhead. best -Simon On Fri, Sep 11, 2015 at 1:39 PM, Sujit Pal <sujit@comcast.net> wrote: > Hi Francisco, >

Re: how to index document with multiple words (phrases) and words permutation?

2015-08-25 Thread simon
been using with some success for this task. best -Simon On Mon, Aug 24, 2015 at 2:13 PM, afrooz afr.rahm...@gmail.com wrote: Thanks Erick, I will explain the detail scenario so you might give me a solution: I want to annotate a medical document base on only medical dictionary. I don't need

Re: Solr Matched Terms

2015-08-18 Thread simon
Check out https://issues.apache.org/jira/browse/SOLR-4722, which will return matching terms (and their offsets). Patch can be applied cleanly to Solr 4; doesn't appear to have been tried with Solr 5 -Simon On Tue, Aug 18, 2015 at 11:30 AM, Jack Krupansky jack.krupan...@gmail.com wrote: Maybe

Custom Function for date reformatting

2015-06-12 Thread simon
place where a date format conversion is needed is proving painful indeed ;=( My thought is to write a custom function of the form datereformatter(date_field_name, format_string) but I thought I'd check if it's already been done or if someone can suggest a better approach. regards -Simon

How to trace error records during POST?

2015-04-07 Thread Simon Cheng
Good morning, I used Solr 4.7 to post 186,745 XML files and 186,622 files have been indexed. That means there are 123 XML files with errors. How can I trace what these files are? Thank you in advance, Simon Cheng.

Re: Alphanumeric Wild card search

2015-04-02 Thread Simon Martinelli
Hi, Have a look at the generated terms to see how they look. Simon On Thu, Apr 2, 2015 at 9:43 AM, Palagiri, Jayasankar jayashankar.palag...@honeywell.com wrote: Hello Team, Below is my field type <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100"

solr.DictionaryCompoundWordTokenFilterFactory extracts words in string

2015-03-31 Thread Simon Martinelli
is compound of lindor and schlitten but i get lindor dorsch schlitten so the filter is extracting dorsch but the word before (lin) and after (litten) are not valid word parts. Is there any better compound word filter for German? Thanks, Simon

Re: Retrieving list of words for highlighting

2015-03-27 Thread simon
There's a JIRA ( https://issues.apache.org/jira/browse/SOLR-4722 ) describing a highlighter which returns term positions rather than snippets, which could then be mapped to the matching words in the indexed document (assuming that it's stored or that you have a copy elsewhere). -Simon On Wed

Creating a collection/core on HDFS with SolrCloud

2015-02-25 Thread Simon Minery
=solr.hdfs.security.kerberos.principalsolr/@CLUSTER.HADOOP/str and on Hadoop' core-site.xml, my hadoop.security.authentication parameter is set to Kerberos. Am I missing something ? Thank you very much for your input, have a great day. Simon M.

Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-18 Thread Simon Cheng
. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 17 February 2015 at 22:36, Simon Cheng simonwhch...@gmail.com wrote: Hi Alex, It's okay after I added in a new field s_title in the schema and re-indexed. <field name="s_title" type="string" indexed="true"

Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
[Press releases and articles on policy changes affecting the Singapore property market] / compiled by the Information Resource Centre, Monetary Authority of Singapore </str> </doc> <doc> <str name="id">dataq</str> <str name="title"> Simon is testing Solr - This one is in English. Color of the Wind. 我是中国人 , БOΛbШ OЙ

Re: Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
? Thanks again, Simon. On Wed, Feb 18, 2015 at 12:00 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: What's the field definition for your title field? Is it just string or are you doing some tokenizing? It should be a string or a single token cleaned up (e.g. lower-cased) using

Simple Sort Is Not Working In Solr 4.7?

2015-02-17 Thread Simon Cheng
Soros </str> </doc> <doc> <str name="id">15891</str> <arr name="author"> <str>Soros, George</str> </arr> <str name="title"> The new paradigm for financial markets : the credit crisis of 2008 and what it means / George Soros </str> </doc> </result> </response> Thank you for the help in advance, Simon.

SASL with zkcli.sh

2015-02-12 Thread Simon Minery
you, Simon M.

Re: Suggester on Dynamic fields

2014-10-22 Thread Simon
completion. Thanks, Simon -- View this message in context: http://lucene.472066.n3.nabble.com/Suggester-on-Dynamic-fields-tp4165270p4165329.html Sent from the Solr - User mailing list archive at Nabble.com.

Variable date range facets and fixed range labels

2014-10-17 Thread Simon Fairey
Hi, I'm trying to get Solr (4.10) to do more of what it does best, rather than relying on the hacking that is currently in our front-end code. One area I'm trying to fix is date ranges. I have 2 types of date and want to display them in 2 different ways: dateA - blocks of 25 years, this works but
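
For reference, 25-year blocks are typically expressed with the standard range-facet parameters and a date-math gap. A small Python sketch that only builds the query string; the field name `dateA` comes from the message, while the start and end dates are assumptions chosen for illustration:

```python
from urllib.parse import urlencode, parse_qs


def range_facet_params(field, start, end, gap):
    """Build Solr range-facet parameters for one date field."""
    return {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.range": field,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
    }


params = range_facet_params(
    "dateA",
    "1900-01-01T00:00:00Z",  # assumed lower bound
    "2025-01-01T00:00:00Z",  # assumed upper bound
    "+25YEARS",              # date-math gap: one bucket per 25 years
)
query_string = urlencode(params)  # percent-encodes '+' so it is not read as a space
print(query_string)
```

The encoding step matters here: an unescaped `+` in a URL is decoded as a space, which would turn the gap into an invalid date-math expression.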

RE: Solr configuration, memory usage and MMapDirectory

2014-10-08 Thread Simon Fairey
for these, are these what are using up the memory? Thanks Si -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: 06 October 2014 16:56 To: solr-user@lucene.apache.org Subject: Re: Solr configuration, memory usage and MMapDirectory On 10/6/2014 9:24 AM, Simon Fairey

RE: Solr configuration, memory usage and MMapDirectory

2014-10-08 Thread Simon Fairey
- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: 08 October 2014 21:09 To: solr-user@lucene.apache.org Subject: Re: Solr configuration, memory usage and MMapDirectory On 10/8/2014 4:02 AM, Simon Fairey wrote: I'm currently setting up jconsole but as I have to remotely monitor (no gui

Solr configuration, memory usage and MMapDirectory

2014-10-06 Thread Simon Fairey
Hi I've inherited a Solr config and am doing some sanity checks before making some updates, I'm concerned about the memory settings. System has 1 index in 2 shards split across 2 Ubuntu 64 bit nodes, each node has 32 CPU cores and 132GB RAM, we index around 500k files a day spread out over the

RE: Solr configuration, memory usage and MMapDirectory

2014-10-06 Thread Simon Fairey
: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Meanwhile, Shawn gave you some very good info so I won't repeat any On Mon, Oct 6, 2014 at 8:24 AM, Simon Fairey sifai...@gmail.com wrote: Hi I've inherited a Solr config and am doing some sanity checks before making

RE: Solr configuration, memory usage and MMapDirectory

2014-10-06 Thread Simon Fairey
Thanks I will have a read and digest this. -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: 06 October 2014 16:56 To: solr-user@lucene.apache.org Subject: Re: Solr configuration, memory usage and MMapDirectory On 10/6/2014 9:24 AM, Simon Fairey wrote: I've

Re: ICUTokenizer or StandardTokenizer or ??? for text_all type field that might include non-whitespace langs

2014-06-20 Thread Simon Cheng
exploring this approach at the moment. Simon. On Sat, Jun 21, 2014 at 7:37 AM, T. Kuro Kurosaka k...@healthline.com wrote: On 06/20/2014 04:04 AM, Allison, Timothy B. wrote: Let's say a predominantly English document contains a Chinese sentence. If the English field uses the WhitespaceTokenizer

Tracing Files Which Have Errors

2014-06-19 Thread Simon Cheng
Hi there, I have posted 190,000 simple XML files using POST.JAR, and only 8 files had errors. But how do I know which ones have errors? Thank you in advance, Simon Cheng.

Fwd: Tracing Files Which Have Errors

2014-06-19 Thread Simon Cheng
Hi there, I have posted 190,000 simple XML files using POST.JAR, and only 8 files had errors. But how do I know which ones have errors? Thank you in advance, Simon Cheng.

Re: Export big extract from Solr to [My]SQL

2014-05-02 Thread simon
problems (and DBI takes care of writing to a database). I'm probably going to rewrite in Python since the final destination of many of our extracts is Tableau, which has a Python API for creating TDEs (Tableau data extracts) regards -Simon On Fri, May 2, 2014 at 7:43 AM, Siegfried Goeschl sgoes

Re: Duplicate Unique Key

2014-04-08 Thread Simon
MergingIndex is not the case here as I am not doing that. Even though the issue is gone for now, it is not a relief for me as I am not sure how to explain this to others (peers, boss and users). I am thinking of implementing a watchdog to check whenever the total Solr documents exceeds the number of items

Duplicate Unique Key

2014-04-07 Thread Simon
documents. My understanding is that the Solr uniqueKey is like a database primary key. I am wondering how I could end up with two documents with the same uniqueKey in the index. Thanks, Simon -- View this message in context: http://lucene.472066.n3.nabble.com/Duplicate-Unique-Key-tp4129651.html Sent from the Solr

Re: Duplicate Unique Key

2014-04-07 Thread Simon
Erick, It's indeed quite odd. After I triggered re-indexing of all documents (via the normal process of the existing program), the duplication was gone. It cannot be reproduced easily, but it did occur occasionally, and that makes it a frustrating task to troubleshoot. Thanks, Simon -- View

Re: Luke 4.7.0 released

2014-04-03 Thread simon
adding that worked - thanks. On Thu, Apr 3, 2014 at 4:18 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Joshua, Simon, do you pass the -XX:MaxPermSize=512m to your jvm? java -XX:MaxPermSize=512m -jar luke-with-deps.jar My java runtime environment is of the same version as Simon's: build

Re: Luke 4.7.0 released

2014-04-02 Thread simon
Also seeing this on Mac OS X. java version = Java(TM) SE Runtime Environment (build 1.7.0_51-b13) On Wed, Apr 2, 2014 at 11:01 AM, Joshua P jpetersen...@gmail.com wrote: Hi there! I'm receiving the following errors when trying to run luke-with-deps.jar SLF4J: Failed to load class

[ANNOUNCE] Apache Solr 4.7.0 released.

2014-02-26 Thread Simon Willnauer
February 2014, Apache Solr™ 4.7 available The Lucene PMC is pleased to announce the release of Apache Solr 4.7 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: Solr server requirements for 100+ million documents

2014-01-26 Thread simon
code as I am not using it). You should replace StreamingUpdateSolrServer by ConcurrentUpdateSolrServer and experiment to find the optimal number of threads to configure. -Simon On Sun, Jan 26, 2014 at 11:28 AM, Erick Erickson erickerick...@gmail.com wrote: 1 That's what I'd do. For incremental

[ANNOUNCE] Apache Solr 4.6 released.

2013-11-24 Thread Simon Willnauer
Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Happy Searching Simon

Solr block join

2013-10-28 Thread Simon
solutions? Thanks, Simon -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-block-join-tp4098128.html Sent from the Solr - User mailing list archive at Nabble.com.

Next official Solr release

2013-10-02 Thread Simon Zeng
Hi Solr team, I am working on a project that needs the Solr 'block join' feature that is currently available in the 4.6 nightly build. My boss feels more comfortable with an official release like 4.4. I am wondering if there is any target release date for Solr 4.6(+) or 5.0? Thanks, Simon

[ANNOUNCE] Apache Solr 4.3 released

2013-05-06 Thread Simon Willnauer
May 2013, Apache Solr™ 4.3 available The Lucene PMC is pleased to announce the release of Apache Solr 4.3. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search,

Re: SPAN queries in solr

2012-11-23 Thread simon
take a look at SOLR-2703, which was committed for 4.0. It provides a Solr wrapper for the surround query parser, which supports span queries. On Fri, Nov 23, 2012 at 3:38 PM, Anirudha Jadhav aniru...@nyu.edu wrote: What is the best way to use span queries in solr ? I see

Re: multi-core sharing synonym map

2012-10-12 Thread simon
to it... -Simon On Fri, Oct 12, 2012 at 12:27 PM, Phil Hoy p...@friendsreunited.co.uk wrote: Hi, We have a multi-core set up with a fairly large synonym file, all cores share the same schema.xml and synonym file but when solr loads the cores, it loads multiple instances of the synonym map

Re: Installing Solr on a shared hosting server?

2012-10-10 Thread simon
some time back I used dreamhost for a Solr based project. Looks as though all their offerings, including shared hosting have Java support - see http://wiki.dreamhost.com/What_We_Support. I was very happy with their service and support. -Simon On Tue, Oct 9, 2012 at 10:44 AM, Michael Della Bitta

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-02 Thread Simon Willnauer
Robert already created an issue here: https://issues.apache.org/jira/browse/LUCENE-4279 and it seems fixed. Given the massive commit last night it's already committed and backported so it will be in 4.0-BETA. simon Thanks again Saroj On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir rcm

Re: Solr 4.0 IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

2012-07-10 Thread Simon Willnauer
usage. Are you sorting / facet on anything? simon On Tue, Jul 10, 2012 at 4:49 PM, Vadim Kisselmann v.kisselm...@gmail.com wrote: Hi Robert, Can you run Lucene's checkIndex tool on your index? No, unfortunately not. This Solr should run without stoppage, an tomcat-restart is ok, but not more

Re: Multiple document types

2012-01-25 Thread Simon Willnauer
On Thu, Jan 26, 2012 at 12:05 AM, Frank DeRose fder...@guidewire.com wrote: Hi Simon, No, not different entity types, but actually different document types (I think). What would be ideal is if we could have multiple document elements in the data-config.xml file and some way of mapping each

Call for Submission Berlin Buzzwords 2012 - http://berlinbuzzwords.de

2012-01-11 Thread Simon Willnauer
Chairs:  *  Isabel Drost (Nokia Apache Mahout)  *  Jan Lehnardt (CouchBase Apache CouchDB)  *  Simon Willnauer (SearchWorkings Apache Lucene)  *  Grant Ingersoll (Lucid Imagination Apache Lucene)  *  Owen O’Malley (Yahoo Inc. Apache Hadoop)  *  Jim Webber (Neo Technology Neo4j)  *  Sean Treadway

Heads Up - Index File Format Change on Trunk

2012-01-05 Thread Simon Willnauer
Folks, I just committed LUCENE-3628 [1] which cuts over Norms to DocValues. This is an index file format change and if you are using trunk you need to reindex before updating. happy indexing :) simon [1] https://issues.apache.org/jira/browse/LUCENE-3628

Re: Solr Scoring question

2012-01-05 Thread Simon Willnauer
etc. does that make sense? simon I have a JSP file that will take in parameters, do some work on them to make them appropriate for Solr, then pass the query it builds to Solr.  Should I just put more brains in that to avoid using a *:* (we're trying to verify results and we ran into this oddity

Re: spellcheck-index is rebuilt on commit

2012-01-03 Thread Simon Willnauer
any state or the version of the index since it was last called and assumes the index was just optimized. simon Thanks Oliver -- View this message in context: http://lucene.472066.n3.nabble.com/spellcheck-index-is-rebuilt-on-commit-tp3626492p3628423.html Sent from the Solr - User mailing

Re: spellcheck-index is rebuilt on commit

2012-01-02 Thread Simon Willnauer
if there is a single segment in the index and rebuilds the index. if this is the case, I think this is a bug... can you open a jira ticket? simon On Mon, Jan 2, 2012 at 8:36 PM, OliverS oliver.schi...@unibas.ch wrote: Hi Looks like they strip the raw-Text for the list. Whole message here: http

Re: Matching all documents in the index

2011-12-13 Thread Simon Willnauer
try *:* instead of *.* simon On Tue, Dec 13, 2011 at 5:03 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I have come across this query in the admin interface: *.* Is this meant to match all documents in my index? Currently when i run query with q= *.*, numFound is 130310 but the actuall
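
`*:*` is the match-all query (any field, any value), whereas `*.*` is most likely parsed as an ordinary wildcard term against the default field, which would explain a lower count. A small Python sketch for checking the total; the core URL is an assumption for illustration, and the helper names are not part of any Solr client:

```python
import json
from urllib.parse import urlencode


def match_all_url(base_url):
    """URL for a count-only match-all query (rows=0 returns no documents)."""
    return base_url + "/select?" + urlencode({"q": "*:*", "rows": 0, "wt": "json"})


def num_found(response_text):
    """Extract response.numFound from a JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


# Assumed local core URL, for illustration only.
print(match_all_url("http://localhost:8983/solr/mycore"))
```

Comparing `numFound` for `*:*` with the document count shown in the admin interface is a quick way to confirm that every document is being matched.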

Re: Integrating Surround Query Parser

2011-12-02 Thread simon
of Lucene. I'm not sure how easily this would all backport to Solr 3.1, but you could try. best -Simon On Tue, Nov 22, 2011 at 1:05 AM, Rahul Mehta rahul23134...@gmail.com wrote: Hello, I want to Run surround query . 1. Downloading from http://www.java2s.com/Code/Jar/JKL

Re: Integrating Surround Query Parser

2011-12-02 Thread simon
oops, didn't see all of the thread before I hit send. Good work, Erik On Fri, Dec 2, 2011 at 5:21 PM, simon mtnes...@gmail.com wrote: Take a look at https://issues.apache.org/jira/browse/SOLR-2703, which integrates the surround parser into Solr trunk. There's a dependency on a Lucene patch

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Simon Willnauer
I wonder if you have an explicitly configured merge policy? In Solr 1.4, i.e. Lucene 2.9, LogMergePolicy was the default, but in 3.5 TieredMergePolicy is used by default. This could explain the differences segment-wise, since from what I understand you are indexing the same data on 1.4 and 3.5? simon

Re: Seek past EOF

2011-11-30 Thread Simon Willnauer
can you give us some details about what filesystem you are using? simon On Wed, Nov 30, 2011 at 3:07 PM, Ruben Chadien ruben.chad...@aspiro.com wrote: Happened again…. I got 3 directories in my index dir 4096 Nov  4 09:31 index.2004083156 4096 Nov 21 10:04 index.2021090440 4096

[ANNOUNCE] Apache Solr 3.5 released

2011-11-26 Thread Simon Willnauer
27 November 2011, Apache Solr™ 3.5.0 available The Lucene PMC is pleased to announce the release of Apache Solr 3.5.0. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting,

JVM Bugs affecting Lucene Solr

2011-11-15 Thread Simon Willnauer
on an older JVM you could be affected by this bug and should either upgrade to a new JVM or use -XX:+UseMembar to start your JVM. In general it's a good idea to keep an eye on http://wiki.apache.org/lucene-java/SunJavaBugs we try to keep this up-to-date thanks, Simon

Re: changing omitNorms on an already built index

2011-10-28 Thread Simon Willnauer
On Fri, Oct 28, 2011 at 12:20 AM, Robert Muir rcm...@gmail.com wrote: On Thu, Oct 27, 2011 at 6:00 PM, Simon Willnauer simon.willna...@googlemail.com wrote: we are not actively removing norms. if you set omitNorms=true and index documents they won't have norms for this field. Yet, other

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Simon Willnauer
better and usually continuous IO utilization. hope that helps. simon pool-2-thread-1 [RUNNABLE] CPU time: 3:31 java.nio.Bits.copyToByteArray(long, Object, long, long) java.nio.DirectByteBuffer.get(byte[], int, int) org.apache.lucene.store.MMapDirectory$MMapIndexInput.readBytes(byte[], int, int

Re: large scale indexing issues / single threaded bottleneck

2011-10-28 Thread Simon Willnauer
On Fri, Oct 28, 2011 at 9:17 PM, Simon Willnauer simon.willna...@googlemail.com wrote: Hey Roman, On Fri, Oct 28, 2011 at 8:38 PM, Roman Alekseenkov ralekseen...@gmail.com wrote: Hi everyone, I'm looking for some help with Solr indexing issues on a large scale. We are indexing few

Re: How can I force the threshold for a fuzzy query?

2011-10-27 Thread Simon Willnauer
simon On Thu, Oct 27, 2011 at 4:54 PM, Gustavo Falco comfortablynum...@gmail.com wrote: Hi guys, I'm new to Solr (as you may guess for the subject). I'd like to force the threshold for fuzzy queries to, say, 0.7. I've read that fuzzy queries are expensive, but limiting it's threshold

Re: changing omitNorms on an already built index

2011-10-27 Thread Simon Willnauer
it will be true for other segments eventually. If you optimize your index you should see that norms go away. simon On Thu, Oct 27, 2011 at 11:17 PM, Marc Sturlese marc.sturl...@gmail.com wrote: As far as I know there's no issue about this. You have to reindex and that's it. In which kind of field

Re: accessing the query string from inside TokenFilter

2011-10-25 Thread Simon Willnauer
you bring this to the dev list? simon Regards Bernd

Re: some basic information on Solr

2011-10-25 Thread Simon Willnauer
of document formats (http://tika.apache.org/0.10/formats.html). Hope this helps here?! 2. How much is estimated cost of incidents per year for Solr ? I have to admit I don't know what you are asking for. can you elaborate on this a bit? What is an incident in this context? simon Since the numbers

Re: Optimization /Commit memory

2011-10-25 Thread Simon Willnauer
the segment you have in memory (IndexWriter memory) to disk. compression ratio can be up to 30% of the ram cost or even more depending on your data. The actual commit doesn't need a notable amount of memory. hope this helps simon On Mon, Oct 24, 2011 at 7:38 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov

Re: How to make UnInvertedField faster?

2011-10-22 Thread Simon Willnauer
limitation here. simon Hopefully we can fix that at some point :) Mike McCandless http://blog.mikemccandless.com On Fri, Oct 21, 2011 at 7:50 AM, Simon Willnauer simon.willna...@googlemail.com wrote: In trunk we have a feature called IndexDocValues which basically creates the uninverted
