RE: One item, multiple fields, and range queries

2011-04-08 Thread wojtekpia
Hi Hoss, I realize I'm reviving a really old thread, but I have the same need, and SpanNumericRangeQuery sounds like a good solution for me. Can you give me some guidance on how to implement that? Thanks, Wojtek

Re: SEVERE: Unable to move index file

2010-09-30 Thread wojtekpia
Hi, I ran into this problem again the other night. I've looked through my log files in more detail, and nothing seems out of place (I stripped user queries out and included it below). I have the following setup: 1. Indexer has 2 cores. One core gets incremental updates, the other is for full

Re: performance sorting multivalued field

2010-06-24 Thread wojtekpia
Chris Hostetter-3 wrote: sorting on a multivalued field is defined to have un-specified behavior. it might fail with an error, or it might fail silently. I learned this the hard way, it failed silently for a long time until it failed with an error:

Re: DataImportHandler and running out of disk space

2010-06-03 Thread wojtekpia
https://issues.apache.org/jira/browse/SOLR-1939 SOLR-1939 created. -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-and-running-out-of-disk-space-tp835125p868133.html Sent from the Solr - User mailing list archive at Nabble.com.

DataImportHandler and running out of disk space

2010-05-21 Thread wojtekpia
I'm noticing some data differences between my database and Solr. About a week ago my Solr server ran out of disk space, so now I'm observing how the DataImportHandler behaves when Solr runs out of disk space. In a word, I'd say it behaves badly! It looks like out-of-disk-space exceptions are

Re: DataImportHandler and running out of disk space

2010-05-21 Thread wojtekpia
I ran through some more failure scenarios (scenarios and results below). The concerning ones in my deployment are when data does not get updated, but the DIH's .properties file does. I could only simulate that scenario when I ran out of disk space (all disk space issues behaved consistently).

SEVERE: Unable to move index file

2010-05-12 Thread wojtekpia
Hi I ran into a replication issue yesterday and I have no explanation for it. I see the following in my logs: SEVERE: Unable to move index file from: /my/dir/Solr/data/property/index.20100511050029/_3zj.fdt to: /my/dir/Solr/data/property/index.20100511042539/_3zj.fdt I restarted the subscriber

Re: Sanity check on numeric types and which of them to use

2010-05-07 Thread wojtekpia
3) The only reason to use a sint field is for backward compatibility and/or to use sortMissingFirst/sortMissingLast, correct? I'm using sint so I can facet and sort facets numerically.

Discovering Slaves

2010-02-15 Thread wojtekpia
Is there a way to 'discover' slaves using ReplicationHandler? I'm writing a quick dashboard, and don't have access to a list of slaves, but would like to show some stats about their health.

Re: Google Commerce Search

2010-01-19 Thread wojtekpia
While Solr is functionally platform independent, I have seen much better performance on Linux than Windows under high load (related to SOLR-465). MitchK wrote: As you know, Solr is fully written in Java and Java is still platform-independent. ;)

Dynamically change config file name in DataImportHandler

2010-01-14 Thread wojtekpia
I have 2 data import files, and I'd like to be able to switch between without renaming either file, and without changing solrconfig.xml. Does the DataImportHandler support that? I tried passing a 'config' parameter with the 'reload-config' command, but that didn't work. Thanks, Wojtek

Re: Dynamically change config file name in DataImportHandler

2010-01-14 Thread wojtekpia
I thought of another way: have two data import request handlers configured in solrconfig.xml, one for each file. wojtekpia wrote: I have 2 data import files, and I'd like to be able to switch between without renaming either file, and without changing solrconfig.xml. Does

Re: question about schemas (and SOLR-1131?)

2009-12-04 Thread wojtekpia
Could this be solved with a multi-valued custom field type (including a custom comparator)? The OP's situation deals with multi-valuing products for each customer. If products contain strictly numeric fields then it seems like a custom field implementation (or extension of BinaryField?) *should*

Re: javabin in .NET?

2009-11-12 Thread wojtekpia
I was thinking of going this route too because I've found that parsing XML result sets using XmlDocument + XPath can be very slow (up to a few seconds) when requesting ~100 documents. Are you getting good performance parsing large result sets? Are you using SAX instead of DOM? Thanks, Wojtek
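For anyone hitting the same DOM-parsing cost, here is a minimal sketch of the streaming (SAX-style) alternative, in Python rather than .NET for brevity; the response string is a made-up miniature of Solr's XML response format:

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for a Solr XML response (the shape is real, the values are made up).
response = (
    '<response><result numFound="2" start="0">'
    '<doc><str name="id">1</str></doc>'
    '<doc><str name="id">2</str></doc>'
    '</result></response>'
)

def stream_ids(xml_text):
    """Pull document ids out of a Solr-style response incrementally,
    discarding each <doc> subtree once read (SAX-like memory behavior)."""
    ids = []
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=("end",)):
        if elem.tag == "doc":
            for child in elem:
                if child.get("name") == "id":
                    ids.append(child.text)
            elem.clear()  # free the subtree instead of keeping a full DOM
    return ids

print(stream_ids(response))  # ['1', '2']
```

The same idea applies in .NET with XmlReader in place of XmlDocument + XPath: the parser never holds the whole result set in memory.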

Re: number of Solr indexes per Tomcat instance

2009-10-23 Thread wojtekpia
I ran into trouble running several cores (either as Solr multi-core or as separate web apps) in a single JVM because the Java garbage collector would freeze all cores during a collection. This may not be an issue if you're not dealing with large amounts of memory. My solution is to run each web

Re: how can I use debugQuery if I have extended QParserPlugin?

2009-10-16 Thread wojtekpia
I'm seeing the same behavior and I don't have any custom query parsing plugins. Similar to the original post, my queries like: select?q=field:[1 TO *] select?q=field:[1 TO 2] select?q=field:[1 TO 2]&debugQuery=true work correctly, but including an unbounded range appears to break the debug

Re: how can I use debugQuery if I have extended QParserPlugin?

2009-10-16 Thread wojtekpia
http://www.lucidimagination.com On Fri, Oct 16, 2009 at 3:01 PM, wojtekpia wojte...@hotmail.com wrote: I'm seeing the same behavior and I don't have any custom query parsing plugins. Similar to the original post, my queries like: select?q=field:[1 TO *] select?q=field:[1 TO 2] select?q

Different sort behavior on same code

2009-10-06 Thread wojtekpia
Hi, I'm running Solr version 1.3.0.2009.07.08.08.05.45 in 2 environments. I have a field defined as: <field name="myDate" type="date" indexed="true" stored="true" multiValued="true"/> The two environments have different data, but both have single and multi valued entries for myDate. On one environment

Multi-valued field cache

2009-09-30 Thread wojtekpia
I want to build a FunctionQuery that scores documents based on a multi-valued field. My intention was to use the field cache, but that doesn't get me multiple values per document. I saw other posts suggesting UnInvertedField as the solution. I don't see a method in the UnInvertedField class that

Re: FileListEntityProcessor and LineEntityProcessor

2009-09-16 Thread wojtekpia
Fergus McMenemie-2 wrote: Can you provide more detail on what you are trying to do? ... You seem to be listing all files d:\my\directory\.*WRK. Do these WRK files contain lists of files to be indexed? That is my complete data config file. I have a directory containing a bunch of files

Re: FileListEntityProcessor and LineEntityProcessor

2009-09-16 Thread wojtekpia
Note that if I change my import file to explicitly list all my files (instead of using the FileListEntityProcessor) as below then everything works as I expect. <dataSource type="FileDataSource" name="fileDataSource" basePath="d:\my\directory\"/> <document name="dict-entries"> <entity name="jc"

Re: Backups using Replication

2009-09-11 Thread wojtekpia
Do you mean that it's been renamed, so this should work? <requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="master"> ... <str name="snapshot">optimize</str> ... </lst> </requestHandler> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: before that backupAfter was called

Re: Backups using Replication

2009-09-11 Thread wojtekpia
I've verified that renaming backupAfter to snapshot works (I should've checked before asking). Thanks Noble! wojtekpia wrote: <requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="master"> ... <str name="snapshot">optimize</str> ... </lst> </requestHandler>

Re: Backups using Replication

2009-09-10 Thread wojtekpia
I'm using trunk from July 8, 2009. Do you know if it's more recent than that? Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: which version of Solr are you using? the backupAfter name was introduced recently

Re: Passing FuntionQuery string parameters

2009-09-10 Thread wojtekpia
It looks like parseArg was added on Aug 20, 2009. I'm working with slightly older code. Thanks! Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: did you implement your own ValueSourceParser . the FunctionQParser#parseArg() method supports strings On Wed, Sep 9, 2009 at 12:10 AM,

Backups using Replication

2009-09-08 Thread wojtekpia
I'm trying to create data backups using the ReplicationHandler's built-in functionality. I've configured my master as documented at http://wiki.apache.org/solr/SolrReplication : <requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="master"> ... <str

Passing FuntionQuery string parameters

2009-09-08 Thread wojtekpia
Hi, I'm writing a function query to score documents based on Levenshtein distance from a string. I want my function calls to look like: lev(myFieldName, 'my string to match') I'm running into trouble parsing the string I want to match ('my string to match' above). It looks like all the built
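For context, the distance the proposed lev() function would score by is the classic dynamic-programming edit distance; a minimal Python sketch of that computation, independent of Solr's ValueSourceParser plumbing:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# A score such as 1 / (1 + distance) could then rank closer matches higher.
print(levenshtein("kitten", "sitting"))  # 3
```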

RE: Searching and Displaying Different Logical Entities

2009-08-27 Thread wojtekpia
Funtick wrote: then 2) get all P's by ID, including facet counts, etc. The problem I face with this solution is that I can have many matching P's (10,000+), so my second query will have many (10,000+) constraints. SOLR can automatically provide you P's with Counts, and it will be

Searching and Displaying Different Logical Entities

2009-08-26 Thread wojtekpia
I'm trying to figure out if Solr is the right solution for a problem I'm facing. I have 2 data entities: P(arent) and C(hild). P contains up to 100 instances of C. I need to expose an interface that searches attributes of entity C, but displays them grouped by parent entity, P. I need to include
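The grouping being asked about can be sketched as a client-side post-processing step; a minimal Python illustration (field names and scores are made up):

```python
# Hypothetical child-entity hits as they might come back from a search,
# each carrying its parent id, already sorted by descending score.
matches = [
    {"parent_id": "P1", "child_id": "C1", "score": 0.9},
    {"parent_id": "P2", "child_id": "C3", "score": 0.8},
    {"parent_id": "P1", "child_id": "C2", "score": 0.7},
]

def group_by_parent(children):
    """Group child hits under their parent; parents appear in the order
    of their best-scoring child because the input is sorted by score."""
    groups = {}
    for child in children:
        groups.setdefault(child["parent_id"], []).append(child["child_id"])
    return groups

print(group_by_parent(matches))  # {'P1': ['C1', 'C2'], 'P2': ['C3']}
```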

Re: Facets with an IDF concept

2009-08-13 Thread wojtekpia
Hi Asif, Did you end up implementing this as a custom sort order for facets? I'm facing a similar problem, but not related to time. Given 2 terms: A: appears twice in half the search results B: appears once in every search result I think term A is more interesting. Using facets sorted by
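One possible heuristic for the 'interestingness' described here is average occurrences per matching document, instead of the plain document count a facet normally reports; a Python sketch with made-up numbers:

```python
def interestingness(total_occurrences, doc_frequency):
    """Hypothetical facet-ordering score: average occurrences per
    matching document (plain facet counts would rank B above A)."""
    return total_occurrences / doc_frequency

# Term A: appears twice in half of 100 results; term B: once in every result.
a = interestingness(total_occurrences=100, doc_frequency=50)
b = interestingness(total_occurrences=100, doc_frequency=100)
print(a > b)  # True: A scores 2.0, B scores 1.0
```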

Solr CMS Integration

2009-08-07 Thread wojtekpia
I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with

Re: Solr CMS Integration

2009-08-07 Thread wojtekpia
Thanks for the responses. I'll give Drupal a shot. It sounds like it'll do the trick, and if it doesn't then at least I'll know what I'm looking for. Wojtek

Re: Dedicated Slave Master

2009-07-16 Thread wojtekpia
Hey Grant, It's a middleman, not a backup. We don't have any issues in the current setup, just trying to make sure we have a solution in case this becomes an issue. I'm concerned about a situation with dozens of searchers. The i/o and network load on the indexer might become significant at that

Dedicated Slave Master

2009-07-15 Thread wojtekpia
I'm building a high load system that will require several search slaves (at least 2, but this may grow to 5-10+ in the near future). I plan to have a single indexer that replicates to the search slaves. I want indexing to be as fast as possible, so I've considered adding another machine between

Solr vs Sphinx

2009-05-13 Thread wojtekpia
I came across this article praising Sphinx: http://www.theregister.co.uk/2009/05/08/dziuba_sphinx/. The article specifically mentions Solr as an 'aging' technology, and states that performance on Sphinx is 2x-4x faster than Solr. Has anyone compared Sphinx to Solr? Or used Sphinx in the past? I

Re: preImportDeleteQuery

2009-05-08 Thread wojtekpia
I'm using full-import, not delta-import. I tried it with delta-import, and it would work, except that I'm querying for a large number of documents so I can't afford the cost of deltaImportQuery for each document. It sounds like $deleteDocId will work. I just need to update from 1.3 to trunk.

Re: JVM exception_access_violation

2009-05-08 Thread wojtekpia
I updated to Java 6 update 13 and have been running problem free for just over a month. I'll continue this thread if I run into any problems that seem to be related. Yonik Seeley-2 wrote: I assume that you're not using any Tomcat native libs? If you are, try removing them... if not (and

Sorting by 'starts with'

2009-05-07 Thread wojtekpia
I have an index of product names. I'd like to sort results so that entries starting with the user query come first. E.g. q=kitchen. Results would sort something like: 1. kitchen appliance 2. kitchenaid dishwasher 3. fridge for kitchen It looks like using a Function Query comes close,
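The desired ordering can also be applied client-side over a page of results; a minimal Python sketch (not how Solr itself sorts):

```python
def prefix_first(results, query):
    """Sort so names starting with the query come before the rest;
    ties fall back to plain alphabetical order."""
    q = query.lower()
    return sorted(results,
                  key=lambda name: (not name.lower().startswith(q), name.lower()))

names = ["fridge for kitchen", "kitchenaid dishwasher", "kitchen appliance"]
print(prefix_first(names, "kitchen"))
# ['kitchen appliance', 'kitchenaid dishwasher', 'fridge for kitchen']
```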

preImportDeleteQuery

2009-05-07 Thread wojtekpia
Hi, I'm importing data using the DIH. I manage all my data updates outside of Solr, so I use the full-import command to update my index (with clean=false). Everything works fine, except that I can't delete documents easily using the DIH. I noticed the preImportDeleteQuery attribute, but doesn't

JVM exception_access_violation

2009-03-20 Thread wojtekpia
I'm running Solr on Tomcat 6.0.18 with Java 6 update 7 on Windows 2003 64 bit. Over the past month or so, my JVM has crashed twice with the error below. Has anyone experienced this? My system is not heavily loaded, and the crash seems to coincide with an update (via DIH). I'm running trunk code

Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread wojtekpia
Is there a recommended unix flavor for deploying Solr on? I've benchmarked my deployment on Red Hat. Our operations team asked if we can use FreeBSD instead. Assuming that my benchmark numbers are consistent on FreeBSD, is there anything else I should watch out for? Thanks. Wojtek

Re: Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread wojtekpia
Thanks Otis. Do you know what the most common deployment OS is? I couldn't find much on the mailing list or http://wiki.apache.org/solr/PublicServers Otis Gospodnetic wrote: You should be fine on either Linux or FreeBSD (or any other UNIX flavour). Running on Solaris would probably give

Re: Reading Core-Specific Config File in a Row Transformer

2009-02-18 Thread wojtekpia
Thanks Shalin. I think you missed the call to .getResourceLoader(), so it should be: context.getSolrCore().getResourceLoader().getInstanceDir() Works great, thanks! Shalin Shekhar Mangar wrote: You can use Context.getSolrCore().getInstanceDir()

Reading Core-Specific Config File in a Row Transformer

2009-02-17 Thread wojtekpia
I'm using the DataImportHandler to load data. I created a custom row transformer, and inside of it I'm reading a configuration file. I am using the system's solr.solr.home property to figure out which directory the file should be in. That works for a single-core deployment, but not for multi-core

Re: Recent Paging Change?

2009-02-11 Thread wojtekpia
I'll run a profiler on new and old code and let you know what I find. I have changed my schema between tests: I used to have termVectors turned on for several fields, and now they are always off. My underlying data has not changed.

Re: Performance degradation caused by choice of range fields

2009-02-11 Thread wojtekpia
Yes, I commit roughly every 15 minutes (via a data update). This update is consistent between my tests, and only causes a performance drop when I'm sorting on fields with many unique values. I've examined my GC logs, and they are also consistent between my tests. Otis Gospodnetic wrote: Hi,

Re: Recent Paging Change?

2009-02-11 Thread wojtekpia
This was a false alarm, sorry. I misinterpreted some results. wojtekpia wrote: Has there been a recent change (since Dec 2/08) in the paging algorithm? I'm seeing much worse performance (75% drop in throughput) when I request 20 records starting at record 180 (page 10 in my application

Recent Paging Change?

2009-02-10 Thread wojtekpia
Has there been a recent change (since Dec 2/08) in the paging algorithm? I'm seeing much worse performance (75% drop in throughput) when I request 20 records starting at record 180 (page 10 in my application). Thanks. Wojtek

Re: Performance dead-zone due to garbage collection

2009-02-09 Thread wojtekpia
performance requirements, so I'm able to accept that. Thanks for the tips! Wojtek yonik wrote: On Tue, Feb 3, 2009 at 11:58 AM, wojtekpia wojte...@hotmail.com wrote: I noticed your wiki post about sorting with a function query instead of the Lucene sort mechanism. Did you see

Re: Performance dead-zone due to garbage collection

2009-02-09 Thread wojtekpia
I tried sorting using a function query instead of the Lucene sort and found no change in performance. I wonder if Lance's results are related to something specific to his deployment?

Performance degradation caused by choice of range fields

2009-02-09 Thread wojtekpia
In my schema I have two copies of my numeric fields: one with the original value (used for display, sort), and one with a rounded version of the original value (used for range queries). When I use my rounded field for numeric range queries (e.g. q=RoundedValue:[100 TO 1000]), I see very
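For reference, the rounded-copy technique amounts to bucketing values at index time so range queries touch far fewer unique terms; a minimal Python sketch of the rounding (bucket size is an arbitrary example):

```python
def round_for_range(value, bucket=100):
    """Round a value down to its bucket so the indexed field has far fewer
    unique terms (the trade-off is coarser range boundaries)."""
    return (value // bucket) * bucket

print([round_for_range(v) for v in (99, 100, 101, 999, 1000)])
# [0, 100, 100, 900, 1000]
```

The original, unrounded copy of the field then serves exact display and sorting, as described above.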

Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
Is there an easy way to choose/create an alternate sorting algorithm? I'm frequently dealing with large result sets (a few million results) and I might be able to benefit from domain knowledge in my sort.

Queued Requests during GC

2009-02-04 Thread wojtekpia
During full garbage collection, Solr doesn't acknowledge incoming requests. Any requests that were received during the GC are timestamped the moment GC finishes (at least that's what my logs show). Is there a limit to how many requests can queue up during a full GC? This doesn't seem like a Solr

Re: Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
That's not quite what I meant. I'm not looking for a custom comparator, I'm looking for a custom sorting algorithm. Is there a way to use quick sort or merge sort or... rather than the current algorithm? Also, what is the current algorithm? Otis Gospodnetic wrote: You can use one of the

Re: Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
Ok, so maybe a better question is: should I bother trying to change the sorting algorithm? I'm concerned that with large data sets, sorting becomes a severe bottleneck (this is an assumption, I haven't profiled anything to verify). Does it become a severe bottleneck? Do you know if alternate sort
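For the top-N case this thread circles around: Lucene-style engines generally keep only the top N hits in a bounded priority queue rather than fully sorting the result set; a Python sketch of the difference between the two approaches:

```python
import heapq
import random

# A few million scored hits, simulated. Instead of a full sort, keep only
# the top N in a bounded heap (heapq.nlargest), which is O(n log N)
# rather than O(n log n).
random.seed(0)
scores = [random.random() for _ in range(1_000_000)]

top10 = heapq.nlargest(10, scores)          # bounded priority queue
full = sorted(scores, reverse=True)[:10]    # full sort, then truncate

print(top10 == full)  # True
```

So for a page of results the sort cost grows with the page depth N, not with the millions of matches, which is usually why full-result-set sorting isn't the bottleneck it appears to be.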

Re: Performance dead-zone due to garbage collection

2009-02-03 Thread wojtekpia
I noticed your wiki post about sorting with a function query instead of the Lucene sort mechanism. Did you see a significantly reduced memory footprint by doing this? Did you reduce the number of fields you allowed users to sort by? Lance Norskog-2 wrote: Sorting creates a large array with

RE: Performance dead-zone due to garbage collection

2009-01-30 Thread wojtekpia
I profiled our application, and GC is definitely the problem. The IBM JVM didn't change much. I'm currently looking into ways of reducing my memory footprint.

Solr on Sun Java Real-Time System

2009-01-30 Thread wojtekpia
Has anyone tried Solr on the Sun Java Real-Time JVM (http://java.sun.com/javase/technologies/realtime/index.jsp)? I've read that it includes better control over the garbage collector. Thanks. Wojtek

Re: Intermittent high response times

2009-01-22 Thread wojtekpia
I'm experiencing similar issues. Mine seem to be related to old generation garbage collection. Can you monitor your garbage collection activity? (I'm using JConsole to monitor it: http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html). In my system, garbage collection usually

Re: Performance dead-zone due to garbage collection

2009-01-22 Thread wojtekpia
I'm not sure if you suggested it, but I'd like to try the IBM JVM. Aside from setting my JRE paths, is there anything else I need to do run inside the IBM JVM? (e.g. re-compiling?) Walter Underwood wrote: What JVM and garbage collector setting? We are using the IBM JVM with their concurrent

Re: Performance Hit for Zero Record Dataimport

2009-01-21 Thread wojtekpia
Thanks Shalin, a short circuit would definitely solve it. Should I open a JIRA issue? Shalin Shekhar Mangar wrote: I guess Data Import Handler still calls commit even if there were no documents created. We can add a short circuit in the code to make sure that does not happen.

Performance dead-zone due to garbage collection

2009-01-21 Thread wojtekpia
I'm intermittently experiencing severe performance drops due to Java garbage collection. I'm allocating a lot of RAM to my Java process (27GB of the 32GB physically available). Under heavy load, the performance drops approximately every 10 minutes, and the drop lasts for 30-40 seconds. This

Re: Performance Hit for Zero Record Dataimport

2009-01-21 Thread wojtekpia
Created SOLR-974: https://issues.apache.org/jira/browse/SOLR-974

Re: Performance dead-zone due to garbage collection

2009-01-21 Thread wojtekpia
I'm using a recent version of Sun's JVM (6 update 7) and am using the concurrent generational collector. I've tried several other collectors, none seemed to help the situation. I've tried reducing my heap allocation. The search performance got worse as I reduced the heap. I didn't monitor the

Re: Performance dead-zone due to garbage collection

2009-01-21 Thread wojtekpia
(Thanks for the responses) My filterCache hit rate is ~60% (so I'll try making it bigger), and I am CPU bound. How do I measure the size of my per-request garbage? Is it (total heap size before collection - total heap size after collection) / # of requests to cause a collection? I'll try your
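The per-request estimate proposed above can be written out as a small helper (the numbers in the example are made up):

```python
def garbage_per_request(heap_before_gc, heap_after_gc, requests_since_last_gc):
    """Rough per-request allocation estimate between two full collections,
    along the lines suggested in the thread; it ignores objects that
    survive the collection, so it is only an approximation."""
    return (heap_before_gc - heap_after_gc) / requests_since_last_gc

# e.g. heap (in MB) drops from 26000 to 14000 after serving 3000 requests
print(garbage_per_request(26_000, 14_000, 3_000))  # 4.0 MB per request
```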

Performance Hit for Zero Record Dataimport

2009-01-20 Thread wojtekpia
I have a transient SQL table that I use to load data into Solr using the DataImportHandler. I run an update every 15 minutes (dataimport?command=full-import&clean=false&optimize=false), but my table will frequently have no new data for me to import. When the table contains no data, it looks like

Overlapping Replication Scripts

2009-01-08 Thread wojtekpia
I have set up cron jobs that update my index every 15 minutes. I have a distributed setup, so the steps are: 1. Update index on indexer machine (and possibly optimize) 2. Invoke snapshooter on indexer 3. Invoke snappuller on searcher 4. Invoke snapinstaller on searcher. These updates are small,

Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia
I'm running load tests against my Solr instance. I find that it typically takes ~10 minutes for my Solr setup to warm-up while I throw my test queries at it. Also, I have the same two warm-up queries specified for the firstSearcher and newSearcher event listeners. I'm now benchmarking the

RE: Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia
Sorry, I forgot to include that. All my autowarmCount settings are set to 0. Feak, Todd wrote: First suspect would be Filter Cache settings and Query Cache settings. If they are auto-warming at all, then there is a definite difference between the first start behavior and the post-commit

Re: Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia
I use my warm up queries to fill the field cache (or at least that's the idea). My filterCache hit rate is ~99%, and my queryResultCache is ~65%. I update my index several times a day with no 'optimize', and performance is seamless. I also update my index once nightly with an 'optimize', and that's

Re: Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia
I'm optimizing because I thought I should. I'll be updating my index somewhere between every 15 minutes, and every 2 hours. That means between 12 and 96 updates per day. That seems like a lot of index files (and it scared me a little), so that's my second reason for wanting to optimize nightly.

Re: new faceting algorithm

2008-12-12 Thread wojtekpia
It looks like my filterCache was too big. I reduced my filterCache size from 700,000 to 20,000 (without changing the heap size) and all my performance issues went away. I experimented with various GC settings, but none of them made a significant difference. I see a 16% increase in throughput by

Smaller filterCache giving better performance

2008-12-05 Thread wojtekpia
I've seen some strange results in the last few days of testing, but this one flies in the face of everything I've read on this forum: Reducing filterCache size has increased performance. I have posted my setup here: http://www.nabble.com/Throughput-Optimization-td20335132.html. My original

Re: Smaller filterCache giving better performance

2008-12-05 Thread wojtekpia
Reducing the amount of memory given to java slowed down Solr at first, then quickly caused the garbage collector to behave badly (same issue as I referenced above). I am using the concurrent cache for all my caches.

Re: Throughput Optimization

2008-12-04 Thread wojtekpia
It looks like file locking was the bottleneck - CPU usage is up to ~98% (from the previous peak of ~50%). I'm running the trunk code from Dec 2 with the faceting improvement (SOLR-475) turned off. Thanks for all the help! Yonik Seeley wrote: FYI, SOLR-465 has been committed. Let us know if

Re: new faceting algorithm

2008-12-04 Thread wojtekpia
my deployment scenario in an earlier post: http://www.nabble.com/Throughput-Optimization-td20335132.html Does it sound like the new faceting algorithm could be the culprit? wojtekpia wrote: Definitely, but it'll take me a few days. I'll also report findings on SOLR-465. (I've been on holiday

Re: Throughput Optimization

2008-12-04 Thread wojtekpia
New faceting stuff off because I'm encountering some problems when I turn it on, I posted the details: http://www.nabble.com/new-faceting-algorithm-td20674902.html#a20840622 Yonik Seeley wrote: On Thu, Dec 4, 2008 at 1:54 PM, wojtekpia [EMAIL PROTECTED] wrote: It looks like file locking

Re: new faceting algorithm

2008-12-04 Thread wojtekpia
Yonik Seeley wrote: Are you doing commits at any time? One possibility is the caching mechanism (weak-ref on the IndexReader)... that's going to be changing soon hopefully. -Yonik No commits during this test. Should I start looking into my heap size distribution and garbage

Re: NIO not working yet

2008-12-04 Thread wojtekpia
I've updated my deployment to use NIOFSDirectory. Now I'd like to confirm some previous results with the original FSDirectory. Can I turn it off with a parameter? I tried: java -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.FSDirectory ... but that didn't work.

Re: new faceting algorithm

2008-12-02 Thread wojtekpia
Is there a configurable way to switch to the previous implementation? I'd like to see exactly how it affects performance in my case. Yonik Seeley wrote: And if you want to verify that the new faceting code has indeed kicked in, some statistics are logged, like: Nov 24, 2008 11:14:32 PM

Re: new faceting algorithm

2008-12-02 Thread wojtekpia
Definitely, but it'll take me a few days. I'll also report findings on SOLR-465. (I've been on holiday for a few weeks) Noble Paul നോബിള്‍ नोब्ळ् wrote: wojtek, you can report back the numbers if possible It would be nice to know how the new impl performs in real-world

Re: Throughput Optimization

2008-11-05 Thread wojtekpia
the faceting you're doing. Erik On Nov 4, 2008, at 8:01 PM, wojtekpia wrote: I've been running load tests over the past week or 2, and I can't figure out my system's bottle neck that prevents me from increasing throughput. First I'll describe my Solr setup, then what I've

Re: Throughput Optimization

2008-11-05 Thread wojtekpia
Where is the alt directory in the source tree (or what is the JIRA issue number)? I'd like to apply this patch and re-run my tests. Does changing the lockType in solrconfig.xml address this issue? (My lockType is the default - single). markrmiller wrote: The latest alt directory patch uses

RE: Throughput Optimization

2008-11-05 Thread wojtekpia
My documentCache hit rate is ~.7, and my queryCache is ~.03. I'm using FastLRUCache on all 3 of the caches. Feak, Todd wrote: What are your other cache hit rates looking like? Which caches are you using the FastLRUCache on? -Todd Feak -Original Message- From: wojtekpia

Re: Throughput Optimization

2008-11-05 Thread wojtekpia
I'd like to integrate this improvement into my deployment. Is it just a matter of getting the latest Lucene jars (Lucene nightly build)? Yonik Seeley wrote: You're probably hitting some contention with the locking around the reading of index files... this has been recently improved in

Throughput Optimization

2008-11-04 Thread wojtekpia
I've been running load tests over the past week or 2, and I can't figure out my system's bottle neck that prevents me from increasing throughput. First I'll describe my Solr setup, then what I've tried to optimize the system. I have 10 million records and 59 fields (all are indexed, 37 are

Re: Highlight Fragments

2008-09-23 Thread wojtekpia
Make sure the fields you're trying to highlight are stored in your schema (e.g. <field name="synopsis" type="string" stored="true"/>) David Snelling-2 wrote: Ok, I'm very frustrated. I've tried every configuration I can and parameters and I cannot get fragments to show up in the highlighting in

Re: Highlight Fragments

2008-09-23 Thread wojtekpia
<field name="shortdescription" type="string" indexed="true" stored="true"/> <field name="synopsis" type="string" indexed="true" stored="true" compressed="true"/> On Tue, Sep 23, 2008 at 1:59 PM, wojtekpia [EMAIL PROTECTED] wrote: Make sure the fields you're trying to highlight are stored in your schema (e.g. <field name

Re: Highlight Fragments

2008-09-23 Thread wojtekpia
at 2:20 PM, wojtekpia [EMAIL PROTECTED] wrote: Try a query where you're sure to get something to highlight in one of your highlight fields, for example: /select/?qt=standard&q=synopsis:crayon&hl=true&hl.fl=synopsis,shortdescription David Snelling-2 wrote: This is the configuration

Re: Highlight Fragments

2008-09-23 Thread wojtekpia
? Thank you very much for the help by the way. On Tue, Sep 23, 2008 at 2:49 PM, wojtekpia [EMAIL PROTECTED] wrote: Your fields are all of string type. String fields aren't tokenized or analyzed, so you have to match the entire text of those fields to actually get a match. Try the following

Re: dataimporter.last_index_time not set for full-import query

2008-09-10 Thread wojtekpia
I created a JIRA issue for this and attached a patch: https://issues.apache.org/jira/browse/SOLR-768 wojtekpia wrote: I would like to use (abuse?) the dataimporter.last_index_time variable in my full-import query, but it looks like that variable is only set when running a delta-import

Re: Faceting MoreLikeThisComponent results

2008-09-08 Thread wojtekpia
Thanks Hoss. I created SOLR-760: https://issues.apache.org/jira/browse/SOLR-760 hossman wrote: : When using the MoreLikeThisHandler with facets turned on, the facets show : counts of things that are more like my original document. When I use the : MoreLikeThisComponent, the facets show

Creating dynamic fields with DataImportHandler

2008-08-29 Thread wojtekpia
I have a custom row transformer that I'm using with the DataImportHandler. When I try to create a dynamic field from my transformer, it doesn't get created. If I do exactly the same thing from my dataimport handler config file, it works as expected. Has anyone experienced this? I'm using a

Re: Creating dynamic fields with DataImportHandler

2008-08-29 Thread wojtekpia
it beforehand) to your data config and use the Transformer to set the value. If you don't know the field name before hand then this will not work for you. On Sat, Aug 30, 2008 at 1:31 AM, wojtekpia [EMAIL PROTECTED] wrote: I have a custom row transformer that I'm using

Faceting MoreLikeThisComponent results

2008-08-28 Thread wojtekpia
When using the MoreLikeThisHandler with facets turned on, the facets show counts of things that are more like my original document. When I use the MoreLikeThisComponent, the facets show counts of things that match my original document (I'm querying by document ID), so there is only one result,

Duplicate Data Across Fields

2008-08-14 Thread wojtekpia
I have 2 fields which will sometimes contain the same data. When they do contain the same data, am I paying the same performance cost as when they contain unique data? I think the real question here is: does Lucene index values per field, or per document?
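For reference, Lucene's inverted index keys postings by field and term together, so the same value stored in two fields is indexed twice; a toy Python sketch of that layout:

```python
from collections import defaultdict

# A toy inverted index keyed by (field, term), mirroring how Lucene
# organizes postings: the same text in two fields produces two entries.
index = defaultdict(set)

def add_doc(doc_id, fields):
    for field, text in fields.items():
        for term in text.split():
            index[(field, term)].add(doc_id)

add_doc(1, {"title": "solr search", "body": "solr search"})

# "solr" is indexed once per field, not once per document:
print(sorted(k for k in index if k[1] == "solr"))
# [('body', 'solr'), ('title', 'solr')]
```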

Return results for suggested SpellCheck terms

2008-08-08 Thread wojtekpia
I'd like to have a handler that 1) executes a query, 2) provides spelling suggestions for incorrectly spelled words, and 3) if the original query returns 0 results, return results based on the spell check suggestions. 1 & 2 are straightforward using the SpellCheckComponent, but I can't figure

DataImportHandler current_index_time post-completion action

2008-07-16 Thread wojtekpia
I have two questions: 1. I am pulling data from 2 data sources using the DIH. I am using the deltaQuery functionality. Since the data sources pull data sequentially, I find that some data is getting unnecessarily re-indexed from my second data source. Hopefully this helps illustrate my problem:

Re: Similarity of numbers in MoreLikeThisHandler

2008-07-04 Thread wojtekpia
I stored 2 copies of a single field: one as a number, the other as a string. The MLT handler returned the same documents regardless of which of the 2 fields I used for similarity. So to answer my own question, the MoreLikeThisHandler does not do numeric comparisons on numeric fields.

Re: Similarity of numbers in MoreLikeThisHandler

2008-07-04 Thread wojtekpia
I didn't realize that subsets were used to evaluate similarity. From your example, I assume that the strings: 456 and 123456 are similar. If I store them as integers instead of strings, will Solr/Lucene still use subsets to assign similarity?
