Re: including external files in config by corename

2010-04-06 Thread Shawn Heisey
On 4/5/2010 8:43 PM, Mark Miller wrote: On 04/05/2010 10:12 PM, Chris Hostetter wrote: : The best you have to work with at the moment is Xincludes: : : http://wiki.apache.org/solr/SolrConfigXml#XInclude : : and System Property Substitution: : :

Re: including external files in config by corename

2010-04-07 Thread Shawn Heisey
On 4/5/2010 8:43 PM, Mark Miller wrote: On 04/05/2010 10:12 PM, Chris Hostetter wrote: : The best you have to work with at the moment is Xincludes: : : http://wiki.apache.org/solr/SolrConfigXml#XInclude : : and System Property Substitution: : :

Re: including external files in config by corename

2010-04-07 Thread Shawn Heisey
On 4/5/2010 8:12 PM, Chris Hostetter wrote: what you can do however, is have a distinct solrconfig.xml for each core, which is just a thin shell that uses XInclude to include big chunks of frequently reused declarations, and some cores can exclude some of these includes. (ie: turn the problem

Re: including external files in config by corename

2010-04-07 Thread Shawn Heisey
On 4/7/2010 9:16 AM, Shawn Heisey wrote: On 4/5/2010 8:12 PM, Chris Hostetter wrote: what you can do however, is have a distinct solrconfig.xml for each core, which is just a thin shell that uses XInclude to include big chunks of frequently reused declarations, and some cores can exclude some

Re: Solr DataImportHandler

2010-04-08 Thread Shawn Heisey
On 4/8/2010 2:11 AM, Mark N wrote: Is it possible to use solr DataImportHandler when that database fields are not fixed ? As per my findings we need to configure which table ( entity) we will read the data and must match which fields in database will map to fields in solr schema Since in my

Re: Solr DataImportHandler

2010-04-08 Thread Shawn Heisey
On 4/8/2010 7:05 AM, Shawn Heisey wrote: Here's what I'm using as the query in my latest config: Actually, that was three separate queries: query="SELECT * FROM ${dataimporter.request.dataTable} WHERE did &gt; ${dataimporter.request.minDid} AND did &lt;= ${dataimporter.request.maxDid

Re: Is there any other tool other than DIH to index a database

2010-04-08 Thread Shawn Heisey
On 4/7/2010 9:26 PM, bbarani wrote: Hi, I am currently using DIH to index the data from a database. I am just trying to figure out if there are any other open source tools which I can use just for indexing purpose and use SOLR for querying. I also thought of writing a custom code for

Re: including external files in config by corename

2010-04-09 Thread Shawn Heisey
On 4/8/2010 1:15 PM, Chris Hostetter wrote: ...i suspect you want something like... <xi:include href="handlers.xml" xpointer="//requestHandler" /> where handlers.xml looks like... <anyThingYouWant> <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" /> <requestHandler name="/update/javabin"

Solr date NOW - format?

2010-04-09 Thread Shawn Heisey
I've been trying to work out how SOLR thinks about dates internally so I can boost newer documents. My post_date field is stored as seconds since the epoch, so I think the following is probably what I want. I used 3.17 instead of the 3.16 in all the examples because my own math suggests
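A quick check of the constant, as a sketch only: the message is cut off before it gives its reasoning, so the two year lengths below are assumptions and not necessarily the arithmetic used in the post. The idea is that the multiplier is the reciprocal of one year expressed in milliseconds.

```python
# Reciprocal of one year in milliseconds, for two assumed year lengths.
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in a day


def per_year_constant(days_per_year):
    """Return 1 / (milliseconds in a year of the given length)."""
    return 1.0 / (days_per_year * MS_PER_DAY)


if __name__ == "__main__":
    for days in (365, 365.25):
        # Print each constant with three decimal places of mantissa.
        print(days, f"{per_year_constant(days):.3e}")
```

With either year length the value rounds to 3.17e-11 at three significant figures, which is consistent with preferring 3.17 over 3.16.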

Re: Solr date NOW - format?

2010-04-11 Thread Shawn Heisey
On 4/9/2010 7:35 PM, Lance Norskog wrote: The example function seems to round time to years, so you're boosting by year? Your dates are stored as UTC 64-bit longs counting the number of milliseconds since Jan 1, 1970. That's it. They're in milliseconds whether you supplied them that way or not.

Using dismax with shards specified in solrconfig.xml

2010-04-11 Thread Shawn Heisey
I am using a setup where I have specified the shards parameter in a broker called main, which then queries a bunch of other machines including the one it's on, using the core named live. <requestHandler name="standard" class="solr.SearchHandler" default="true"> <lst name="defaults"> <bool

Re: Using dismax with shards specified in solrconfig.xml

2010-04-11 Thread Shawn Heisey
Adding it to the main core looks like it works, without the dismax handler even present in the live core config. It won't take the bf value that I described, though. <str name="bf">recip(ms(NOW,product(post_date,1000)),3.17e-11,1,1)</str> This spits an error: Problem accessing /solr/main/select.
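To see what the function in that bf value is meant to compute, here is a small Python sketch. It relies on Solr's documented definition recip(x,m,a,b) = a / (m*x + b); the helper names and the 365.25-day year are assumptions made for the illustration. It does not reproduce or explain the error reported in the message.

```python
# Illustration of recip(ms(NOW, post_date_in_ms), 3.17e-11, 1, 1).
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # assumed year length


def recip(x, m, a, b):
    """Same formula as Solr's recip function query: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms, m=3.17e-11, a=1.0, b=1.0):
    """Boost for a document whose age (NOW minus post date) is age_ms milliseconds."""
    return recip(age_ms, m, a, b)


if __name__ == "__main__":
    for years in (0, 1, 2, 5):
        print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

A brand-new document gets a boost of 1.0 and a one-year-old document gets roughly half of that, so the curve decays smoothly with age instead of in steps.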

Re: Benchmarking Solr

2010-04-12 Thread Shawn Heisey
I've got a very simple perl script (most of the work is done with modules) that I wrote which forks off multiple processes and throws requests at Solr, then gives a little bit of statistical analysis at the end. I have planned on sharing it from the beginning, I just have to clean it up for

Re: Benchmarking Solr

2010-04-12 Thread Shawn Heisey
On 4/12/2010 8:51 AM, Paolo Castagna wrote: There are already two related pages: - http://wiki.apache.org/solr/SolrPerformanceFactors - http://wiki.apache.org/solr/SolrPerformanceData Why not to create a new page? - http://wiki.apache.org/solr/BenchmarkingSolr (?) Done. I hope you like

dismax and date boosts

2010-04-12 Thread Shawn Heisey
I am trying to boost relevancy based on a date field with dismax, and I've included the requestHandler config below. The post_date field in my database is simple UNIX time, seconds since epoch. It's in a MySQL bigint field, so I've stored it as a tlong in Solr. This field is required by our

Re: dismax and date boosts

2010-04-12 Thread Shawn Heisey
On 4/12/2010 11:55 AM, Shawn Heisey wrote: [NOW-6MONTHS TO NOW]^5.0 , [NOW-1YEARS TO NOW-6MONTHS]^3.0 [NOW-2YEARS TO NOW-1YEARS]^2.0 [* TO NOW-2YEARS]^1.0 And here we have the perfect example of something I mentioned a while ago - my Thunderbird (v3.0.4 on Win7) turning Solr boost syntax

Re: dismax and date boosts

2010-04-14 Thread Shawn Heisey
On 4/14/2010 8:12 AM, Shawn Heisey wrote: On 4/12/2010 9:29 PM, Lance Norskog wrote: During indexing: the basic Solr XmlUpdateHandler does not have a facility for this. In the DataImportHandler you can add Javascript that takes your 'seconds since epoch', adds the delta between your epoch and 1

Re: Benchmarking Solr

2010-04-14 Thread Shawn Heisey
On 4/12/2010 9:57 AM, Shawn Heisey wrote: On 4/12/2010 8:51 AM, Paolo Castagna wrote: There are already two related pages: - http://wiki.apache.org/solr/SolrPerformanceFactors - http://wiki.apache.org/solr/SolrPerformanceData Why not to create a new page? - http://wiki.apache.org/solr

Re: dismax and date boosts

2010-04-15 Thread Shawn Heisey
. You should not have to do any arithmetic or formatting of date strings. This may need a few layers of SQL functions. On 4/14/10, Shawn Heiseys...@elyograg.org wrote: On 4/14/2010 8:12 AM, Shawn Heisey wrote: On 4/12/2010 9:29 PM, Lance Norskog wrote: During indexing: the basic

Turn off request logging for some handlers?

2010-04-15 Thread Shawn Heisey
Is it possible to turn off request logging for some handlers? Specifically, I'd like to stop logging requests to /admin/ping and /replication, which get hit very often. I looked around for an answer but wasn't able to find anything. Thanks, Shawn

Re: Turn off request logging for some handlers?

2010-04-15 Thread Shawn Heisey
On 4/15/2010 9:54 AM, Michael Kuhlmann wrote: you can set logging for nearly every single task here: http://host:port/solr/admin/logging.jsp I'm pretty sure that refers to the output that normally goes to stderr, I'm talking about the logs that go to files like 2010_04_15.request.log.

Re: Fwd: Query 2 Cores

2010-04-19 Thread Shawn Heisey
On 4/19/2010 11:09 AM, Lee Smith wrote: http://localhost8983/solr/core1/select?shards=localhost:8983/solr/core2q=attr_content:test Is this the correct way to query 2 cores at once ? This should do what you want:

Re: dismax and date boosts

2010-04-20 Thread Shawn Heisey
So, if I have my database multiply my value by 1000, I can put that directly into a tdate field and it'll work as expected? If that's the case, I think I might be able to modify my query from SELECT * to SELECT *,post_date*1000 as pdate and add the pdate field to the schema as type tdate.

Re: dismax and date boosts

2010-04-20 Thread Shawn Heisey
I found what I believe is a better option even if the multiplication would work - FROM_UNIXTIME. That returns the same kind of output as you get from an actual database date field. On 4/20/2010 12:07 PM, Shawn Heisey wrote: So, if I have my database multiply my value by 1000, I can put
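The two conversions discussed in this thread (multiplying by 1000 to get milliseconds, or producing a real date value) can be sketched in Python as follows. This only illustrates the arithmetic; it is not what MySQL's FROM_UNIXTIME or the DataImportHandler does internally, and the function names are made up for the example. The output format is the ISO 8601 UTC form that Solr date fields accept.

```python
from datetime import datetime, timezone


def epoch_seconds_to_millis(post_date):
    """Seconds since the epoch -> milliseconds since the epoch."""
    return post_date * 1000


def epoch_seconds_to_solr_date(post_date):
    """Seconds since the epoch -> UTC timestamp such as 2010-04-20T00:00:00Z."""
    moment = datetime.fromtimestamp(post_date, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(epoch_seconds_to_millis(1271721600))
    print(epoch_seconds_to_solr_date(1271721600))
```

One thing worth checking in a real setup is the time zone: the sketch forces UTC, whereas a database function may use the session time zone.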

Re: DIH dataimport.properties with

2010-04-20 Thread Shawn Heisey
Michael, The SolrEntityProcessor looks very intriguing, but it won't work with the released 1.4 version. If that's OK with you and it looks like it'll do what you want, feel free to ignore the rest of this. I'm also using MySQL as an import source for Solr. I was unable to use the

Re: DIH dataimport.properties with

2010-04-20 Thread Shawn Heisey
On 4/20/2010 9:09 PM, caman wrote: Shawn, Is this your custom implementation? For a delta-import, minDid comes from the maxDid value stored after the last successful import. Are you updating the dataTable after the import was successful? How did you handle this? I have similar scenario and

Retrieve time of last optimize

2010-04-21 Thread Shawn Heisey
Is it possible to issue some kind of query to a Solr core that will return the last time the index was optimized? Every day, one of my shards should get optimized, so I would like my monitoring system to tell me when the newest optimize date is more than 24 hours ago. I could not find a way

Re: Retrieve time of last optimize

2010-04-22 Thread Shawn Heisey
On 4/21/2010 1:24 PM, Shawn Heisey wrote: Is it possible to issue some kind of query to a Solr core that will return the last time the index was optimized? Every day, one of my shards should get optimized, so I would like my monitoring system to tell me when the newest optimize date is more

Re: multiple cores on SOLR under Tomcat

2010-04-27 Thread Shawn Heisey
Here's how I've got things set up. It's a different directory structure than yours, and I run it under jetty, but hopefully it gives you the basic idea. The dataDir setting is relative to the instanceDir setting. I run jetty with -Dsolr.solr.home=/index/solr so it can find solr.xml.

Re: Recommended MySQL JDBC driver

2010-05-14 Thread Shawn Heisey
I would like to know the same thing. I'm using 5.1.12 myself. A full reindex of one of my shards takes 4-6 hours for 7 million rows, depending on whether I run them one at a time or all at once. If I run the same query on the same machine with the commandline client and write the results to

Re: Recommended MySQL JDBC driver

2010-05-14 Thread Shawn Heisey
Lucas.. was there a reason you went with 5.1.10 or was it just the latest when you started your Solr project? just what was recent when i set things up. Also, how many items are in your index and how big is your index size? index size is 4.6GB with about 16M entities. I

SOLR-788

2010-05-17 Thread Shawn Heisey
I am looking at SOLR-788, trying to apply it to latest trunk. It looks like that's going to require some rework, because the included constant PURPOSE_GET_MLT_RESULTS conflicts with something added later, PURPOSE_GET_TERMS. How hard would it be to rework this to apply correctly to trunk? Is

Re: shards design/customization coding question

2010-05-17 Thread Shawn Heisey
On 5/17/2010 2:40 PM, D C wrote: We have a large index, separated into multiple shards, that consists of records exported from a database. One requirement is to support near real-time synchronization with the database. To accomplish this we are considering creating a daily shard where create

Re: SOLR-788 and merged trunk

2010-05-18 Thread Shawn Heisey
On 5/17/2010 3:34 PM, Shawn Heisey wrote: I am looking at SOLR-788, trying to apply it to latest trunk. It looks like that's going to require some rework, because the included constant PURPOSE_GET_MLT_RESULTS conflicts with something added later, PURPOSE_GET_TERMS. How hard would

Re: Recommended MySQL JDBC driver

2010-05-18 Thread Shawn Heisey
On 5/14/2010 12:40 PM, Shawn Heisey wrote: I downgraded to 5.0.8 for testing. Initially, I thought it was going to be faster, but it slows down as it gets further into the index. It now looks like it's probably going to take the same amount of time. On the server timeout thing - that's

DIH and denormalizing

2010-06-28 Thread Shawn Heisey
I am trying to do some denormalizing with DIH from a MySQL source. Here's part of my data-config.xml: <entity name="dataTable" pk="did" query="SELECT *,FROM_UNIXTIME(post_date) as pd FROM ncdat WHERE did &gt; ${dataimporter.request.minDid} AND did &lt;= ${dataimporter.request.maxDid} AND (did

Re: DIH and denormalizing

2010-06-28 Thread Shawn Heisey
On 6/28/2010 3:28 PM, caman wrote: In your query 'query=SELECT webtable as wt FROM ncdat_wt WHERE featurecode='${ncdat.feature}' .. instead of ${ncdat.feature} use ${dataTable.feature} where dataTable is your parent entity name. I knew it would be something stupid like that. I thought I

Disk usage per-field

2010-06-30 Thread Shawn Heisey
Is it possible for Solr (or Luke/Lucene) to tell me exactly how much of the total index disk space is used by each field? It would also be very nice to know, for each field, how much is used by the index and how much is used for stored data.

Re: Realtime + Batch indexing

2010-07-09 Thread Shawn Heisey
Replication does not transfer files that already exist on the slave and have the same metadata (size, last modified, etc) as the master. As far as deleting files, it will only do so if they do not exist on the master. In most cases, the only way that it would delete and copy the entire index
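The rule described here can be pictured with a short Python sketch. It is only a model of the behavior as stated in the message (compare file metadata, copy what differs, delete what the master no longer has), not Solr's replication code; the function name and the (size, mtime) tuples are assumptions for the illustration.

```python
def plan_replication(master_files, slave_files):
    """Decide which files to copy and which to delete on the slave.

    Both arguments map a file name to a (size, last_modified) tuple.
    """
    # Copy anything the slave lacks or holds with different metadata.
    to_copy = sorted(
        name for name, meta in master_files.items()
        if slave_files.get(name) != meta
    )
    # Delete only files that no longer exist on the master.
    to_delete = sorted(name for name in slave_files if name not in master_files)
    return to_copy, to_delete


if __name__ == "__main__":
    master = {"_1.fdt": (100, 10), "_2.fdt": (50, 20)}
    slave = {"_1.fdt": (100, 10), "_0.fdt": (70, 5)}
    print(plan_replication(master, slave))
```

Under this model an optimize, which rewrites every segment file, would make every name differ and so force a full copy.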

Re: Realtime + Batch indexing

2010-07-09 Thread Shawn Heisey
It's possible to get near real-time adds and updates (every two minutes in our case) with a multi-shard setup, if you have a shard dedicated to new content and have the right combination of unique identifiers on your data. I'll respond off-list with a full description of my setup. On

date boosting and dismax

2010-07-14 Thread Shawn Heisey
I've started a couple of previous threads on this topic, but I did not have a good date field in my index to use at the time. I now have a schema with the document's post_date in tdate format, so I would like to actually do some implementation. Right now, we are not doing relevancy ranking

Re: date boosting and dismax

2010-07-14 Thread Shawn Heisey
One of the replies I got on a previous thread mentioned range queries, with this example: [NOW-6MONTHS TO NOW]^5.0 , [NOW-1YEARS TO NOW-6MONTHS]^3.0 [NOW-2YEARS TO NOW-1YEARS]^2.0 [* TO NOW-2YEARS]^1.0 Something like this seems more flexible, and into it, I read an implication that the

Re: dismax and date boosts

2010-07-14 Thread Shawn Heisey
I have finally figured out how to turn this off in Thunderbird 3: Go to Tools, Options, Display, and turn off Display emoticons as graphics. On 4/12/2010 12:04 PM, Shawn Heisey wrote: On 4/12/2010 11:55 AM, Shawn Heisey wrote: [NOW-6MONTHS TO NOW]^5.0 , [NOW-1YEARS TO NOW-6MONTHS]^3.0 [NOW

Re: How to speed up solr search speed

2010-07-17 Thread Shawn Heisey
On 7/17/2010 3:28 AM, marship wrote: Hi. Peter and All. I merged my indexes today. Now each index stores 10M document. Now I only have 10 solr cores. And I used java -Xmx1g -jar -server start.jar to start the jetty server. How big are the indexes on each of those cores? You can easily get

Re: AW: Facets on multiple values

2010-07-29 Thread Shawn Heisey
I'm developing a new schema that includes something similar. The DIH database select statement uses a left join to gather a set of values for each main record into a new field, separated by semicolons. I put the result into a fieldType with the following analyzer chain, which breaks it up

Re: AW: Facets on multiple values

2010-07-29 Thread Shawn Heisey
On 7/29/2010 12:18 PM, Chris Hostetter wrote: it also depends on what you want to get *out* if this is a stored field ... using an analyzer like this will deal with letting you facet on the individual terms, but the stored value returned with each document will still be a single semi-colon

Re: AW: Facets on multiple values

2010-07-29 Thread Shawn Heisey
On 7/29/2010 1:13 PM, Chris Hostetter wrote: : My initial approach was to grab the values (which are in another table) with a : DIH subentity and store them in a multivalued field, but that reduced index : speed to a crawl. That's because instead of one query for the entire import, : it was

SOLR-788 - distributed More Like This

2010-08-12 Thread Shawn Heisey
I tried some time ago to use SOLR-788. Ultimately I was able to get both patch versions to apply (separately), but neither worked. The suggestion I received when I commented on the issue was to download the specific release mentioned in the patch and then update, but the patch was created

Re: General questions about distributed solr shards

2010-08-12 Thread Shawn Heisey
On 8/11/2010 3:27 PM, JohnRodey wrote: 1) Is there any information on preferred maximum sizes for a single solr index. I've read some people say 10 million, some say 80 million, etc... Is there any official recommendation or has anyone experimented with large datasets into the tens of

Re: DataImportHandler and SAXParseExceptions with Jetty

2010-08-13 Thread Shawn Heisey
On 8/12/2010 8:32 PM, harrysmith wrote: Win XP, Solr 1.4.1 out of the box install, using jetty. If I add greater than or less than (ie > or <) in any xml field and attempt to load or run from the DataImportConsole I receive a SAXParseException. Example follows: If I don't have a 'less than' it

Re: Solr date NOW - format?

2010-08-17 Thread Shawn Heisey
On 4/9/2010 7:35 PM, Lance Norskog wrote: Function queries are notoriously slow. Another way to boost by year is with range queries: [NOW-6MONTHS TO NOW]^5.0 , [NOW-1YEARS TO NOW-6MONTHS]^3.0 [NOW-2YEARS TO NOW-1YEARS]^2.0 [* TO NOW-2YEARS]^1.0 Notice that you get to have a non-linear curve

Sort by date, filter by score?

2010-08-17 Thread Shawn Heisey
I have had a request from our development team. I did some searching and could not find an answer. They want to sort by a date field but filter out all results below a minimum relevancy score. Is this possible? I suspect that our only option will be to do the search sorted by relevancy

Re: Solr date NOW - format?

2010-08-17 Thread Shawn Heisey
Would I do separate bq values for each of the ranges, or is there a way to include them all at once? If it's the latter, I'll need a full example with a field name, because I'm clueless. :) On 8/17/2010 2:29 PM, Lance Norskog wrote: I think 'bq=' is what you want. In dismax the main query

Re: Solr date NOW - format?

2010-08-17 Thread Shawn Heisey
, Shawn Heisey wrote: Would I do separate bq values for each of the ranges, or is there a way to include them all at once? If it's the latter, I'll need a full example with a field name, because I'm clueless. :) On 8/17/2010 2:29 PM, Lance Norskog wrote: I think 'bq=' is what you want
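For readers who want to see the range boosts from this thread written out with a field name, here is a hedged Python sketch that only builds the query strings. The field name post_date comes from the earlier messages; whether to send one bq per range or a single combined bq is the open question in the thread, and this sketch does not settle it.

```python
# Range boundaries and boosts copied from the example quoted in the thread.
RANGES = [
    ("NOW-6MONTHS", "NOW", 5.0),
    ("NOW-1YEARS", "NOW-6MONTHS", 3.0),
    ("NOW-2YEARS", "NOW-1YEARS", 2.0),
    ("*", "NOW-2YEARS", 1.0),
]


def date_range_boosts(field):
    """Return one boosted range clause per tier for the given date field."""
    return [f"{field}:[{low} TO {high}]^{boost}" for low, high, boost in RANGES]


if __name__ == "__main__":
    for clause in date_range_boosts("post_date"):
        print(clause)
```

Each clause could be sent as its own bq parameter or joined with spaces into one value, depending on which form the handler turns out to accept.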

Re: improving search response time

2010-08-18 Thread Shawn Heisey
Most of your time is spent doing the query itself, which in the light of other information provided, does not surprise me. With 12GB of RAM and 9GB dedicated to the java heap, the available RAM for disk caching is pretty low, especially if Solr is actually using all 9GB. Since your index is

spellcheck index blown away during rebuild

2010-08-20 Thread Shawn Heisey
I am just delving into the spellcheckcomponent on a test server running a 3.1 build from June 29th. I have noticed that when you ask for a rebuild of the spell check index, it deletes it before starting the rebuild. It takes about 39 minutes to build one (3GB), which is a long time to do

Re: spellcheck index blown away during rebuild

2010-08-22 Thread Shawn Heisey
On 8/20/2010 8:56 PM, Lance Norskog wrote: The first question is about your use cases. How many words are in the eventual 3GB spelling index? Do you really need that many? Spell-checking is a more controllable UI if you make it from a dictionary. It's built from an index-only field that

Solr Admin Schema Browser and field named keywords

2010-08-23 Thread Shawn Heisey
I have a field named keywords in my index. The schema browser page is not able to deal with this, so I have trouble getting statistical information on this field. When I click on the field, Firefox hangs for a minute and then gives the unresponsive script warning. I assume (without

Re: Solr Admin Schema Browser and field named keywords

2010-08-23 Thread Shawn Heisey
On 8/23/2010 12:07 AM, Shawn Heisey wrote: I have a field named keywords in my index. The schema browser page is not able to deal with this, so I have trouble getting statistical information on this field. When I click on the field, Firefox hangs for a minute and then gives

Multiple passes with WordDelimiterFilterFactory

2010-08-26 Thread Shawn Heisey
Can I pass my data through WordDelimiterFilterFactory more than once? It occurs to me that I might get better results if I can do some of the filters separately and use preserveOriginal on some of them but not others. Currently I am using the following definition on both indexing and

Re: sort by field length

2010-08-26 Thread Shawn Heisey
On 5/24/2010 6:30 AM, Sascha Szott wrote: Hi folks, is it possible to sort by field length without having to (redundantly) save the length information in a separate index field? At first, I thought to accomplish this using a function query, but I couldn't find an appropriate one. I have

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-28 Thread Shawn Heisey
It's metadata for a collection of 45 million documents that is mostly photos, with some videos and text. The data is imported from a MySQL database and split among six large shards (each nearly 13GB) and a small shard with data added in the last week, which usually works out to between

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Shawn Heisey
It's metadata for a collection of 45 million documents that is mostly photos, with some videos and text. The data is imported from a MySQL database and split among six large shards (each nearly 13GB) and a small shard with data added in the last week. That works out to between 300,000 and

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Shawn Heisey
On 8/28/2010 7:59 PM, Shawn Heisey wrote: The only drop in term quality that I noticed was that possessive words (apostrophe-s) no longer have the original preserved. I haven't yet decided whether that's a problem. I finally did notice another drop in term quality from the dual pass

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-29 Thread Shawn Heisey
Thank you for taking the time to help. The way I've got the word delimiter index filter set up with only one pass, wolf-biederman will result in wolf, biederman, wolfbiederman, and wolf-biederman. With two passes, the last one is not present. One pass changes gremlin's to gremlin and
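The single-pass output described for wolf-biederman can be imitated with a toy Python function. This is a rough approximation written for illustration, not the WordDelimiterFilterFactory algorithm: it ignores possessives, case changes, and number handling, and the function name and flag are assumptions.

```python
import re


def toy_word_delimiter(token, preserve_original=True):
    """Split on non-alphanumerics, add the catenated form, optionally keep the original."""
    parts = [part for part in re.split(r"[^A-Za-z0-9]+", token) if part]
    output = list(parts)
    if len(parts) > 1:
        # Catenated form, e.g. wolfbiederman.
        output.append("".join(parts))
    if preserve_original and token not in output:
        output.append(token)
    return output


if __name__ == "__main__":
    print(toy_word_delimiter("wolf-biederman"))
    print(toy_word_delimiter("wolf-biederman", preserve_original=False))
```

Turning preserve_original off drops the hyphenated form, which mirrors the term that the message reports losing with two passes.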

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-30 Thread Shawn Heisey
On 8/29/2010 2:17 PM, Erick Erickson wrote: charFilters are applied even before the tokenizer Try putting this after any instances of, say, WhiteSpaceTokenizerFactory in your analyzer definition, and I believe you'll see that this is not true. At least looking at this in the analysis page from

Re: Multiple passes with WordDelimiterFilterFactory

2010-08-30 Thread Shawn Heisey
On 8/30/2010 9:01 AM, Shawn Heisey wrote: On 8/29/2010 2:17 PM, Erick Erickson wrote: charFilters are applied even before the tokenizer Try putting this after any instances of, say, WhiteSpaceTokenizerFactory in your analyzer definition, and I believe you'll see that this is not true

Stripping leading/trailing punctuation with SOLR-1653

2010-08-31 Thread Shawn Heisey
I am trying to use PatternReplaceCharFilterFactory (SOLR-1653) to strip leading and trailing punctuation from terms. It's not working. This was previously discussed here as part of something I was trying with WordDelimiterFilterFactory, but I think it needs its own thread now. I seem to be

Re: Stripping leading/trailing punctuation with SOLR-1653

2010-08-31 Thread Shawn Heisey
HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) On 8/31/2010 8:23 AM, Shawn Heisey wrote: I am trying to use PatternReplaceCharFilterFactory (SOLR-1653) to strip leading and trailing punctuation from terms. It's not working. This was previously discussed here as part of something I

Re: Stripping leading/trailing punctuation with SOLR-1653

2010-08-31 Thread Shawn Heisey
now. This filter is not mentioned on the wiki page dealing with analyzers, which is why I did not use it from the start. When I searched that page for regex, the CharFilter was the only one that came up. On 8/31/2010 8:29 AM, Shawn Heisey wrote: I didn't give any particulars about my setup

Re: Stripping leading/trailing punctuation with SOLR-1653

2010-08-31 Thread Shawn Heisey
On 8/31/2010 8:49 AM, Shawn Heisey wrote: I believe I may have solved this. After a more careful reading of SOLR-1653, I noticed that they referred to another filter. I changed my configuration from solr.PatternReplaceCharFilterFactory to solr.PatternReplaceFilterFactory and updated
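The distinction matters because a token filter sees one term at a time, so anchors such as ^ and $ apply to each term rather than to the whole field value. A minimal Python sketch of per-term stripping follows; the pattern is an assumption (the one used in the thread is not shown in this excerpt), and Python's re module is not identical to the Java regex engine Solr uses.

```python
import re

# Assumed pattern: runs of non-word characters at the start or end of a term.
EDGE_PUNCTUATION = re.compile(r"^\W+|\W+$")


def strip_edge_punctuation(term):
    """Remove leading and trailing punctuation, leaving interior characters alone."""
    return EDGE_PUNCTUATION.sub("", term)


if __name__ == "__main__":
    for term in ["(hello!)", "wolf-biederman,", "gremlin's", "..."]:
        print(repr(term), "->", repr(strip_edge_punctuation(term)))
```

Applied to a whole field value instead, the same pattern would only touch the very first and very last characters, which may be why the character-level filter appeared to do nothing.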

Re: Solr Admin Schema Browser and field named keywords

2010-09-01 Thread Shawn Heisey
On 8/26/2010 5:04 PM, Chris Hostetter wrote: doubtful. I suspect it has more to do with the amount of data in your keywords field and the underlying request to the LukeRequestHandler timing out. have you tried using it with a test index where the keywords field has only a few words in it?

Re: Hardware Specs Question

2010-09-02 Thread Shawn Heisey
On 9/2/2010 2:54 AM, Toke Eskildsen wrote: We've done a fair amount of experimentation in this area (1997-era SSDs vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in RAID 0). The harddisk setups never stood a chance for searching. With current SSD's being faster than

Re: Solr crawls during replication

2010-09-03 Thread Shawn Heisey
On 9/2/2010 9:31 AM, Mark wrote: Thanks for the suggestions. Our slaves have 12G with 10G dedicated to the JVM.. too much? Are the rsync snappuller features still available in 1.4.1? I may try that to see if it helps. Configuration of the switches may also be possible. Also, would you mind

Re: Hardware Specs Question

2010-09-03 Thread Shawn Heisey
On 9/3/2010 3:39 AM, Toke Eskildsen wrote: I'll have to extrapolate a lot here (also known as guessing). You don't mention what kind of harddrives you're using, so let's say 15.000 RPM to err on the high-end side. Compared to the 2 drives @ 15.000 RPM in RAID 1 we've experimented with, the

Re: Solr crawls during replication

2010-09-03 Thread Shawn Heisey
On 9/3/2010 12:37 PM, Jonathan Rochkind wrote: Is the OS disk cache something you configure, or something the OS just does automatically based on available free RAM? Or does it depend on the exact OS? Thinking about the OS disk cache is new to me. Thanks for any tips. Depends on what you

Using more than one name for a query field - aliases

2010-09-09 Thread Shawn Heisey
I find myself in need of the ability to access one field by more than one name, for application transition purposes. Right now we have a field (ft_text, by far the largest part of the index) that is indexed but not stored. This field and three others are copied into an additional field

Re: Delta Import with something other than Date

2010-09-09 Thread Shawn Heisey
On 9/8/2010 4:32 PM, David Yang wrote: I have a table that I want to index, and the table has no datetime stamp. However, the table is append only so the primary key can only go up. Is it possible to store the last primary key, and use some delta query=select id where id>${last_id_value} I

Re: Delta Import with something other than Date

2010-09-09 Thread Shawn Heisey
On 9/9/2010 1:23 PM, Vladimir Sutskever wrote: Shawn, Can you provide a sample of passing the parameter via URL? And how using it would look in the data-config.xml Here's the URL that I send to do a full build on my last shard:

Re: PatternReplaceCharFilterFactory?

2010-09-09 Thread Shawn Heisey
On 9/9/2010 5:38 PM, Erick Erickson wrote: Could you give us an idea of why you think it isn't present? As far as I can tell, it's been around for a while. Are you getting an error and if so, can you show it to us? Look in schema.xml of what you downloaded (probably in the example directory).

Change what gets logged when service is disabled

2010-09-09 Thread Shawn Heisey
I use the PingRequestHandler option that tells my load balancer whether a machine is available. When the service is disabled, every one of those requests, which my load balancer makes every five seconds, results in the following in the log: Sep 9, 2010 6:06:58 PM

Re: Color search for images

2010-09-16 Thread Shawn Heisey
On 9/15/2010 10:50 AM, Shashi Kant wrote: Shawn, I have done some research into this, machine-vision especially on a large scale is a hard problem, not to be entered into lightly. I would recommend starting with OpenCV - a comprehensive toolkit for extracting various features such as Color,

Re: Color search for images

2010-09-16 Thread Shawn Heisey
On 9/16/2010 7:45 AM, Shashi Kant wrote: Lire is a nascent effort and based on a cursory overview a while back, IMHO was an over-simplified version of what a CBIR engine should be. They use CEDD (color edge descriptors). Wouldn't work for the kind of applications I am working on - which needs

Re: Simple Filter Query (fq) Use Case Question

2010-09-17 Thread Shawn Heisey
On 9/16/2010 12:27 PM, Dennis Gearon wrote: Is a core a running piece of software, or just an index/config pairing? Dennis Gearon A core is one complete index within a Solr instance. http://wiki.apache.org/solr/CoreAdmin My master index servers have five cores - ncmain, ncrss, live, build,

Re: DIH: alternative approach to deltaQuery

2010-09-17 Thread Shawn Heisey
On 9/17/2010 3:01 AM, Paul Dhaliwal wrote: Another feature missing in DIH is ability to pass parameters into your queries. If one could pass a named or positional parameter for an entity query, it will give them lot of freedom to optimize their delta or full load queries. One can even get

Re: Using more than one name for a query field - aliases

2010-09-17 Thread Shawn Heisey
On 9/17/2010 7:22 PM, Chris Hostetter wrote: a) not really. assuming you have no problem modifying the indexing code in the way you want, and are primarily worried about searching from various clients, then the most straight forward approach is probably to use RewriteRules (or something

Re: Concurrent DB updates and delta import misses few records

2010-09-22 Thread Shawn Heisey
On 9/22/2010 1:39 AM, Shashikant Kore wrote: Hi, I'm using DIH to index records from a database. After every update on (MySQL) DB, Solr DIH is invoked for delta import. In my tests, I have observed that if db updates and DIH import is happening concurrently, import misses few records. Here

Re: Concurrent DB updates and delta import misses few records

2010-09-27 Thread Shawn Heisey
You could get it from Solr, yes. That didn't even occur to me because when I was designing my scripts, I didn't yet have a fully integrated Solr index. :) With hindsight, I still wouldn't get it from Solr. I would lose some flexibility and ease of administration. It's certainly possible

Re: Multicore Example

2010-02-19 Thread Shawn Heisey
Assuming you are on a unix variant with a working lsof, use this. This probably won't work correctly on Solaris 10: lsof -nPi | grep 8983 lsof -nPi | grep 8080 On Windows, you can do this in a command prompt. It requires elevation on Vista or later. The -b option was added in WinXP SP2 and

Re: If you could have one feature in Solr...

2010-02-25 Thread Shawn Heisey
I would like to be able to do a delta import on arbitrary data, not a last modified date. Specifically, our database has an auto_increment field called DID, or document identifier. For changes to existing data, this field is updated anytime a row is changed in any way, effectively turning it

Advice on deployment

2010-02-25 Thread Shawn Heisey
We are currently using a commercial indexing product based on Lucene for our indexing needs, and would like to replace it with SOLR. The source database for this system has 40 million records, growing by about 30,000 items per day. It is a repository for all the metadata relating to an

Re: SOLR takes more than 9 hours to index 300000 rows

2010-03-06 Thread Shawn Heisey
At the 9+ hour mark, is your database server showing active connections that are sending data, or is all the activity local to SOLR? We have a 40 million row database in MySQL, with each row comprising more than 80 fields. I'm including the config from one of our shards. There are about 6.6

Re: SOLR takes more than 9 hours to index 300000 rows

2010-03-06 Thread Shawn Heisey
parameter is larger than physical memory. If this is happening, you'd definitely see constant hard drive light blinking. On 3/6/2010 10:20 AM, Shawn Heisey wrote: At the 9+ hour mark, is your database server showing active connections that are sending data, or is all the activity local to SOLR

Re: SOLR takes more than 9 hours to index 300000 rows

2010-03-06 Thread Shawn Heisey
Do keep looking into the batchSize, but I think I might have found the issue. If I understand things correctly, you will need to add processor=CachedSqlEntityProcessor to your first entity. It's only specified on the other two. Assuming you have enough RAM and heap space available in your

Re: Import database

2010-03-08 Thread Shawn Heisey
What database are you using? Many of the JDBC drivers try to pull the entire resultset into RAM before feeding it to the application that requested the data. If it's MySQL, I can show you how to fix it. The batchSize parameter below tells it to stream the data rather than buffer it. With

Re: Distributed search fault tolerance

2010-03-11 Thread Shawn Heisey
I guess I must be including too much information in my questions, running into tl;dr with them. Later today when I have more time I'll try to make it more bite-size. On 3/9/2010 2:28 PM, Shawn Heisey wrote: I attended the Webinar on March 4th. Many thanks to Yonik for putting

A bunch of questions

2010-03-12 Thread Shawn Heisey
Does SolrCloud's notion of a collection, which appears to use cores, override normal multi-core usage for building an offline index and quickly swapping it into production? Some of the features in SolrCloud look useful, if it's still possible to exert manual control over cores and shards.

Re: Indexing CLOB Column in Oracle

2010-03-16 Thread Shawn Heisey
Disclaimer: My Oracle experience is minuscule at best. I am also a beginner at Solr, so grab yourself the proverbial grain of salt. I googled a bit on CLOB. One page I found mentioned setting up a view to return the data type you want. Can you use the functions described on these pages in

DIH questions

2010-03-18 Thread Shawn Heisey
Below is my data-config.xml file, which I am using to build an index for my first shard. I have a couple of questions. Can Solr include the hostname (short version) it's running on in the query? Alternatively, is there a way to override the query with a URL parameter before or when doing

Re: DIH questions

2010-03-18 Thread Shawn Heisey
That looks very useful. So does this mean that this will work? URL text: ?command=full-import&numShards=6&modValue=0&minDid=229615984 XML: query=SELECT * FROM [table] WHERE (did % ${dataimporter.request.numShards}) = ${dataimporter.request.modValue} AND ${dataimporter.request.minDid} = did
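The modulo scheme in that query, and the request that would drive it, can be sketched in Python. The parameter names numShards, modValue, and minDid come from the message; the base URL and function names are hypothetical, chosen only for the illustration.

```python
from urllib.parse import urlencode


def shard_for(did, num_shards):
    """Shard number for a document id, matching (did % numShards) = modValue."""
    return did % num_shards


def full_import_url(base_url, num_shards, mod_value, min_did):
    """Build a full-import request carrying the per-shard parameters."""
    params = [
        ("command", "full-import"),
        ("numShards", num_shards),
        ("modValue", mod_value),
        ("minDid", min_did),
    ]
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, port, and core name.
    base = "http://localhost:8983/solr/core0/dataimport"
    print(shard_for(229615984, 6))
    print(full_import_url(base, 6, 0, 229615984))
```

Each request parameter is then available inside data-config.xml as ${dataimporter.request.NAME}, which is the mechanism the message is asking about.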
