Why solr 4.0 use FSIndexOutput to write file, otherwise MMap/NIO

2013-06-28 Thread Jeffery Wang
I have checked FSDirectory; it creates an MMapDirectory or NIOFSDirectory as the Directory. These two directories only supply IndexInput subclasses for reading files (MMapIndexInput extends ByteBufferIndexInput). Why is there no MMap/NIO IndexOutput subclass for writing files? It only uses FSIndexOutput

Stemming query in Solr

2013-06-28 Thread snkar
We have a search system based on Solr using the Solrnet library in C# which supports some advanced search features like Fuzzy, Synonym and Stemming. While all of these work, *the expectation from the Stemming Search seems to be a combination of Stemming by reduction as well as stemming by

Re: Normalizing/Returning solr scores between 0 to 1

2013-06-28 Thread Upayavira
And if Solr has to spit it out, perhaps you could do that with a simple XSLT transform or Velocity template. Upayavira On Fri, Jun 28, 2013, at 12:30 AM, Learner wrote: Might not be useful but a workaround would be to divide all scores by max score to get scores between 0 and 1. --
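
The workaround quoted above (divide every score by the maximum score) can be sketched on the client side as follows. This is only an illustration; the function name is hypothetical and not part of Solr or SolrNet, and the scores are assumed to arrive as a plain list of floats.

```python
def normalize_scores(scores):
    """Scale raw scores into [0, 1] by dividing each by the maximum score."""
    if not scores:
        return []
    max_score = max(scores)
    if max_score <= 0:
        # Nothing sensible to scale by; report zeros rather than divide by zero.
        return [0.0 for _ in scores]
    return [s / max_score for s in scores]
```

Note that the result is only comparable within a single result list, since the maximum score changes from query to query.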

Re: URL search and indexing

2013-06-28 Thread Flavio Pompermaier
Thanks for the explanation, I was missing exactly that! Now things work correctly also using the post script. However I don't think I need norms if I use ids of the same length (UUID), right? I just need strings with omitTermFreqAndPositions=false I think. On Thu, Jun 27, 2013 at 7:31 PM, Erick

Re: URL search and indexing

2013-06-28 Thread Upayavira
Field length normalisation is based upon the number of terms in a field, not the number of characters in a term. I guess with multivalued string fields, that would mean a field with lots of values (but one match) would score lower than one with only one matching value. Upayavira On Fri, Jun 28,
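
To make the point above concrete: in Lucene's classic TF-IDF similarity the length norm is roughly 1/sqrt(number of terms in the field). The sketch below assumes that formula and ignores index-time boosts and the lossy single-byte encoding of norms; the function name is illustrative only.

```python
import math

def length_norm(num_terms):
    """Approximate classic length norm: 1 / sqrt(number of terms in the field)."""
    if num_terms <= 0:
        return 0.0
    return 1.0 / math.sqrt(num_terms)
```

Under this assumption a multivalued string field holding 16 values gets a norm of 0.25, while a field holding a single value gets 1.0, regardless of how many characters each value contains.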

Re: How spell checker used if indexed document is containing misspelled words

2013-06-28 Thread venkatesham.gu...@igate.com
Thanks for the replies. I have already tried the options mentioned here; apparently those provide suggestions for a query word that is incorrectly spelled. I am looking for a feature where my query term is correct and I want results from documents in which both the correctly spelled term matches and

Re: How spell checker used if indexed document is containing misspelled words

2013-06-28 Thread Upayavira
You're wanting to make your search more fuzzy. You could try phonetic search, but that's very fuzzy. Go to the analysis tab in the admin UI. Locate the 'phonetic' field type in the drop down, and you can see what will happen to terms when they are converted to phonetic equivalents. Upayavira On

Context search in solr

2013-06-28 Thread venkatesham.gu...@igate.com
My search queries have between 3 and 8 words and a context attached to them. I am looking for result documents that contain all the terms in the query, and whose terms relate to or share a similar context. For example: my search query

Re: Context search in solr

2013-06-28 Thread Upayavira
You might use proximity. "low blood pressure"~6 might match #1 and #2 but not #3. It says find phrases that require six or fewer position moves in order to match my terms as a phrase. Upayavira On Fri, Jun 28, 2013, at 11:10 AM, venkatesham.gu...@igate.com wrote: My search query is having
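
The proximity syntax referred to above is a quoted phrase followed by ~ and a slop value. A minimal sketch of building such a query string (the helper name is hypothetical):

```python
def proximity_query(terms, slop):
    """Build a Lucene-style sloppy phrase query, e.g. "a b c"~6."""
    phrase = " ".join(terms)
    return '"{}"~{}'.format(phrase, slop)
```

For the terms low, blood and pressure with a slop of 6 this yields "low blood pressure"~6.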

Re: Why solr 4.0 use FSIndexOutput to write file, otherwise MMap/NIO

2013-06-28 Thread Michael McCandless
Output is quite a bit simpler than input because all we do is write a single stream of bytes with no seeking (append only), and it's done with only one thread, so I don't think there'd be much to gain by using the newer IO APIs for writing... Mike McCandless http://blog.mikemccandless.com On
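
The asymmetry described above can be illustrated outside Lucene. The sketch below is not Lucene code; it only contrasts an append-only, single-threaded write through an ordinary buffered stream with a random-access read through a memory map. All function names are made up for the illustration.

```python
import mmap
import os
import tempfile

def write_append_only(path, chunks):
    """Write chunks strictly in order through a buffered stream; no seeking."""
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)

def read_at(path, offset, length):
    """Read an arbitrary slice through a read-only memory map (random access)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm[offset:offset + length])

def demo():
    """Write three chunks sequentially, then read a slice from the middle."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_append_only(path, [b"header", b"-", b"postings"])
        return read_at(path, 7, 8)
    finally:
        os.remove(path)
```

The writer never needs to know file offsets, which is why a plain stream is enough there; the reader jumps to arbitrary positions, which is where memory mapping tends to help.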

Re: Solr admin search with wildcard

2013-06-28 Thread Erick Erickson
This is a no-op, or rather I'm not sure what it does: <copyField source="url" dest="url"/> This is the key: <copyField source="iframe" dest="text"/> But be aware that if you copy anything else into the text field you'll be searching there too. Now you can search the text field. Assuming this is from the

Re: Filter queries taking a long time, even with cache disabled

2013-06-28 Thread Erick Erickson
I'm guessing you're well aware that the example you gave is parsed as search_field:love default_field:obama. Which isn't pertinent, there's nothing that looks like it should take any time at all here, to say nothing of 120 seconds. So start with debug=query and see what the filter query is

Re: Field Query After Collapse.Field?

2013-06-28 Thread Erick Erickson
bq: Is there any way to perform the field query after the results are collapsed? I'm not quite sure what you mean here. The intent of fq clauses is that they apply to the entire query before anything else, including field collapsing (and I'm assuming you mean group.field, not collapse.field)

Re: solrj indexing using embedded solr is slow

2013-06-28 Thread Erick Erickson
First, how much slower? 2x? 10x? 1.1x? When using embedded, you're doing all the work you were doing on two machines on a single machine, so my first question would be how is your CPU performing? Is it maxed? Best Erick On Thu, Jun 27, 2013 at 1:59 PM, Learner bbar...@gmail.com wrote:

Re: Stemming query in Solr

2013-06-28 Thread Erick Erickson
First, this is for the Java version, I hope it extends to C#. But in your configuration, when you're indexing the stemmer should be storing the reduced form in the index. Then, when searching, the search should be against the reduced term. To check this, try 1 Using the Admin/Analysis page to see

Re: Context search in solr

2013-06-28 Thread Erick Erickson
One variant on Upayavira's comment would be to use the proximity as a boost query. That way all three would match, but the first two would get higher scores. Either way should work though. Best Erick On Fri, Jun 28, 2013 at 6:29 AM, Upayavira u...@odoko.co.uk wrote: you might use proximity.
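
A rough sketch of the boost-query variant described above, assuming the edismax parser and its bq parameter: the plain terms go in q so that every document matches, and the sloppy phrase is added only as a boost. The helper name, the boost value and the reliance on the default search field are assumptions for illustration.

```python
from urllib.parse import urlencode

def boosted_params(user_query, slop, boost):
    """Return request parameters that match on terms and boost on proximity."""
    phrase = '"{}"~{}'.format(user_query, slop)
    return {
        "defType": "edismax",
        "q": user_query,
        # The sloppy phrase only raises the score; it does not restrict matches.
        "bq": "{}^{}".format(phrase, boost),
    }

def to_query_string(params):
    """URL-encode the parameters for use in a select request."""
    return urlencode(sorted(params.items()))
```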

Re: Field Query After Collapse.Field?

2013-06-28 Thread slevytam
Hi Erick, I actually did mean collapse.field, as per: http://blog.trifork.com/2009/10/20/result-grouping-field-collapsing-with-solr/ At a high level I am trying to avoid the use of a join between a list of entries and a list of actions that users have performed on an entry (since it's not supported

Re: Field Query After Collapse.Field?

2013-06-28 Thread Erick Erickson
Well, now I'm really puzzled. The link you referenced was from when grouping/field collapsing was under development. I did a quick look through the entire 4x code base for collapse and there's no place I saw that looks like it accepts that parameter. Of course I may have just missed it. What

Re: Replicating files containing external file fields

2013-06-28 Thread Arun Rangarajan
Erick, Thanks for your reply. The external file field files are already under the dataDir specified in solrconfig.xml. They are not getting replicated. (Solr version 4.2.1.) On Thu, Jun 27, 2013 at 10:50 AM, Erick Erickson erickerick...@gmail.com wrote: Haven't tried this, but I _think_ you can use

Content based recommender using lucene/solr

2013-06-28 Thread Luis Carlos Guerrero Covo
Hi, I'm using lucene and solr right now in a production environment with an index of about a million docs. I'm working on a recommender that basically would list the n most similar items to the user based on the current item he is viewing. I've been thinking of using solr/lucene since I already

Re: Replicating files containing external file fields

2013-06-28 Thread Jack Krupansky
Show us your confFiles directive. Maybe there is some subtle error in the file name. -- Jack Krupansky -Original Message- From: Arun Rangarajan Sent: Friday, June 28, 2013 1:06 PM To: solr-user@lucene.apache.org Subject: Re: Replicating files containing external file fields Erick,

Error- missing sfield for spatial request

2013-06-28 Thread Learner
I am trying to combine a geospatial query (latlong) with the below query inside a search component but I am getting the below error. Error: <lst name="error"> <str name="msg">missing sfield for spatial request</str> <int name="code">400</int> </lst> </response> <str name="fq_bbox"> (
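
The error above is what Solr reports when a spatial filter is missing its sfield parameter. Solr's bbox filter expects sfield (the location field), pt (a "lat,lon" center point) and d (a distance in kilometres). A minimal sketch of assembling such a filter as local params; the helper name is hypothetical, and the field name used in the example is taken from the message and may not match the real schema.

```python
def bbox_filter(sfield, lat, lon, distance_km):
    """Build a Solr bbox filter query with all three required local params."""
    if not sfield:
        raise ValueError("sfield is required for a spatial request")
    return "{{!bbox sfield={} pt={},{} d={}}}".format(sfield, lat, lon, distance_km)
```

For example, bbox_filter("latlong", 45.15, -93.85, 5) gives {!bbox sfield=latlong pt=45.15,-93.85 d=5}, which could be passed as an fq value.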

RE: Content based recommender using lucene/solr

2013-06-28 Thread Saikat Kanjilal
Why not just use mahout to do this, there is an item similarity algorithm in mahout that does exactly this :) https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html You can use mahout in distributed and non-distributed mode

Re: Content based recommender using lucene/solr

2013-06-28 Thread Luis Carlos Guerrero Covo
Hey saikat, thanks for your suggestion. I've looked into mahout and other alternatives for computing k nearest neighbors. I would have to run a job and compute the k nearest neighbors and track them in the index for retrieval. I wanted to see if this was something I could do with lucene using

broken links returned from solr search

2013-06-28 Thread MA LIG
Hello, I ran the solr example as described in http://lucene.apache.org/solr/4_3_1/tutorial.html and then loaded some doc files to solr as described in http://wiki.apache.org/solr/ExtractingRequestHandler. The commands I used to load the files were of the form curl

Re: Content based recommender using lucene/solr

2013-06-28 Thread Otis Gospodnetic
Hi, Have a look at http://www.youtube.com/watch?v=13yQbaW2V4Y . I'd say it's easier than Mahout, especially if you already have and know your way around Solr. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Fri, Jun 28, 2013 at

RE: Content based recommender using lucene/solr

2013-06-28 Thread Saikat Kanjilal
You could build a custom recommender in mahout to accomplish this. Also, just out of curiosity, why the content based approach as opposed to building a recommender based on co-occurrence? One other thing: what is your data size, are you looking at a scale where you need something like hadoop?

Re: Content based recommender using lucene/solr

2013-06-28 Thread Walter Underwood
More Like This already is kNN. It extracts features from the document (makes a query), and runs that query against the collection. If you want the items most similar to the current item, use MLT. wunder On Jun 28, 2013, at 11:02 AM, Luis Carlos Guerrero Covo wrote: Hey saikat, thanks for
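
As a sketch of the suggestion above, the MoreLikeThis search component can be switched on with the mlt, mlt.fl and mlt.count request parameters. The unique key name (id) and the feature fields passed in are assumptions about the schema, and the helper name is made up.

```python
from urllib.parse import urlencode

def mlt_query_string(doc_id, fields, count):
    """Build a query string asking for the `count` items most like one document."""
    params = [
        ("q", "id:{}".format(doc_id)),   # the item currently being viewed
        ("mlt", "true"),                 # enable the MoreLikeThis component
        ("mlt.fl", ",".join(fields)),    # fields to extract features from
        ("mlt.count", str(count)),       # number of similar items to return
    ]
    return urlencode(params)
```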

Re: Field Query After Collapse.Field?

2013-06-28 Thread Bryan Bende
Can you just use two queries to achieve the desired results ? Query1 to get all actions where !entry_read:1 for some range of rows (your page size) Query2 to get all the entries with an entry_id in the results of Query1 The second query would be very direct and only query for a set of entries
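
The two-query approach above could be sketched like this: take the documents returned by Query1, collect their entry_id values, and turn them into Query2. The field name entry_id comes from the message; the document shape (a list of dicts) and the helper names are assumptions.

```python
def entry_ids_from_actions(action_docs):
    """Collect distinct entry_id values from Query1 results, in first-seen order."""
    seen = []
    for doc in action_docs:
        entry_id = doc.get("entry_id")
        if entry_id is not None and entry_id not in seen:
            seen.append(entry_id)
    return seen

def entries_query(entry_ids):
    """Build Query2: fetch exactly the entries referenced by the first page."""
    if not entry_ids:
        return None  # nothing to fetch; skip the second request
    return "entry_id:({})".format(" OR ".join(str(e) for e in entry_ids))
```

Because Query2 only ever names one page worth of ids, its size is bounded by the page size rather than by the number of users or entries.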

Re: Content based recommender using lucene/solr

2013-06-28 Thread Otis Gospodnetic
Hi, It doesn't have to be one or the other. In the past I've built a news recommender engine based on CF (Mahout) and combined it with a Content Similarity-based engine (it wasn't Solr/Lucene, but something custom that worked with ngrams, though it might as well have been Lucene/Solr/ES). It worked well.

Re: Content based recommender using lucene/solr

2013-06-28 Thread Luis Carlos Guerrero Covo
I only have about a million docs right now so scaling is not a big issue. I'm looking to provide a quick implementation and then worry about scale when I get around to implementing a more robust recommender. I'm looking at a content based approach because we are not tracking users and items viewed

Re: full-import failed after 5 hours with Exception: ORA-01555: snapshot too old: rollback segment number with name too small ORA-22924: snapshot too old

2013-06-28 Thread Otis Gospodnetic
Hi, I'd go talk to the DBA. How long does this query take if you run it directly against Oracle? How long if you run it locally vs. from a remote server (like Solr is in relation to your Oracle server(s))? What happens if you increase batchSize? Otis -- Solr ElasticSearch Support --

Re: Field Query After Collapse.Field?

2013-06-28 Thread slevytam
Hi Erick, I have no idea how I managed to get that working. I was messing around a lot. I may have added org.apache.solr.handler.component.CollapseComponent to an older version :- Unfortunately, I've formatted the server since to try some other options. I did find the official wiki page for

Re: Field Query After Collapse.Field?

2013-06-28 Thread slevytam
Unfortunately not. That would require an object for every single entry for every single user. Generating millions of basically empty objects just for this query is likely impossible. :( -- View this message in context:

RE: shardkey

2013-06-28 Thread Joshi, Shital
Thanks Mark. We use commit=true as part of the request to add documents. Something like this: echo $data| curl --proxy --silent http://HOST:9983/solr/collection1/update/csv?commit=true&separator=|&fieldnames=$fieldnames&_shard_=shard1 --data-binary @- -H 'Content-type:text/plain;

cores sharing an instance

2013-06-28 Thread Peyman Faratin
Hi I have a multicore setup (in 4.3.0). Is it possible for one core to share an instance of its class with other cores at run time? i.e. At run time core 1 makes an instance of object O_i core 1 -- object O_i core 2 --- core n then can core K access O_i? I know they can share properties but

dataconfig to index ZIP Files

2013-06-28 Thread ericrs22
So I thought I had it set up correctly but I'm receiving the following response to my Data Config Last Update: 18:17:52 (Duration: 07s) Requests: 0 (0/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0 (0/s) Started: 13 minutes ago Here's my Data config. dataConfig dataSource

Re: dataconfig to index ZIP Files

2013-06-28 Thread Steve Rowe
Hi, Maybe fileName=*.zip instead of .*zip ? Steve On Jun 28, 2013, at 2:20 PM, ericrs22 ericr...@yahoo.com wrote: So I thought I had it correctly setup but I'm receiveing the following response to my Data Config Last Update: 18:17:52 (Duration: 07s) Requests: 0 (0/s), Fetched: 0

Re: shardkey

2013-06-28 Thread Mark Miller
Yeah, that is what I would try until 4.4 comes out - and it should not matter replica or leader. - Mark On Jun 28, 2013, at 3:13 PM, Joshi, Shital shital.jo...@gs.com wrote: Thanks Mark. We use commit=true as part of the request to add documents. Something like this: echo $data| curl

RE: shardkey

2013-06-28 Thread Joshi, Shital
Thanks! -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, June 28, 2013 5:06 PM To: solr-user@lucene.apache.org Subject: Re: shardkey Yeah, that is what I would try until 4.4 comes out - and it should not matter replica or leader. - Mark On Jun 28,

Re: dataconfig to index ZIP Files

2013-06-28 Thread ericrs22
Unfortunately not. I had tried that before with the logs saying: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0 With .*zip I get this: WARN SimplePropertiesWriter Unable to read:
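
The exception above shows that the fileName attribute is treated as a regular expression rather than a shell glob, so a leading * has nothing to repeat. The sketch below uses Python's re module as a stand-in for java.util.regex (both reject a pattern that starts with *); the stricter pattern .*\.zip is only a suggestion, not something confirmed in the thread.

```python
import re

def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

def matches(pattern, filename):
    """Return True if the whole file name matches the pattern."""
    return re.fullmatch(pattern, filename) is not None
```

Here is_valid_regex("*.zip") is False, while ".*zip" compiles but also accepts names such as "notazip"; escaping the dot avoids that.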

change solr core schema and config via http

2013-06-28 Thread Wu, James C.
Hi, I am trying to figure out how to change the schema/config of an existing core or a core to be created via http calls to solr. After spending hours in searching online, I still could not find any documents showing me how to do it. The only way I know is that you have to log on to the solr

Re: change solr core schema and config via http

2013-06-28 Thread Rafał Kuć
Hello! In 4.3.1 you can only read schema.xml or portions of it using Schema API (https://issues.apache.org/jira/browse/SOLR-4658). It is a start to allow schema.xml modifications using HTTP API, which will be a functionality of next release of Solr -

An issue with atomic updates?

2013-06-28 Thread Sam Antique
Hi all, I think I have found an issue (or misleading behavior, per se) with atomic updates. If I do atomic updates on a field, and if the operation is nonsense (anything other than add, set, inc), it still returns success. Say I send: /update/json?commit=true -d '[{id:...,

Re: An issue with atomic updates?

2013-06-28 Thread Jack Krupansky
Well, it is known to me and documented in my book. BTW, that field value is simply ignored. There are tons of places in Solr where undefined values or outright garbage are simply ignored, silently. Go ahead and file a Jira though. -- Jack Krupansky -Original Message- From: Sam
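
Since the server silently ignores unknown operations, one option is to check update documents on the client before sending them. A minimal sketch, assuming the three modifiers named in this thread (add, set, inc) and an update document expressed as a dict; the helper name is hypothetical.

```python
ALLOWED_OPS = {"add", "set", "inc"}

def invalid_atomic_ops(doc):
    """Return (field, op) pairs whose operation is not a known modifier."""
    bad = []
    for field, value in doc.items():
        # Atomic updates wrap the new value in a one-key dict: {"op": value}.
        if isinstance(value, dict):
            for op in value:
                if op not in ALLOWED_OPS:
                    bad.append((field, op))
    return bad
```

A caller could refuse to send any document for which this returns a non-empty list.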

RE: change solr core schema and config via http

2013-06-28 Thread Wu, James C.
Hi, It only allows adding new fields to the existing schema. My problem is that I am trying to provide my own schema file when I create a new core and I do not have ssh access to the solr host. Is this not even possible? Regards, james -Original Message- From: Rafał Kuć

Re: change solr core schema and config via http

2013-06-28 Thread Jack Krupansky
How could you not have ssh access to the Solr host machine? I mean, how are you managing that server, without ssh access? And if you are not managing the server, what business do you have trying to change the Solr configuration?!?!? Something fishy here! -- Jack Krupansky -Original

RE: change solr core schema and config via http

2013-06-28 Thread Wu, James C.
Hi, Well, we are trying to use Solr to run a multi-tenant index/search service. We assign each client a different core with its own config and schema. It would be good for us if we could just let customers create cores with their own schema and config. The customer would

documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
Hey guys, This has to be a stupid question/I must be doing something wrong, but after frequent load testing with documentCache enabled under Solr 4.3.1 with autoWarmCount=150, I'm noticing that my documentCache metrics are always zero for non-cumulative. At first I thought my commit rate is fast

Re: change solr core schema and config via http

2013-06-28 Thread Jack Krupansky
Ah, yes, good old multi-tenant - I should have known. Yeah, the Solr API is evolving, albeit too slowly for the needs of some. -- Jack Krupansky -Original Message- From: Wu, James C. Sent: Friday, June 28, 2013 7:06 PM To: solr-user@lucene.apache.org Subject: RE: change solr core

Re: documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
To answer some of my own question, Shawn H's great reply on this thread explains why I see no autoWarming on doc cache: http://www.marshut.com/iznwr/soft-commit-and-document-cache.html It is still unclear to me why I see no other metrics, however. Thanks Shawn, Tim On 28 June 2013 16:14, Tim

Re: documentCache not used in 4.3.1?

2013-06-28 Thread Otis Gospodnetic
Hi Tim, Not sure about the zeros in 4.3.1, but in SPM we see all these numbers are non-0, though I haven't had the chance to confirm with Solr 4.3.1. Note that you can't really autowarm document cache... Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring --

Re: documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
Thanks Otis, Yeah I realized after sending my e-mail that doc cache does not warm, however I'm still lost on why there are no other metrics. Thanks! Tim On 28 June 2013 16:22, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Tim, Not sure about the zeros in 4.3.1, but in SPM we see

Re: Replicating files containing external file fields

2013-06-28 Thread Arun Rangarajan
Jack, Here is the ReplicationHandler definition from solrconfig.xml: <requestHandler name="/replication" class="solr.ReplicationHandler"> <lst name="master"> <str name="enable">${enable.master:false}</str> <str name="replicateAfter">startup</str> <str name="replicateAfter">commit</str> <str

Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-28 Thread Bill Au
I am running Solr 4.3.0, using DIH to import data from MySQL. I am running into a very strange problem where data from a datetime column is being imported with the right date but the time is 00:00:00. I tried using SQL DATE_FORMAT() and also DIH DateFormatTransformer but nothing works. The raw

Re: Replicating files containing external file fields

2013-06-28 Thread Jack Krupansky
Yes, you need to list that EFF file in the confFiles list - only those listed files will be replicated. <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml, /var/solr-data/List/external_*</str> Oops... sorry, no wildcards... you must list the
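
Because confFiles takes no wildcards, the explicit list has to be generated somewhere else. One possibility is a small deploy-time script that expands a glob into a comma-separated value, sketched below. The helper name and the idea of generating the value this way are assumptions, not something proposed in the thread.

```python
import fnmatch

def expand_conf_files(explicit, candidates, pattern):
    """Return a confFiles value: explicit names plus every candidate matching the glob."""
    matched = sorted(n for n in candidates if fnmatch.fnmatchcase(n, pattern))
    return ",".join(list(explicit) + matched)
```

The candidates argument would typically come from listing the directory that holds the external file field files.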

Re: Joins with SolrCloud

2013-06-28 Thread Chris Toomey
Thanks, confirmed by trying w/ 4.3.1 that the join works with the outer collection distributed/sharded so long as the inner collection is not distributed/sharded. Chris On Tue, Jun 25, 2013 at 4:55 PM, Upayavira u...@odoko.co.uk wrote: I have never heard mention that joins support distributed

FileDataSource vs JdbcDataSouce (speed) Solr 3.5

2013-06-28 Thread Mike L.
I've been working on improving index time with a JdbcDataSource DIH based config and found it not to be as performant as I'd hoped for, for various reasons, not specifically due to solr. With that said, I decided to switch gears a bit and test out a FileDataSource setup... I assumed by

Improving performance to return 2000+ documents

2013-06-28 Thread Utkarsh Sengar
Hello, I have a use case where I need to retrieve the top 2000 documents matching a query. What are the parameters (in query, solrconfig, schema) I should look at to improve this? I have 45M documents in a 3-node solrcloud 4.3.1 with 3 shards, with 30GB RAM, 8vCPU and 7GB JVM heap size. I have

Re: Joins with SolrCloud

2013-06-28 Thread Yonik Seeley
On Tue, Jun 25, 2013 at 7:55 PM, Upayavira u...@odoko.co.uk wrote: However, if from your example, innerCollection was replicated across all nodes, I would think that should work, because all that comes back from each server when a distributed search happens is the best 'n' matches, so exactly

Re: Improving performance to return 2000+ documents

2013-06-28 Thread Utkarsh Sengar
Also, I don't see a consistent response time from solr, I ran ab again and I get this: ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json Benchmarking x.amazonaws.com (be patient) Completed 100
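
The benchmark URL above carries three parameters (q, rows, wt) in one query string, which is easy to get wrong by hand. A small sketch of building it with the standard library so that separators and escaping are handled consistently; the base URL in the usage note is a placeholder, not the poster's host.

```python
from urllib.parse import quote, urlencode

def select_url(base, query, rows, wt="json"):
    """Build a select URL with q, rows and wt joined by '&' and percent-encoded."""
    params = [("q", query), ("rows", str(rows)), ("wt", wt)]
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base + "?" + urlencode(params, quote_via=quote)
```

For example, select_url("http://localhost:8983/solr/prodinfo/select", "allText:huggies diapers size 1", 2000) produces a URL ending in ?q=allText%3Ahuggies%20diapers%20size%201&rows=2000&wt=json. When passing such a URL to a shell command, it should be quoted so that & is not interpreted by the shell.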

Re: cores sharing an instance

2013-06-28 Thread Shalin Shekhar Mangar
There is very little shared between multiple cores (instanceDir paths, logging config maybe?). Why are you trying to do this? On Sat, Jun 29, 2013 at 1:14 AM, Peyman Faratin pey...@robustlinks.com wrote: Hi I have a multicore setup (in 4.3.0). Is it possible for one core to share an instance

Re: dataconfig to index ZIP Files

2013-06-28 Thread Shalin Shekhar Mangar
What is dataSource=binaryFile? I don't see any such data source defined in your configuration. On Fri, Jun 28, 2013 at 11:50 PM, ericrs22 ericr...@yahoo.com wrote: So I thought I had it correctly setup but I'm receiveing the following response to my Data Config Last Update: 18:17:52

Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-28 Thread Shalin Shekhar Mangar
The default in JdbcDataSource is to use ResultSet.getObject which returns the underlying database's type. The type specific methods in ResultSet are not invoked unless you are using convertType=true. Is MySQL actually returning java.sql.Timestamp objects? On Sat, Jun 29, 2013 at 5:22 AM, Bill Au