Re: DateDiff

2013-11-06 Thread Aloke Ghoshal
Hi Adam, With FunctionQuery (http://wiki.apache.org/solr/FunctionQuery) and DateMath (http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/util/DateMathParser.html): round to the day level, subtract, and divide by milliseconds_in_a_day (86400K, i.e. 86,400,000).
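
A minimal sketch of the arithmetic described above, done in Python rather than inside Solr: both timestamps are rounded down to the day, subtracted, and the millisecond difference is divided by 86,400,000. The helper names and the field name `created_date` are hypothetical. The function-query string relies on Solr's `ms` and `div` functions with `NOW/DAY` date math; note that only the constant side is rounded in that string, so treat it as a starting point rather than a finished query.

```python
from datetime import datetime, timedelta, timezone

MS_PER_DAY = 86_400_000  # 24 * 60 * 60 * 1000


def round_to_day(dt):
    """Truncate a datetime to midnight, similar to DateMath's /DAY rounding."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


def day_diff(later, earlier):
    """Whole days between two datetimes after rounding both to the day."""
    delta = round_to_day(later) - round_to_day(earlier)
    delta_ms = delta // timedelta(milliseconds=1)  # difference in milliseconds
    return delta_ms // MS_PER_DAY


def solr_day_diff_function(field):
    """Build a function-query string; `field` is a hypothetical date field name."""
    return f"div(ms(NOW/DAY,{field}),{MS_PER_DAY})"
```

The string returned by `solr_day_diff_function("created_date")` could be passed as a pseudo-field in `fl` or as a sort function, assuming the field is a date type that `ms` accepts.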

Re: Facet question: Getting only the matched value from multivalued field

2013-11-03 Thread Aloke Ghoshal
Hi Susheel, You might be able to pull something off using facet.prefix: http://wiki.apache.org/solr/SimpleFacetParameters#facet.prefix. Will work when the prefix is exact and doesn't require any analysis, something along these lines:

Re: Solr subset searching in 100-million document index

2013-10-25 Thread Aloke Ghoshal
Hi Sandeep, You are quite likely below capacity with this current set-up: http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache Few things for you to confirm: 1. Which version of Solr are you using? 2. The size of your index. - Are fields stored? How much are these stored fields

Re: Changing indexed property on a field from false to true

2013-10-24 Thread Aloke Ghoshal
Upayavira - Nice idea pushing in a nominal update when all fields are stored, and it does work. The nominal update could be sent to a boolean type dynamic field, that's not to be used for anything other than maybe identifying documents that are done re-indexing. On Wed, Oct 23, 2013 at 7:47 PM,

Re: Find documents that are composed of % words

2013-10-16 Thread Aloke Ghoshal
: Aloke Ghoshal, I'm trying to work out your equation. I am using the standard schema provided by Nutch for Solr and am not aware of how to calculate myfieldwordcount in the first query. No idea where this count will come from. Is there any filter that will store number of tokens generated

Re: Find documents that are composed of % words

2013-10-10 Thread Aloke Ghoshal
Something you could do via function queries. Performance (for 500+ words) is doubtful. 1) With a separate float field (myfieldwordcount) that holds the count of words from your query field (myfield): http://localhost:8983/solr/collection1/select?wt=xml&indent=true&defType=func&fl=id,myfield

Re: dynamic field question

2013-10-09 Thread Aloke Ghoshal
Hi David, A separate Solr document for each section is a good option if you also need to handle phrases, case, special characters, etc. within the title field. How do you map them to dynamic fields? E.g.: Appendix for cities, APPENDIX 1: Cities Regards, Aloke On Wed, Oct 9, 2013 at 9:45 AM,

Re: Find documents that are composed of % words

2013-10-09 Thread Aloke Ghoshal
Hi Shahzad, Have you tried with the Minimum Should Match feature: http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29 Regards, Aloke On Wed, Oct 9, 2013 at 4:55 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, You can take your words, combine some % of

Re: Will Solr work with a mapped drive?

2013-09-20 Thread Aloke Ghoshal
Hi, Try the UNC path instead: http://wiki.apache.org/tomcat/FAQ/Windows#Q6 Regards, Aloke On 9/20/13, johnmu...@aol.com johnmu...@aol.com wrote: Hi, I'm having this same problem as described here:

Re: ReplicationFactor for solrcloud

2013-09-12 Thread Aloke Ghoshal
Hi Aditya, You need to start another 6 instances (9 instances in total) to achieve this. The first 3 instances, as you mention, are already assigned to the 3 shards. The next 3 will become their replicas, followed by the next 3 as the next replicas. You could create two copies each of the
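
The instance count above is just shards multiplied by copies per shard. A small sketch of that arithmetic, with hypothetical function names, where "copies per shard" counts the leader plus its replicas:

```python
def instances_needed(num_shards, copies_per_shard):
    """Total Solr instances when each shard has the given number of copies."""
    return num_shards * copies_per_shard


def additional_instances(num_shards, copies_per_shard, already_running):
    """Instances still to be started, never less than zero."""
    return max(0, instances_needed(num_shards, copies_per_shard) - already_running)
```

With 3 shards, 3 copies each, and 3 instances already running, this gives 9 in total and 6 still to start, matching the numbers in the reply.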

Re: Some highlighted snippets aren't being returned

2013-09-09 Thread Aloke Ghoshal
Hi Eric, As Bryan suggests, you should look at appropriately setting up the fragSize and maxAnalyzedChars for long documents. One issue I find with your search request is that in trying to highlight across three separate fields, you have added each of them as a separate request param:

Re: Order of fields in a search query.

2013-08-31 Thread Aloke Ghoshal
Hi Deepak, As Hoss explains it, there wouldn't be any effect of changing the order of individual search terms. In addition, you could look at the Scoring algo: http://lucene.apache.org/core/2_9_4/scoring.html#Algorithm,

Re: Newbie SOLR question

2013-08-30 Thread Aloke Ghoshal
Hi, Please refer to my response from a few months back: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/%3ccaht6s2az_w2av04rdmoeeck5e9o0k4ytktf0pjsecsh-lls...@mail.gmail.com%3E Our modelling is to index N (individual pages) + 1 (original document) in Solr. Once a document

Re: Problem with importing tab-delimited csv file

2013-08-23 Thread Aloke Ghoshal
Hi Rob, I think the wrong Content-type header is getting passed. Try one of these instead: curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/tmp/sample.tmp' OR curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09' -H

Re: removing duplicates

2013-08-21 Thread Aloke Ghoshal
Hi, Facet by one of the duplicate fields (probably by the numeric field that you mentioned) and set facet.mincount=2. Regards, Aloke On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib docbook@gmail.com wrote: hello, We have documents that are duplicates i.e. the ID is different, but rest of
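
A rough sketch of the request suggested above, built with Python's standard library. The field name `price` and the helper name are hypothetical; the base URL is the default example core and may differ on your install. Any facet value that comes back has a count of at least 2, i.e. is shared by two or more documents.

```python
from urllib.parse import urlencode


def duplicate_facet_url(field, base_url="http://localhost:8983/solr/collection1/select"):
    """Build a select URL that facets on `field` and hides values seen only once."""
    params = [
        ("q", "*:*"),             # match every document
        ("rows", "0"),            # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "2"),  # keep values shared by 2+ documents
    ]
    return base_url + "?" + urlencode(params)
```

For a high-cardinality field the facet response can be large, so it may be worth paging through it with `facet.limit` and `facet.offset`.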

Re: removing duplicates

2013-08-21 Thread Aloke Ghoshal
PM, Aloke Ghoshal alghos...@gmail.com wrote: Hi, Facet by one of the duplicate fields (probably by the numeric field that you mentioned) and set facet.mincount=2. Regards, Aloke On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib docbook@gmail.com wrote: hello, We have

Re: convert text file to solr document where delimiter fields are fields of document

2013-08-20 Thread Aloke Ghoshal
Hi, Since your data is well delimited, I'd suggest using CSV Updater, with the delimiter/separator set to: *'~'* See: http://wiki.apache.org/solr/UpdateCSV#separator Looks like you might also have to split based on your second delimiter: *';'* See:

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread Aloke Ghoshal
Hi Vicky, Please check if you have a second multiValued field by the name content defined in your schema.xml. It is typically part of the default schema definition and is different from the one you had initially posted, which had Content with a capital C. Here's the debugQuery on my system (with both

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread Aloke Ghoshal
Here you go, it is the default 4.2.1 schema.xml (http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/example/solr/solr.xml), with the following additions: <!-- Added these fields --> <field name="Content" type="text_general" indexed="true" stored="true" multiValued="false"/> <field

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-19 Thread Aloke Ghoshal
Location of the schema.xml: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/example/solr/collection1/conf/schema.xml On Mon, Aug 19, 2013 at 6:52 PM, Aloke Ghoshal alghos...@gmail.com wrote: Here you go, it is the default 4.2.1 schema.xml ( http://svn.apache.org/repos

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-16 Thread Aloke Ghoshal
Hi, Based on your WhitespaceTokenizerFactory and due to the LowerCaseFilterFactory, the words actually indexed are: speed, post, speedpost. You should get results for: q=Content:speedpost So either remove the LowerCaseFilterFactory or add the LowerCaseFilterFactory as a query time Analyzer as

Re: struggling with solr.WordDelimiterFilterFactory

2013-08-16 Thread Aloke Ghoshal
Hi, That's correct, the Analyzers will get applied at both Index and Query time. In fact I do get results back for speedPost with this field definition. Regards, Aloke On Fri, Aug 16, 2013 at 5:21 PM, vicky desai vicky.de...@germinait.com wrote: Hi, Another Example I found is q=Content:wi-fi

Re: Load a list of values in a solr field and query over its items

2013-08-14 Thread Aloke Ghoshal
Should work once you set up both fields as multiValued (http://wiki.apache.org/solr/SchemaXml#Common_field_options). On Thu, Aug 15, 2013 at 12:07 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Hello, Is it possible to load a list in a solr field and query for items in that list?

Re: SOLR OR query, want 1 of the 2 results

2013-08-12 Thread Aloke Ghoshal
Hi, I would suggest boosting over sorting. Something along these lines: radius:[0 TO 10]^100 OR radius:[10 TO *] Regards, Aloke On Mon, Aug 12, 2013 at 6:43 PM, Raymond Wiker rwi...@gmail.com wrote: It will probably have better performance than having a plan b query that executes if the first query

Re: Solr search on a large text field is very slow

2013-08-08 Thread Aloke Ghoshal
Compare timings in the following cases: - Without the wildcard - With suffix wild card only - test* - With reverse wild card filter factory and two separate terms - *test OR test* On Thu, Aug 8, 2013 at 8:15 PM, meena.sri...@mathworks.com meena.sri...@mathworks.com wrote: Index size is around

Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Aloke Ghoshal
Does adding facet.mincount=2 help? On Tue, Jul 30, 2013 at 11:46 PM, Dotan Cohen dotanco...@gmail.com wrote: To search for duplicate IDs, I am running the following query: select?q=*:*&facet=true&facet.field=id&rows=0 However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving

Re: Switch to new leader transparently?

2013-07-10 Thread Aloke Ghoshal
Hi Floyd, We use SolrNet to connect to Solr from a C# application. Since SolrNet is not aware of SolrCloud or ZK, we use an HTTP load balancer in front of the Solr nodes and query via the load balancer URL. You could use something like HAProxy or Apache reverse proxy for load balancing. On the

Re: Is it possible to find a leader from a list of cores in solr via java code

2013-07-03 Thread Aloke Ghoshal
One option could be to get the clusterstate.json via the following Solr URL and figure out the leader from the response JSON: http://server:port/solr/zookeeper?detail=true&path=%2Fclusterstate.json On Wed, Jul 3, 2013 at 5:57 PM, vicky desai vicky.de...@germinait.com wrote: Hi, I have a
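
A hedged sketch of the "figure out the leader" step, in Python for brevity even though the question was about Java. It assumes you have already extracted the clusterstate.json content from the response, and that it follows the Solr 4.x layout of collection, then `shards`, then `replicas`, with the leader carrying a `leader` property set to `"true"`. Both the layout and the function name are assumptions, so check them against your own cluster's output.

```python
import json


def find_leaders(clusterstate_text):
    """Map (collection, shard) to the base_url of the replica flagged as leader."""
    clusterstate = json.loads(clusterstate_text)
    leaders = {}
    for collection, coll_state in clusterstate.items():
        for shard, shard_state in coll_state.get("shards", {}).items():
            for replica in shard_state.get("replicas", {}).values():
                # The leader flag is assumed to be the string "true"
                if str(replica.get("leader", "")).lower() == "true":
                    leaders[(collection, shard)] = replica.get("base_url")
    return leaders
```

Shards without a flagged replica (for example during an election) are simply absent from the result, so callers should handle a missing key.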

Re: Solr Suggest does not work in solrcloud environment

2013-06-21 Thread Aloke Ghoshal
Hi Simon, Good that it works. The reason as far as I could make out is that by itself/ standalone the SpellCheckComponent (used by the suggester) is not distributed. One way to explicitly distribute the search is to provide the shards:

Re: Solr Suggest does not work in solrcloud environment

2013-06-19 Thread Aloke Ghoshal
Hi, Check the obvious first, that you have rebuilt and reloaded the suggest dictionary individually on all nodes. Also the other checks here: http://stackoverflow.com/questions/6653186/solr-suggester-not-returning-any-results Then, try with one of query component OR distrib=false setting:

Re: Filtering down terms in suggest

2013-06-13 Thread Aloke Ghoshal
Thanks Barani. Could also work out this way provided we start with a large set of suggestions initially to increase the likelihood of getting some matches when filtering down with the second query. On Wed, Jun 12, 2013 at 10:51 PM, bbarani bbar...@gmail.com wrote: I would suggest you to take

Re: Filtering down terms in suggest

2013-06-12 Thread Aloke Ghoshal
to the right field. Obviously this will get out of hand if you have too many of these...so this has limits. Jason On Jun 11, 2013, at 8:29 AM, Aloke Ghoshal alghos...@gmail.com wrote: Hi, Trying to find a way to filter down the suggested terms set based on the term value of another

Re: Filtering down terms in suggest

2013-06-12 Thread Aloke Ghoshal
(EdgeNGram) behavior to get the right suggestion data back. I would suggest an additional core to accomplish this (fed via replication) to avoid cache entry collision with your normal queries. Hope that's useful to you. Jason On Jun 12, 2013, at 7:43 AM, Aloke Ghoshal alghos...@gmail.com wrote

Filtering down terms in suggest

2013-06-11 Thread Aloke Ghoshal
Hi, Trying to find a way to filter down the suggested terms set based on the term value of another indexed field? Let's say we have the following documents indexed in Solr: userid:1, groupid:1, content:alpha beta gamma userid:2, groupid:1, content:alternate better garden userid:3, groupid:2,

Re: LIMIT on number of OR in fq

2013-06-10 Thread Aloke Ghoshal
True, the container's request header size limit must be the reason then. Try: http://serverfault.com/questions/136249/how-do-we-increase-the-maximum-allowed-http-get-query-length-in-jetty On Sun, Jun 9, 2013 at 11:04 PM, Jack Krupansky j...@basetechnology.com wrote: Maybe it is hitting some

Re: LIMIT on number of OR in fq

2013-06-09 Thread Aloke Ghoshal
Hi Kamal, You might have to increase the value of maxBooleanClauses in solrconfig.xml (http://wiki.apache.org/solr/SolrConfigXml). The default value 1024 should have been fine for 280 search terms. Though not relevant to your query (an OR query), take a look at this for an explanation:

Re: SQL MINUS equivalent in solr

2013-06-02 Thread Aloke Ghoshal
Hi, A work around could be to add columns from the second table as fields to the Solr document from the first table. E.g. For DB query: SELECT project_id FROM projects MINUS SELECT project_id FROM archived_project; Add archived_projects as a boolean field to Projects in Solr then query as:

Re: Get page number of searchresult of a pdf in solr

2013-03-01 Thread Aloke Ghoshal
Hi, We are going about solving this problem by splitting an N-page document into N separate documents (one per page, type=Page) + 1 additional combined document (that has all the pages, type=Combined). All the N+1 documents have the same doc_id. The search is initially performed against the
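
The N + 1 modelling described above could be sketched roughly as follows. The `doc_id` and `type` values come from the message; the `page_number` and `content` field names and the function name are hypothetical, and the real schema may well differ.

```python
def build_page_documents(doc_id, pages):
    """Return N per-page documents plus 1 combined document, all sharing doc_id."""
    docs = [
        {"doc_id": doc_id, "type": "Page", "page_number": number, "content": text}
        for number, text in enumerate(pages, start=1)
    ]
    # The combined document carries the text of every page
    docs.append({"doc_id": doc_id, "type": "Combined", "content": "\n".join(pages)})
    return docs
```

A hit on a `type=Page` document then gives the page number directly, while the `type=Combined` document supports matches that span pages.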

Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Aloke Ghoshal
Hi, If you haven't already, please refer to: http://www.ngdata.com/site/blog/57-ng.html http://lucene.472066.n3.nabble.com/solr-cloud-concepts-td3726292.html http://wiki.apache.org/solr/SolrCloud#FAQ Regards, Aloke On Thu, Jan 3, 2013 at 3:12 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

Re: ZooKeeper ensemble behind load balancer

2012-12-30 Thread Aloke Ghoshal
Hi Marcin, Since you are thinking of this in the context of Amazon, I would suggest taking a different route. Assign an Elastic IP (EIP) to each EC2 instance running the ZK node and use the EIP in Solr. This way you could easily map the EIP to a new EC2 instance subsequently, if required, and the

Re: score calculation

2012-12-13 Thread Aloke Ghoshal
Hi Tom, This is great. Should make it into the documentation. Regards, Aloke On Thu, Dec 13, 2012 at 1:23 PM, Burgmans, Tom tom.burgm...@wolterskluwer.com wrote: I am also busy with getting this clear. Here are my notes so far (by copying and writing myself): queryWeight = the impact

Re: star searches with high page number requests taking long times

2012-12-07 Thread Aloke Ghoshal
Hi Robert, You could look at pageDoc and pageScore to improve things for deep paging (http://wiki.apache.org/solr/CommonQueryParameters#pageDoc_and_pageScore). Regards, Aloke On Sat, Dec 8, 2012 at 8:08 AM, Upayavira u...@odoko.co.uk wrote: Yes, expected. When it does a search for the first,

Running Solr Core/ Tika on Azure

2012-10-30 Thread Aloke Ghoshal
Hi, Looking for feedback on running Solr Core/Tika parsing engine on Azure. There's one offering for Solr within Azure from LucidWorks. This offering however doesn't mention Tika. We are looking at options to make content from files (doc, excel, pdfs, etc.) stored within Azure storage

Re: Running Solr Core/ Tika on Azure

2012-10-30 Thread Aloke Ghoshal
project bootstraps itself with all of the Java and Solr files it needs to run and starts Solr using the bundled Jetty web server, so as long as you have Tika in your libs and a configured handler you should be able to use it. Radek. On Tue, Oct 30, 2012 at 4:31 AM, Aloke Ghoshal alghos...@gmail.com