Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

2012-09-06 Thread guenter.hip...@unibas.ch
Hoss, I'm so happy you realized the problem because I was quite worried about it!! Let me know if I can provide support with testing it. The last two days I was busy with migrating a bunch of hosts which should -hopefully- be finished today. Then I have again the infrastructure for running

RE: Delete all documents in the index

2012-09-06 Thread Alexey Kozhemiakin
One more thanks for posting this! I struggled with the same issue yesterday and solved it with the _version_ hint from the mailing list. Alex. -Original Message- From: Mark Mandel [mailto:mark.man...@gmail.com] Sent: Thursday, September 06, 2012 1:53 AM To: solr-user@lucene.apache.org

Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-09-06 Thread aniljayanti
Hi, Thanks, I am getting results with the URL below. *suggest/?q=michael b&df=title&defType=lucene&fl=title* But I want the results in the spellcheck section. I want to search with title or empname or both. Aniljayanti

terms component search

2012-09-06 Thread Peter Kirk
Hi, I am trying to implement some auto-suggest functionality, and am currently looking at the terms component (Solr 3.6). For example, I can form a query like this: http://solrhost/solr/mycore/terms?terms.fl=title_s&terms.sort=index&terms.limit=5&terms.prefix=Hotel+C which searches in the title_s
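
The query string in this message is easier to keep well-formed if the parameters are joined and encoded programmatically. A minimal sketch, assuming the host, core, and field names quoted in the message (solrhost, mycore, title_s) and using only Python's standard library; the helper name terms_url is made up for illustration:

```python
from urllib.parse import urlencode

# Base URL of the terms handler as quoted in the message; adjust for a real host.
TERMS_HANDLER = "http://solrhost/solr/mycore/terms"


def terms_url(field, prefix, limit=5, sort="index"):
    """Build a terms component request URL (hypothetical helper)."""
    params = {
        "terms.fl": field,        # field whose indexed terms are listed
        "terms.sort": sort,       # "index" = lexicographic order
        "terms.limit": limit,     # maximum number of terms returned
        "terms.prefix": prefix,   # only terms starting with this prefix
    }
    # urlencode joins the pairs with "&" and encodes the space as "+".
    return TERMS_HANDLER + "?" + urlencode(params)


print(terms_url("title_s", "Hotel C"))
```

Building the URL this way avoids losing the "&" separators or leaving spaces unencoded when the prefix comes from user input.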

Re: Document Processing

2012-09-06 Thread Tanguy Moal
If your interest is focusing on the real textual content of a web page, you could try this : JReadability (https://github.com/ifesdjeen/jReadability , Apache 2.0 license), which wraps JSoup (as Lance suggested) and applies a set of predefined rules to scrap crap (nav, headers, footers, ...) off of

Re: terms component search

2012-09-06 Thread Tanguy Moal
Hi Peter, Yes if you want to do complex things in suggest mode, you'd better rely on the SearchComponent... For example, this blog post is a good read http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/ , if you have complex requirements on the searched fields. (Although

Re: solr indexing slows down after few minutes

2012-09-06 Thread amit
Commit is not too often, it's a batch of 100 records, takes 40 to 60 secs before another commit. No I am not indexing with multi threads. It uses a single thread executor. I have seen steady performance for now after increasing the merge factor from 10 to 25. Will have to wait and watch if that

solr 3.6.1 tomcat 7.0 missing core name in path

2012-09-06 Thread amit
Hi, I have installed Solr 3.6.1 on Tomcat 7.0 following the steps here: http://ralf.schaeftlein.de/2012/02/10/installing-solr-3-5-under-tomcat-7/ The Solr home page loads fine, but the admin page (http://localhost:8080/solr/admin/) throws the error "missing core name in path". I am installing single

Facetting inside a custom component

2012-09-06 Thread Ralf Heyde
Hello, I'm currently developing a custom component in Solr. This component works fine. The problem I have is that I only have access to the searcher, which gives me the option to fire e.g. BooleanQueries. This searcher gives me a result, which I have to iterate over to calculate information which

Re: Facetting inside a custom component

2012-09-06 Thread Ralf Heyde
Hi, just found a solution, but you have to know what you want to count: try { final SolrIndexSearcher s = rb.req.getSearcher(); final SolrQueryParser qp = new SolrQueryParser(rb.req.getSchema(), null); final String queryString = "entity_type:RELEASE"; final Query q = qp.parse(queryString);

Solr 4.0alpha: edismax complaints on certain characters

2012-09-06 Thread Alexandre Rafalovitch
Hello, I was under the impression that edismax was supposed to be crash proof and just ignore bad syntax. But I am either misconfiguring it or hit a weird bug. I basically searched for text containing '/' and got this: { 'responseHeader'={ 'status'=400, 'QTime'=9, 'params'={

RE: Solr 4.0alpha: edismax complaints on certain characters

2012-09-06 Thread Yoni Amir
As far as I understand, / is a special character and needs to be escaped. Maybe foo\/bar should work? I found this when I looked at the code of ClientUtils.escapeQueryChars: // These characters are part of the query syntax and must be escaped if (c == '\\' || c == '+' || c == '-' || c ==
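
The escaping idea in this reply can be sketched outside of SolrJ. The following is not the ClientUtils code; it is a rough Python equivalent, and the set of special characters is an assumption drawn from the Lucene query syntax (with '/' included, as this thread suggests is needed on 4.0). The helper name is hypothetical:

```python
# Characters treated as query syntax in this sketch. This set is an assumption
# for illustration, not a verbatim copy of ClientUtils.escapeQueryChars.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text):
    """Backslash-escape special characters and whitespace (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")  # prefix the character with a single backslash
        out.append(ch)
    return "".join(out)


print(escape_query_chars("foo/bar"))
```

Escaping on the client side like this only helps when the whole input is meant as a literal term; it would also neutralize operators the user typed on purpose.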

Re: Solr 4.0alpha: edismax complaints on certain characters

2012-09-06 Thread Yonik Seeley
I believe this is caused by the regex support in https://issues.apache.org/jira/browse/LUCENE-2039 It certainly seems wrong to interpret a slash in the middle of the word as the start of a regex, so I've reopened the issue. -Yonik http://lucidworks.com On Thu, Sep 6, 2012 at 9:34 AM, Alexandre

AW: Website (crawler for) indexing

2012-09-06 Thread Lochschmied, Alexander
Thanks Rafał and Markus for your comments. I think Droids has a serious problem with URL parameters in the current version (0.2.0) from Maven central: https://issues.apache.org/jira/browse/DROIDS-144 I knew about Nutch, but I haven't been able to implement a crawler with it. Have you done that or

RE: deletedPkQuery not work in solr 3.3

2012-09-06 Thread Dyer, James
You have deletedPKQuery, but the correct spelling is deletedPkQuery (lowercase k). Try that and see if it fixes your problem. Also, you can probably simplify this if you do this as command=full-import&clean=false, then use something like this for your query: select product_id as

Re: Solr 4.0alpha: edismax complaints on certain characters

2012-09-06 Thread Jack Krupansky
That's what I was thinking, but when I tried foo/bar in Solr 3.6 and 4.0-BETA it was working fine - it split the term and generated the proper query without any error. I think the problem is if you use the default Lucene query parser, not edismax. I removed defType==edismax from my query

Re: Solr 4.0alpha: edismax complaints on certain characters

2012-09-06 Thread Alexandre Rafalovitch
I am on 4.0 alpha. Maybe it was fixed in beta. But I am most definitely seeing this in edismax. If I get rid of / and use debugQuery, I get: 'responseHeader'={ 'status'=0, 'QTime'=14, 'params'={ 'debugQuery'='true', 'indent'='true', 'q'='foobar', 'qf'='TitleEN

Re: AW: Website (crawler for) indexing

2012-09-06 Thread Rafał Kuć
Hello! I think that really depends on what you want to achieve and what parts of your current system you would like to reuse. If it is only HTML processing I would let Nutch and Solr do that. Of course you can extend Nutch (it has a plugin API) and implement the custom logic you need as a Nutch

Re: Solr 4.0alpha: edismax complaints on certain characters

2012-09-06 Thread Jack Krupansky
I do in fact see your problem with an earlier 4.0 build, but not with 4.0-BETA. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Thursday, September 06, 2012 10:13 AM To: solr-user@lucene.apache.org Subject: Re: Solr 4.0alpha: edismax complaints on certain

RE: Website (crawler for) indexing

2012-09-06 Thread Markus Jelsma
-Original message- From:Lochschmied, Alexander alexander.lochschm...@vishay.com Sent: Thu 06-Sep-2012 16:04 To: solr-user@lucene.apache.org Subject: AW: Website (crawler for) indexing Thanks Rafał and Markus for your comments. I think Droids has a serious problem with URL

Re: Solr 4.0alpha: edismax complaints on certain characters

2012-09-06 Thread Jack Krupansky
The fix in edismax was made just a few days (6/28) before the formal announcement of 4.0-ALPHA (7/3), but unfortunately the fix came a few days after the cutoff for 4.0-ALPHA (6/25). See: https://issues.apache.org/jira/browse/SOLR-3467 (That issue should probably be annotated to indicate that

Re: Problem with verifying signature ?

2012-09-06 Thread Chris Hostetter
: gpg: Signature made 08/06/12 19:52:21 Pacific Daylight Time using RSA key : ID 322 : D7ECA : gpg: Good signature from Robert Muir (Code Signing Key) rm...@apache.org : *gpg: WARNING: This key is not certified with a trusted signature!* : gpg: There is no indication that the signature

Re: Solr not allowing persistent HTTP connections

2012-09-06 Thread Chris Hostetter
: Some extra information. If I use curl and force it to use HTTP 1.0, it is more : visible that Solr doesn't allow persistent connections: a) solr has nothing to do with it, it's entirely something under the control of jetty and the client. b) i think you are introducing confusion by trying to

Re: Solr not allowing persistent HTTP connections

2012-09-06 Thread Aleksey Vorona
Thank you. I did the test with curl the same way you did it and it works. I still can not get ab (apache benchmark) to reuse connections to solr. I'll investigate this further. $ ab -c 1 -n 100 -k 'http://localhost:8983/solr/select?q=*:*' | grep Alive Keep-Alive requests:0 -- Aleksey On

Solr-Export

2012-09-06 Thread Helton Alponti
Hey Guys, I created a program to export Solr index data to XML. The url is https://github.com/eltu/Solr-Export Tell me about any problem, please. *** I only tested with the Solr 3.6.1 Thanks, Helton

Solr search not working after copying a new field to an existing Indexed Field

2012-09-06 Thread Mani
I have made a schema change to copy an existing field name (Source Field) to an existing search field text (Destination Field). Since I made the schema change, I updated all the documents thinking the new source field will be clubbed together with the text field. The search for a specific

NoHttpResponseException: The server failed to respond

2012-09-06 Thread srinir
We have a distributed solr setup with 8 servers and 8 cores on each server in production. We see this error multiple times in our solr servers. we are using solr 3.6.1. Has anyone seen this error before and have you resolved it ? 2012-09-04 02:16:40,995 [http-nio-8080-exec-7] ERROR

Re: UnInvertedField limitations

2012-09-06 Thread Fuad Efendi
Hi Jack, 24bit = 16M possibilities, it's clear; just to confirm... the rest is unclear, why can 4 bytes have 4 million cardinality? I thought it was 4 billion... And, just to confirm: UnInvertedField allows 16M cardinality, correct? On 12-08-20 6:51 PM, Jack Krupansky

Re: UnInvertedField limitations

2012-09-06 Thread Fuad Efendi
Hi Lance, Use case is keyword extraction, and it could be 2- and 3-grams (2- and 3- words); so that theoretically we can have 10,000^3 = 1,000,000,000,000 3-grams for English only... of course my suggestion is to use statistics and to build a dictionary of such 3-word combinations (remove top,

Re: UnInvertedField limitations

2012-09-06 Thread Yonik Seeley
It's actually limited to 24 bits to point to the term list in a byte[], but there are 256 different arrays, so the maximum capacity is 4B bytes of un-inverted terms, but each bucket is limited to 4B/256 so the real limit can come in at a little less due to luck. From the comments: * There is
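
The numbers in this exchange can be checked with a few lines of arithmetic. A small sketch using only the figures quoted in the thread (24-bit pointers, 256 arrays); the variable names are made up for illustration:

```python
# A 24-bit pointer can address 2**24 distinct byte offsets within one array.
per_array_bytes = 2 ** 24
# The thread mentions 256 separate arrays, each addressed by such a pointer.
array_count = 256
total_bytes = per_array_bytes * array_count

print(per_array_bytes)  # 16777216 (about 16M)
print(total_bytes)      # 4294967296 (about 4B, i.e. 2**32)
```

So 16M is the per-array figure, while 4B is the combined capacity across the 256 arrays.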

Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0

2012-09-06 Thread kiran chitturi
Hi, I am using Solr with DIH and started getting errors when the database time/date fields are getting imported into Solr. I have used the date as the field type, but when I looked at the docs it looks like the date field does not accept (Thu, 06 Sep 2012 22:32:33 +0000) or (1346976590)

Re: Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0

2012-09-06 Thread Chris Hostetter
: I am using Solr with DIH and started getting errors when the database : time/date fields are getting imported in to Solr. I have used the date as what actual error are you getting? If you are pulling dates from a SQL Date field, that the jdbc driver returns as java.util.Date objects, then

Re: Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0

2012-09-06 Thread Hasan Diwan
http://www.electrictoolbox.com/article/mysql/format-date-time-mysql/ hth -- H On 6 Sep 2012 17:23, kiran chitturi chitturikira...@gmail.com wrote: Hi, I am using Solr with DIH and started getting errors when the database time/date fields are getting imported in to Solr. I have used the date

Re: solr issue with seaching words

2012-09-06 Thread Chris Hostetter
: I am facing a strange problem. I am searching for word jacke but solr also : returns result where my description contains 'RCA-Jack/'. If i search : jacka or jackc or jackd, it works fine and does not return me any : result which is what i am expecting in this case. you need to tell us what

Re: EdgeNgramTokenFilter and positions

2012-09-06 Thread Otis Gospodnetic
I don't know for sure, but I remember something around this being a problem, yes ... maybe https://issues.apache.org/jira/browse/LUCENE-3907 ? Otis

Re: Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0

2012-09-06 Thread kiran chitturi
Hi, Thank you for your response. The error I am getting is 'org.apache.solr.common.SolrException: Invalid Date String: '1345743552''. I think it was being saved as a string in the DB, so I will use the DateFormatTransformer. When I index a text field which has Arabic and English like this tweet
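
Solr's date field expects ISO 8601 UTC values such as 2012-09-06T22:32:33Z, which is why a raw epoch string is rejected. As an illustration of the conversion only (within DIH this would more likely be handled in the SQL query or by a transformer), here is a sketch in Python; the function names are hypothetical:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Output pattern for Solr date fields: UTC with a literal trailing "Z".
SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def epoch_to_solr(epoch_seconds):
    """Convert Unix epoch seconds (int or numeric string) to a Solr date string."""
    dt = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return dt.strftime(SOLR_DATE_FORMAT)


def rfc2822_to_solr(text):
    """Convert a date like 'Thu, 06 Sep 2012 22:32:33 +0000' to a Solr date string."""
    dt = parsedate_to_datetime(text).astimezone(timezone.utc)
    return dt.strftime(SOLR_DATE_FORMAT)


print(epoch_to_solr("1345743552"))
print(rfc2822_to_solr("Thu, 06 Sep 2012 22:32:33 +0000"))
```

Both helpers normalize to UTC before formatting, so a source value with a non-zero offset still ends up with a correct trailing Z.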

solrcloud setup using tomcat, single machine

2012-09-06 Thread JesseBuesking
Hey guys! I've been attempting to get solrcloud set up on an Ubuntu VM, but I believe I'm stuck. I've got tomcat set up, the solr war file in place, and when I browse to localhost:port/solr, I can see solr. CHECK I've set the zoo.cfg to use port 5200. I can start it up and see it's running (ls

Re: EdgeNgramTokenFilter and positions

2012-09-06 Thread Walter Underwood
Yes, that is exactly the bug. EdgeNgram should work like the synonym filter. wunder On Sep 6, 2012, at 5:51 PM, Otis Gospodnetic wrote: I don't know for sure, but I remember something around this being a problem, yes ... maybe https://issues.apache.org/jira/browse/LUCENE-3907 ? Otis

Solr request/response lifecycle and logging full response time

2012-09-06 Thread Aaron Daubman
Greetings, I'm looking to add some additional logging to a solr 3.6.0 setup to allow us to determine actual time spent by Solr responding to a request. We have a custom QueryComponent that sometimes returns 1+ MB of data and while QTime is always on the order of ~100ms, the response time at the

Re: Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0

2012-09-06 Thread Gora Mohanty
On 7 September 2012 06:24, kiran chitturi chitturikira...@gmail.com wrote: [...] When i index a text field which has arabic and English like this tweet “@anaga3an: هو سعد الحريري بيعمل ايه غير تحديد الدوجلاس ويختار الكرافته ؟؟” #gcc #ksa #lebanon #syria #kuwait #egypt #سوريا with field_type

Re: Solr request/response lifecycle and logging full response time

2012-09-06 Thread Aaron Daubman
I'd still love to see a query lifecycle flowchart, but, in case it helps any future users or in case this is still incorrect, here's how I'm tackling this: 1) Override default json responseWriter with my own in solrconfig.xml: queryResponseWriter name=json

Re: Importing of unix date format from mysql database and dates of format 'Thu, 06 Sep 2012 22:32:33 +0000' in Solr 4.0

2012-09-06 Thread Lance Norskog
Also, your browser may use a platform default for the encoding instead of UTF-8. Some MacOS and Windows browsers have this problem. Tomcat sometimes needs adjustment to use UTF-8. If you are on tomcat, check this: http://find.searchhub.org/link?url=http://wiki.apache.org/solr/SolrTomcat

Re: Doubts in Result Grouping in solr 3.6.1

2012-09-06 Thread Erick Erickson
Grouping isn't defined for tokenized fields I don't think. See: http://wiki.apache.org/solr/FieldCollapsing where it says for group.field: ..The field must currently be single-valued... Are you sure you don't want faceting? Best Erick On Tue, Sep 4, 2012 at 5:27 AM, mechravi25

Re: How to preserve source column names in multivalue catch all field

2012-09-06 Thread Erick Erickson
Try using edismax to distribute the search across the fields rather than using the catch-all field. There's no way that I know of to reconstruct what field the source was. But storing the source fields without indexing them is OK too, it won't affect searching speed noticeably... Best Erick On

Re: Best practices on managing facets with Code and Name

2012-09-06 Thread Erick Erickson
I don't know of any better way to do this. Conflating the fields is not _that_ error prone, although it is annoying I agree. I think that idea is better than storing them separately. Best Erick On Tue, Sep 4, 2012 at 4:58 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Hello, I have some

Re: Sorting on mutivalued fields still impossible?

2012-09-06 Thread Erick Erickson
And you've illustrated my viewpoint I think by saying "two obvious choices". I may prefer the first, and you may prefer the second. Neither is necessarily more correct IMO, it depends on the problem space. Choosing either one will be unpopular with anyone who likes the other. And I suspect that

Re: SOLR 4.0 / Jetty Security Set Up

2012-09-06 Thread Erick Erickson
Securing Solr pretty much universally requires that you only allow trusted clients to access the machines directly, usually secured with a firewall and allowed IP addresses; the admin handler is the least of your worries. Consider: if you let me ping solr directly, I can do something really

Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

2012-09-06 Thread Erick Erickson
Guenter: Are you using SolrCloud or straight Solr? And were you updating in batches (i.e. updating multiple docs at once from SolrJ by using the server.add(doclist) form)? There was a bug in this process that caused various docs to show up in various shards differently. This has been fixed in