Recovering from database connection resets in DataImportHandler

2012-02-10 Thread Mike O'Leary
I am trying to use Solr's DataImportHandler to index a large number of database records in a SQL Server database that is owned and managed by a group we are collaborating with. The indexing jobs I have run so far, except for the initial very small test runs, have failed due to database connectio

Joining multicore to return top results

2012-02-10 Thread Selvam
Hi, This should be a trivial question, but I am still failing to get the details. I have 2 cores + default collection, *collection1:* article_id title content *core0:* cluster_id cluster_name cluster_count *core1:* article_id article_cluster_id score. Given an article_id, I want to return top 10 ( based

Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
hmm... perhaps I'm seeing the issue you're speaking of. I have everything running right now and my state is as follows: {"collection1":{ "slice1":{ "JamiesMac.local:8501_solr_slice1_shard1":{ "shard_id":"slice1", "leader":"true", "state":"active", "core":

Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: > jamiesmac Another note: Have no idea if this is involved, but when I do tests with my linux box and mac I run into the following: My linux box auto finds the address of halfmetal and my macbook mbpro.local. If I accept those defaults, my ma

Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
Thanks. If the given ZK snapshot was the end state, then two nodes are marked as down. Generally that happens because replication failed - if you have not, I'd check the logs for those two nodes. - Mark On Fri, Feb 10, 2012 at 7:35 PM, Jamie Johnson wrote: > nothing seems that different. In r

Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
nothing seems that different. In regards to the states of each I'll try to verify tonight. This was using a version I pulled from SVN trunk yesterday morning On Fri, Feb 10, 2012 at 6:22 PM, Mark Miller wrote: > Also, it will help if you can mention the exact version of solrcloud you are > tal

Setting up logging for a Solr project that isn't in tomcat/webapps/solr

2012-02-10 Thread Mike O'Leary
I set up a Solr project to run with Tomcat for indexing contents of a database by following a web tutorial that described how to put the project directory anywhere you want and then put a file called .xml in the tomcat/conf/Catalina/localhost directory that contains contents like this: I

RE: Keyword Tokenizer Phrase Issue

2012-02-10 Thread Zac Smith
Thanks, that explains why the individual terms 'chicken' and 'stock' are still in the query (and are required). So I have tried a few things to get around this, but to no avail: Changed the query analyzer to use the WhitespaceTokenizerFactory with autoGeneratePhraseQueries=true. This creates the

Re: indexing with DIH (and with problems)

2012-02-10 Thread alessio crisantemi
Here is a stack: SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:9946435225838 Processing Document # 1 at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:576) .

Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
Also, it will help if you can mention the exact version of solrcloud you are talking about in each issue - I know you have one from the old branch, and I assume a version off trunk you are playing with - so a heads up on which and if trunk, what rev or day will help in the case that I'm trying t

Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
I'm trying, but so far I don't see anything. I'll have to try and mimic your setup closer it seems. I tried starting up 6 solr instances on different ports as 2 shards, each with a replication factor of 3. Then I indexed 20k documents to the cluster and verified doc counts. Then I shutdown all

Re: URI Encoding with Solr and Weblogic

2012-02-10 Thread rzoao
Hello, Elisabeth I am having the same issue with WebLogic 11 with Solr 3.5. I've tried your solution and didn't work out, but I'm not sure if I'm doing it right. I've tried to alter the %SERVER_HOME%\servers\AdminServer\tmp\_WL_user\solr\t6nzak\war\WEB-INF\weblogic.xml and restarted the server, b

new feature: advanced filter caching and post filtering

2012-02-10 Thread Yonik Seeley
Well, not super-new (it's in 3.4), but the spatial post-filtering is brand new in 4.0 as of today, and I don't think cache=false and post-filtering was really highlighted well before anyway. http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/ -Yonik lucidimagination.c

Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
Sorry for pinging this again, is more information needed on this? I can provide more details but am not sure what to provide. On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson wrote: > Sorry, I shut down the full solr instance. > > On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller wrote: >> Can you ex

Re: correct usage of StreamingUpdateSolrServer?

2012-02-10 Thread Erick Erickson
Well, that's certainly "hello world" . But I'm kinda stumped, I have programs that look an awful lot like this that terminate just fine. Anything in your Solr logs? And are you just executing this once? And what version of Solr are you using? Best Erick On Fri, Feb 10, 2012 at 3:49 PM, T Vinod

Re: indexing with DIH (and with problems)

2012-02-10 Thread alessio crisantemi
with rootEntity="false" it's the same.. help! 2012/2/10 Chantal Ackermann > > > On Thu, 2012-02-09 at 23:45 +0100, alessio crisantemi wrote: > > hi all, > > I would index on solr my pdf files wich includeds on my directory > c:\myfile\ > > > > so, I add on my solr/conf directory the file data-con

Re: correct usage of StreamingUpdateSolrServer?

2012-02-10 Thread T Vinod Gupta
here is how i was playing with it.. StreamingUpdateSolrServer solrServer = new StreamingUpdateSolrServer("http://localhost:8983/solr/", 10, 1); SolrInputDocument doc1 = new SolrInputDocument(); doc1.addField( "pk_id", "id1"); doc1.addField("doc_type", "content");

Re: Geospatial search with multivalued field

2012-02-10 Thread Marian Steinbach
2012/2/10 Mikhail Khludnev : > Marian, > > Sorry, I completely forgot to mention. > Pls check David's instruction > https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13117350&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13117350 > > The patch you trie

Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
doing some copying I came up with the following boolean fsv = req.getParams().getBool(ResponseBuilder.FIELD_SORT_VALUES,false); if(fsv){ NamedList sortVals = (NamedList) rsp.getValues().get("sort_values"); Sort sort = searcher.weightSort(r

Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
I'd like to look at the pseudo fields you're talking about (don't really understand it right now), but need to get something working in the short term. How do I go about removing these from the sort values? On Fri, Feb 10, 2012 at 3:06 PM, Yonik Seeley wrote: > On Fri, Feb 10, 2012 at 2:48 PM, J

Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Yonik Seeley
On Fri, Feb 10, 2012 at 2:48 PM, Jamie Johnson wrote: > So looking at query component it appears to sort the entire doc list > at the end of process, my component is defined after this query so the > doclist that I get should be sorted, right?  To me this should mean > that I can remove items from

Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
So looking at query component it appears to sort the entire doc list at the end of process, my component is defined after this query so the doclist that I get should be sorted, right? To me this should mean that I can remove items from this list and shift everything left as needed and it should wo

Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-10 Thread geeky2
hello, >> Or does your field in schema.xml have anything like autoGeneratePhraseQueries="true" in it? << there is no reference to this in our production schema. this is extremely confusing. i am not completely clear on the issue? reviewing our previous messages - it looks like the data is bein

Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
It looks like everything works fine without my custom component, which is good for Solr, bad for me. The custom component does some additional authorization processing to remove docs that the user does not have access to. To do this we're iterating through responseBuilder.getResults().docList and

SolrJ and INFO level logging

2012-02-10 Thread Shawn Heisey
In SolrJ, when using CommonsHttpSolrServer, SolrJ doesn't log anything at or below the INFO level. When I have the logging turned on at that level, I only see log messages that I have placed within my own code. If I log at DEBUG, then I do see SolrJ log messages. When I switched to Streaming

Re: Empty results with OR filter query

2012-02-10 Thread Steven Ou
For anyone having this issue in the future: I managed to narrow it down to Solr-RA 3.5. Installing Solr 3.5 solved the issue. I don't really know how the internals of Solr-RA work, but it appears that it was using AND operators even when I explicitly used OR operators in the query. The other solut

Re: How to define field type

2012-02-10 Thread Torlaf15
Hi, that sounds very good. Thank you Toralf

Re: Geospatial search with multivalued field

2012-02-10 Thread Mikhail Khludnev
Marian, Sorry, I completely forgot to mention. Pls check David's instruction https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13117350&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13117350 The patch you tried to use is just my amendment for the Dav

Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
This is a snapshot of the solrcloud branch from somewhere between a year and 6 months ago (can't really remember offhand) with some custom components. I'm thinking that the custom components may be messing something up. I'm removing them now to test this without those to make sure that the issue

Re: (Old) SolrCloud Date Sorting issue

2012-02-10 Thread Yonik Seeley
On Fri, Feb 10, 2012 at 11:44 AM, Jamie Johnson wrote: > Was there a fix recently to address sorting issues for Dates in solr > cloud?  On my cluster I have a date field which when I sort across the > cluster I get incorrect order executing the following query I get Yikes! There haven't been any

Re: How to define field type

2012-02-10 Thread Erick Erickson
Typically this is handled by defining a second field of type string and using copyField to copy from author to this new field, say, author_facet. Then do your facets on author_facet but do searches on author. Best Erick On Fri, Feb 10, 2012 at 11:19 AM, Torlaf15 wrote: > Hello, > > I hope someon

Re: Range facet - Count in facet menu != Count in search results

2012-02-10 Thread Darren Govoni
Double check your default operator for a faceted search vs. regular search. I caught this difference in my work that explained this difference. On Fri, 2012-02-10 at 07:45 -0800, Yuhao wrote: > Jay, > > Was the curly closing bracket "}" intentional? I'm using 3.4, which also > supports "fq=pric

(Old) SolrCloud Date Sorting issue

2012-02-10 Thread Jamie Johnson
Was there a fix recently to address sorting issues for Dates in solr cloud? On my cluster I have a date field which when I sort across the cluster I get incorrect order executing the following query I get solr/select?distrib=true&q=paul&sort=datetime_dt%20desc&fl=datetime_dt 2009-10-31T1

Re: Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Interesting thing is that the only Tool I found to handle my pdf correctly was pdftotext. 2012/2/10 Robert Muir > On Fri, Feb 10, 2012 at 6:18 AM, Dirk Högemann > wrote: > > > > Our suggest component and parts of our search is getting hard to use by > > this. Any other ideas? > > > > Looks lik

How to define field type

2012-02-10 Thread Torlaf15
Hello, I hope someone can help me. I have several documents with the fields content, author, ... indexed. Now I would like to do a faceted search. My exact problem is the following: as the result (SolrResponse) of a query I get: facet_fields= {author = {first name=1, surname = 1}}...

Re: Range facet - Count in facet menu != Count in search results

2012-02-10 Thread Erick Erickson
I'll answer for Jan: "Yes". Prior to 4.0, you cannot mix inclusive and exclusive operators on a range query. See: https://issues.apache.org/jira/browse/SOLR-355. If you can't go to 4.0, you can cheat and make, say, your top value a tiny bit less than the boundary. For an int-based field [1 TO 20] us
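A minimal sketch of the "tiny bit less" workaround mentioned above, assuming an integer field so that the smallest possible step is 1. The class and method names are hypothetical, and the field name `price` is borrowed from the `fq=price:[10 TO 20]` example elsewhere in this thread; it only builds the filter string and does not talk to Solr.

```java
// Hypothetical helper: emulates an exclusive upper bound on an int field
// for Solr versions where inclusive and exclusive bounds cannot be mixed.
final class RangeFilterSketch {

    // Returns an all-inclusive range whose top value sits one step below the boundary.
    static String exclusiveUpper(String field, int lower, int upper) {
        if (upper <= lower) {
            throw new IllegalArgumentException("upper must be greater than lower");
        }
        return field + ":[" + lower + " TO " + (upper - 1) + "]";
    }

    public static void main(String[] args) {
        // Intended meaning: 10 <= price < 20
        System.out.println(exclusiveUpper("price", 10, 20)); // prints price:[10 TO 19]
    }
}
```

For float or date fields the step would have to be chosen to match the field's precision, which this sketch does not attempt.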

RE: Index Start Question

2012-02-10 Thread Hoffman, Chase
Erick, Thanks for the suggestion. I think we're going to go that route. Best, --Chase -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, February 09, 2012 12:30 PM To: solr-user@lucene.apache.org Subject: Re: Index Start Question Hmmm. You say: "

Re: Re: solr search speed is so slow.

2012-02-10 Thread Erick Erickson
Please re-read Hoss' response. There is no need to warm all queries, that will be very slow for autowarming and you quickly reach a point of diminishing returns. Best Erick 2012/2/9 Rong Kang : > Thanks for your reply. > > I didn't use any other params except  q(for example > http://localhost:80

Re: Range facet - Count in facet menu != Count in search results

2012-02-10 Thread Yuhao
Jay, Was the curly closing bracket "}" intentional?  I'm using 3.4, which also supports "fq=price:[10 TO 20]".  The problem is the results are not working properly. From: Jan Høydahl To: solr-user@lucene.apache.org; Yuhao Sent: Thursday, February 9, 2012

Re: correct usage of StreamingUpdateSolrServer?

2012-02-10 Thread Erick Erickson
Can you post the code? SUSS should essentially be a drop-in replacement for CHSS. It's not advisable to commit after every add, it's usually better to use commitWithin, and perhaps commit at the very end of the run. Best Erick On Thu, Feb 9, 2012 at 4:00 PM, T Vinod Gupta wrote: > Hi, > I wrote

Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
Sorry, I shut down the full solr instance. On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller wrote: > Can you explain a little more how you doing this? How are you bringing the > cores down and then back up? Shutting down a full solr instance, unloading > the core? > > On Feb 10, 2012, at 9:33 AM, J

Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
Can you explain a little more how you doing this? How are you bringing the cores down and then back up? Shutting down a full solr instance, unloading the core? On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote: > I know that the latest Solr Cloud doesn't use standard replication but > I have a q

SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
I know that the latest Solr Cloud doesn't use standard replication but I have a question about how it appears to be working. I currently have the following cluster state {"collection1":{ "slice1":{ "JamiesMac.local:8501_solr_slice1_shard1":{ "shard_id":"slice1", "state":

Re: Geospatial search with multivalued field

2012-02-10 Thread Marian Steinbach
2012/2/9 Mikhail Khludnev : > Some time ago I tested backported patch from > https://issues.apache.org/jira/browse/SOLR-2155 > it works. OK, I would do that. But... Against which version can/should I apply the patch? (I am not restricted by other requirements so far.) Then I tried both with the

Re: Tokenize result of a NGramFilterFactory in Solr (query analyzer)

2012-02-10 Thread Mathias Hodler
Hi Ahmet, awesome! Now it works. 2012/2/10 Ahmet Arslan : >> I'm using the NGramFilterFactory for indexing and querying. >> >> So if I'm searching for "overflow" it creates an query like >> this: >> >> mySearchField:"ov ve ... erflow overflo verflow overflow" >> >> But if I misspelled "overflow",

Re: Tokenize result of a NGramFilterFactory in Solr (query analyzer)

2012-02-10 Thread Ahmet Arslan
> I'm using the NGramFilterFactory for indexing and querying. > > So if I'm searching for "overflow" it creates an query like > this: > > mySearchField:"ov ve ... erflow overflo verflow overflow" > > But if I misspelled "overflow", i.e. "owerflow" there are no > matches > because the quotes arou

Tokenize result of a NGramFilterFactory in Solr (query analyzer)

2012-02-10 Thread Mathias Hodler
Hi, I'm using the NGramFilterFactory for indexing and querying. So if I'm searching for "overflow" it creates an query like this: mySearchField:"ov ve ... erflow overflo verflow overflow" But if I misspelled "overflow", i.e. "owerflow" there are no matches because the quotes around the query:
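To illustrate why the quoting matters, here is a small plain-Java sketch (not Solr or Lucene code) that generates character n-grams the way the example query suggests. The gram sizes 2 to 8 are an assumption inferred from the sample query, and the class and method names are hypothetical. It shows that the misspelled "owerflow" still shares several grams with "overflow", so matching grams individually can succeed even though a single phrase over all grams cannot.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of character n-grams; sizes 2..8 are assumed.
final class NGramSketch {

    // All substrings of term with length minGram..maxGram, shortest first.
    static List<String> ngrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= term.length(); start++) {
                grams.add(term.substring(start, start + n));
            }
        }
        return grams;
    }

    // Grams that two terms have in common, in the order they occur in the first term.
    static Set<String> shared(String a, String b, int minGram, int maxGram) {
        Set<String> common = new LinkedHashSet<>(ngrams(a, minGram, maxGram));
        common.retainAll(ngrams(b, minGram, maxGram));
        return common;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("overflow", 2, 8));
        System.out.println(shared("owerflow", "overflow", 2, 8));
    }
}
```

The shared set contains grams such as "erflow" but not "ov", which is why requiring every gram in order (a phrase) rejects the misspelling while matching any gram would accept it.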

Re: Hi

2012-02-10 Thread Dalius Sidlauskas
Hi, I don't think this is the right place for this question. You should follow samples of solr client api integration in Java and develop your way in konakart.. Regards! Dalius Sidlauskas On 10/02/12 08:25, sumal wrote: My self I am Sumal who working as a Software Engineer. Currently I am de

Hi

2012-02-10 Thread sumal
My self I am Sumal who working as a Software Engineer. Currently I am developing web based e-commerce applications using java and i am using e commerce Konakart shopping cart as well. I am using Konakart community edition. I am kindly requesting some information about how to integrate solr in my

Re: Solr / Tika Integration

2012-02-10 Thread Robert Muir
On Fri, Feb 10, 2012 at 6:18 AM, Dirk Högemann wrote: > > Our suggest component and parts of our search is getting hard to use by > this. Any other ideas? > Looks like https://issues.apache.org/jira/browse/PDFBOX-371 The title of the issue is a bit confusing (I don't think it should go to hyphen

Re: Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Thanks so far. I will have a closer look at the PDF. I tried the enableAutoSpace setting with PDFBox 1.6 - it did not work: PDFParser parser = new PDFParser(); parser.setEnableAutoSpace(false); ContentHandler handler = new BodyContentHandler(); Output: Va ri an te Creut

Re: Solr / Tika Integration

2012-02-10 Thread Jan Høydahl
I think you need to control the parameter "enableAutoSpace" in PDFBox. There's a JIRA for it, but it depends on some Tika1.1 stuff as far I can understand https://issues.apache.org/jira/browse/SOLR-2930 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - ww

Re: Solr / Tika Integration

2012-02-10 Thread Shairon Toledo
Hi, Maybe the PDF creator tool is not generating "fluid" text. A PDF has sections defined by objects, e.g. for "Medizin": 20 0 obj (Medizin) endobj. However, this can happen: 20 0 obj (Me) endobj 21 0 obj (di) endobj 22 0 obj (zin) endobj. Note that there are 3 text objects; the extraction tool
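A toy model of the effect described above, in plain Java. It is not how PDFBox or Tika actually extract text; the class name and the `autoSpace` flag are hypothetical stand-ins. It only shows how the same three fragments yield either one word or three pieces depending on whether a separator is inserted between text objects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model: joins text fragments with or without a separator.
final class TextObjectJoinSketch {

    // autoSpace = true inserts a space between fragments, false concatenates them.
    static String join(List<String> fragments, boolean autoSpace) {
        return String.join(autoSpace ? " " : "", fragments);
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList("Me", "di", "zin");
        System.out.println(join(fragments, true));  // prints Me di zin
        System.out.println(join(fragments, false)); // prints Medizin
    }
}
```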

Re: How do i do group by in solr with multiple shards?

2012-02-10 Thread Kashif Khan
Hi Erick, I have tried grouping with and without shards using Solr 3.3. I know Solr 3.3 does not support grouping with multiple shards. We have been waiting for 3.5.0, and now that it is available we will try with that. The reason I am looking for grouping is posted in this link. Please advise me ho

Re: Sorting solrdocumentlist object after querying

2012-02-10 Thread Kashif Khan
Hey Tommaso, that result grouping happens during the query, but I want to sort the SolrDocumentList after it has been queried and I have injected a few SolrDocuments into it. Thus I want this SolrDocumentList to be sorted based on the fields I specify, and I cannot query Solr for result grouping

Solr / Tika Integration

2012-02-10 Thread Dirk Högemann
Hello, we use Solr 3.5 and Tika to index a lot of PDFs. The content of those PDFs is searchable via a full-text search. The terms are also used to make search suggestions. Unfortunately, PDFBox seems to insert a space character when there are soft hyphens in the content of the PDF. Thus the extrac

Re: indexing with DIH (and with problems)

2012-02-10 Thread Chantal Ackermann
On Thu, 2012-02-09 at 23:45 +0100, alessio crisantemi wrote: > hi all, > I would index on solr my pdf files wich includeds on my directory c:\myfile\ > > so, I add on my solr/conf directory the file data-config.xml like the > following: > > > > > > *0* """ DIH hasn't even retrieved any dat

RE: Keyword Tokenizer Phrase Issue

2012-02-10 Thread Ahmet Arslan
Hi Zac, Field Analysis tool (analysis.jsp) does not perform actual query parsing. One thing to be aware of when Using Keyword Tokenizer at query time is: Query string (chicken stock) is pre-tokenized according to white spaces, before it reaches keyword tokenizer. If you use quotes ("chicken st

Re: indexing with DIH (and with problems)

2012-02-10 Thread alessio crisantemi
I have problems with full import query. no results. I search in log files and after I write again.. tx a. 2012/2/9 alessio crisantemi > hi all, > I would index on solr my pdf files wich includeds on my directory > c:\myfile\ > > so, I add on my solr/conf directory the file data-config.xml like t

Re: Solr Basic Performance Test with duplicated data

2012-02-10 Thread Rafał Kuć
Hello! In terms of query performance, Solr will use caches (of course, if they are turned on). So if you will run similar queries (like the same filters, sort and stuff like that) the performance may be different than performance with unique queries. The http://wiki.apache.org/solr/SolrCaching ha

Solr Basic Performance Test with duplicated data

2012-02-10 Thread Husain, Yavar
Will testing Solr based on duplicated data in the database result in same performance statistics as compared to testing Solr with completely unique data? By test I mean routine performance tests like time to index, time to search etc. Will solr perform any kind of optimization that will result i

SOLR

2012-02-10 Thread mizayah
Is there any way to keep the score from being affected by duplicated terms in the query? I have a record with the field title: "The GIRL with the dragon tattoo". If the query is "girl", it gets a lower score than "girl girl girl". It finds the word in the same position, so why does the score grow? I need it to know if record i f

Re: indexing with DIH (and with problems)

2012-02-10 Thread Gora Mohanty
On 10 February 2012 04:15, alessio crisantemi wrote: > hi all, > I would index on solr my pdf files wich includeds on my directory c:\myfile\ > > so, I add on my solr/conf directory the file data-config.xml like the > following: [...] > but this is the result: [...] Your Solr URL for dataimport

RE: Keyword Tokenizer Phrase Issue

2012-02-10 Thread Zac Smith
I have done some further analysis on this and I am now even more confused. When I use the Field Analysis tool with the text 'chicken stock' it highlights that text as a match. The dismax query looks ok to me: +(DisjunctionMaxQuery((ingredient_synonyms:chicken^0.6)~0.01) DisjunctionMaxQuery((ingr