Re: Question about indexing PDFs

2016-08-26 Thread Betsey Benagh
Erick, I’m not sure of anything. I’m new to Solr and find the documentation extremely confusing. I’ve searched the web and found tutorials/advice, but they generally refer to older versions of Solr, and refer to methods/settings/whatever that no longer exist. That’s why I’m asking for help

RE: Question about indexing PDFs

2016-08-26 Thread Srinivasa Meenavalli
=json=true Regards Srinivas Meenavalli -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, August 26, 2016 3:09 AM To: solr-user Subject: Re: Question about indexing PDFs That is always a dangerous assumption. Ar

Re: Question about indexing PDFs

2016-08-25 Thread Erick Erickson
That is always a dangerous assumption. Are you sure you're searching on the proper field? Are you sure it's indexed? Are you sure it's The schema browser I indicated above will give you some idea what's actually in the field. You can not only see the fields Solr (actually Lucene) sees in your

Re: Question about indexing PDFs

2016-08-25 Thread Betsey Benagh
Right, that's where I looked. No 'content'. Which is what confused me. On 8/25/16, 1:56 PM, "Erick Erickson" wrote: >when you say "I don't see it in the schema for that collection" are you >talking schema.xml? managed_schema? Or actual documents in the index? >Often

Re: Question about indexing PDFs

2016-08-25 Thread Betsey Benagh
It looks like the metadata of the PDFs was indexed, but not the content (which is what I was interested in). Searches on terms I know exist in the content come up empty. On 8/25/16, 2:16 PM, "Betsey Benagh" wrote: >Right, that's where I looked. No 'content'.

Re: Question about indexing PDFs

2016-08-25 Thread Erick Erickson
When you say "I don't see it in the schema for that collection", are you talking about schema.xml? managed_schema? Or actual documents in the index? Often these are defined by dynamic fields and the like in the schema files. Take a look at the admin UI >> schema browser >> drop-down and you'll see all the

Re: Question about Indexing Updated Documents

2016-07-01 Thread Chris Hostetter
If you are already using DIH, then you can use a deltaQuery to find "updated" documents and index only them. https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler Some people just parameterize their main DIH query and use request
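As a rough illustration of the delta approach mentioned above, the following sketch builds the request URL that triggers a DIH delta import. The host, the core name `mycore`, and the handler path `dataimport` are assumptions for illustration only; the handler must be registered in solrconfig.xml and a deltaQuery defined in the DIH configuration for this to do anything.

```python
from urllib.parse import urlencode


def delta_import_url(base_url, core, handler="dataimport"):
    """Return the URL that asks the Data Import Handler to run a delta import.

    base_url, core and handler are hypothetical placeholders; adjust them
    to match the local Solr installation.
    """
    params = {
        "command": "delta-import",  # index only rows found by the deltaQuery
        "clean": "false",           # keep documents that are already indexed
        "commit": "true",           # commit once the import finishes
    }
    return f"{base_url}/{core}/{handler}?{urlencode(params)}"


url = delta_import_url("http://localhost:8983/solr", "mycore")
print(url)
```

Issuing an HTTP GET against the printed URL (from a browser, curl, or a scheduler) would start the import.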

Re: question about indexing...

2010-05-26 Thread Jörg Agatz
OK, done... but no change! I made the following entries in schema.xml: <field name="all" type="string" indexed="true" stored="true" multiValued="true"/> <field name="P_CONTENT_ITEMS_COMMENT" type="text" indexed="true" stored="true" multiValued="true"/> <field name="comment" type="string" indexed="true" stored="true"

Re: question about indexing...

2010-05-26 Thread Jörg Agatz
Sorry, I mean the XML looks like this: <field name="P_CONTENT_ITEMS_COMMENT"><![CDATA[ Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere Ruhe haben. <b>ich du er sie es</b> Ha ha Ha ha ha ha ha ha ha ha ]]></field>

Re: question about indexing...

2010-05-26 Thread Jörg Agatz
OK, done. I rebooted the server and now it works. Is the text field single instance? How can I make it so? The text field indexed the word "Hallo". If I search for Hallo, I find it; for hallo, I find it; for Hall*, I don't; for hall*, I find it. But some users will search for Hall*. One more little question... The difference from

Re: question about indexing...

2010-05-26 Thread Erik Hatcher
On May 26, 2010, at 3:49 AM, Jörg Agatz wrote: is the Textfield Single instance? how can i make it? I'm not sure what you're asking. You can have as many text fields as you like, or as many of any other type as well. In textfield indext the Word : Hallo if i search Hallo i found hallo

Re: question about indexing...

2010-05-25 Thread Erik Hatcher
Well, you'll just have to create valid XML, either encoding some characters or using CDATA sections. Erik On May 25, 2010, at 10:06 AM, Jörg Agatz wrote: I have a job to do: I must index a lot of e-mails, so I will create a script to generate XML from the mails. Now is the
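A minimal sketch of the "encoding some characters" option, assuming the mails are converted by a Python script: the standard-library ElementTree writer escapes `&`, `<` and `>` in field text, so embedded markup cannot break the update message. The `id` field, its value `mail-1`, and the `& more` suffix in the sample body are made up for illustration; the other field name comes from the thread.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> update message from a list of {field: value} dicts.

    ElementTree escapes &, < and > in text nodes, so no CDATA is needed.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


# Sample mail body containing markup that would be invalid if pasted in raw.
mail_body = "Hallo leute. <b>ich du er sie es</b> Ha ha & more"
xml_message = build_add_xml(
    [{"id": "mail-1", "P_CONTENT_ITEMS_COMMENT": mail_body}]
)
print(xml_message)
```

Parsing the message back yields the original text unchanged, which is a quick way to check that the script's output is well-formed before posting it to Solr.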

Re: question about indexing...

2010-05-25 Thread Jörg Agatz
OK, done. But now I can't find any word in the CDATA field. I use: <field name="P_CONTENT_ITEMS_COMMENT"><![CDATA[ Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere Ruhe haben. <b>ich du er sie es</b> Ha ha Ha ha ha ha ha ha ha ha ]]></field> It is a multivalued string field.

Re: question about indexing...

2010-05-25 Thread Erik Hatcher
You have to provide more details than that. We need to know the field definition for that named field, the corresponding field type definition, and the exact request you're making to Solr that you think should find this document. And most importantly, did you <commit/>? :) Erik On

Re: question about indexing...

2010-05-25 Thread Jörg Agatz
I created a new index, but nothing changed. <field name="COMMENT" type="string" indexed="true" stored="true" multiValued="true"/> <field name="COMMENT"><![CDATA[ Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere Ruhe haben. <b>ich du er sie es</b> Ha ha Ha ha ha ha ha ha ha ha ]]></field> I

Re: question about indexing...

2010-05-25 Thread Lance Norskog
Change type="string" to type="text". This causes the field to be analyzed and then searching on words finds the document. On Tue, May 25, 2010 at 8:34 AM, Jörg Agatz <joerg.ag...@googlemail.com> wrote: i create a new Index, but nothing Change. <field name="COMMENT" type="string" indexed="true"
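The difference between the two field types can be pictured with a toy model. This is not Lucene's analysis chain, only an assumed simplification: a string field keeps the whole value as one token, while a text field is split into lowercased word tokens. The function names are hypothetical.

```python
import re


def index_string(value):
    """Toy model of a string field: the whole value is one unmodified token."""
    return {value}


def index_text(value):
    """Toy model of an analyzed text field: split into words and lowercase."""
    return {token.lower() for token in re.findall(r"\w+", value)}


def matches(term, indexed_terms, analyzed):
    """Look up a single query term; analyzed fields lowercase the query too."""
    query = term.lower() if analyzed else term
    return query in indexed_terms


value = "Hallo leute. mein name ist dein name"

# A word query misses the string field because only the full value is a token.
word_in_string = matches("Hallo", index_string(value), analyzed=False)
# The same query hits the text field because "hallo" is one of its tokens.
word_in_text = matches("Hallo", index_text(value), analyzed=True)
# The string field matches only when the query is the exact full value.
exact_in_string = matches(value, index_string(value), analyzed=False)
```

The real behavior depends on the analyzers configured for the field type in the schema, which may also stem, strip markup, or filter stop words.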

Re: question about indexing...

2010-05-25 Thread Erick Erickson
Don't forget to re-index after you make the change Lance suggested... Erick On Tue, May 25, 2010 at 4:51 PM, Lance Norskog <goks...@gmail.com> wrote: Change type="string" to type="text". This causes the field to be analyzed and then searching on words finds the document. On Tue, May 25, 2010 at