Re: Question on Solr/WordPress Integration

2019-03-01 Thread markus kalkbrenner
If you’re more familiar with PHP, you can do the same using the Solarium 
library instead of SolrJ for Java.

Once the PDFs are extracted and indexed, Drupal is an alternative to WordPress 
as a front end. Using the Search API Solr module, you can access and "present" 
any existing Solr index without a single line of custom code.
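
Whichever front end is used, the search page ultimately sends Solr a query 
with highlighting switched on. A minimal sketch of such a request in Python 
follows. The core name `newspapers`, the field name `content`, and the 
localhost URL are assumptions for illustration and do not come from this 
thread; `df`, `hl`, and `hl.fl` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, core, text, field="content", rows=10):
    """Return a Solr /select URL that searches `field` and requests highlights."""
    params = {
        "q": text,        # the user's search terms
        "df": field,      # default field to search (assumed field name)
        "hl": "true",     # turn highlighting on
        "hl.fl": field,   # field to generate highlighted snippets from
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance and core name.
    print(build_highlight_query("http://localhost:8983/solr", "newspapers", "city council"))
```

The JSON response to such a request carries a `highlighting` section keyed by 
document id, which the front end can render next to each hit.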

Markus

> On 02.03.2019 at 01:30, Erick Erickson wrote:
> 
> Writing a Java (SolrJ) program that traverses a filesystem and extracts the 
> contents of PDFs is actually quite simple; see 
> https://lucidworks.com/2012/02/14/indexing-with-solrj/ (you can ignore the 
> RDBMS stuff). That code is a little out of date, so it may need some very 
> minor tweaks.
> 
> Tika (the library Solr uses to parse PDFs and most other files) may have 
> something that makes the job even easier; I’d ask on their users list. 
> Putting WordPress in the middle of it all seems unnecessarily complicated.
> 
> Best,
> Erick


Re: Question on Solr/WordPress Integration

2019-03-01 Thread Erick Erickson
Writing a Java (SolrJ) program that traverses a filesystem and extracts the 
contents of PDFs is actually quite simple; see 
https://lucidworks.com/2012/02/14/indexing-with-solrj/ (you can ignore the 
RDBMS stuff). That code is a little out of date, so it may need some very minor 
tweaks.

Tika (the library Solr uses to parse PDFs and most other files) may have 
something that makes the job even easier; I’d ask on their users list. Putting 
WordPress in the middle of it all seems unnecessarily complicated.
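
The overall shape of such an indexer can be sketched without SolrJ. The 
Python below is only an illustration of the traverse/extract/send loop, not 
the code from the linked article. It assumes a newspaper/date/page.pdf 
directory layout, field names (`newspaper`, `date`, `page`, `content`) that 
would have to exist in the Solr schema, and a caller-supplied `extract_text` 
function standing in for Tika. It posts JSON to Solr's standard `/update` 
handler.

```python
import json
from pathlib import Path
from typing import Callable, Iterator
from urllib.request import Request


def iter_pdf_docs(root: Path, extract_text: Callable[[Path], str]) -> Iterator[dict]:
    """Yield one Solr document per PDF stored as root/<newspaper>/<date>/<page>.pdf."""
    for pdf in sorted(root.rglob("*.pdf")):
        rel = pdf.relative_to(root)
        if len(rel.parts) != 3:
            continue  # skip files outside the assumed three-level layout
        newspaper, date, _ = rel.parts
        yield {
            "id": rel.as_posix(),          # the relative path doubles as the unique key
            "newspaper": newspaper,
            "date": date,
            "page": pdf.stem,
            "content": extract_text(pdf),  # stand-in for the Tika extraction step
        }


def build_update_request(solr_url: str, core: str, docs: list) -> Request:
    """Build (but do not send) a JSON POST to the core's /update handler."""
    return Request(
        f"{solr_url.rstrip('/')}/{core}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. For 
25k files it would be sensible to send documents in batches rather than in 
one request.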

Best,
Erick




Re: Question on Solr/WordPress Integration

2019-03-01 Thread Paul Buiocchi
Thank you, Shawn!



Re: Question on Solr/WordPress Integration

2019-03-01 Thread Shawn Heisey

On 3/1/2019 10:25 AM, Paul Buiocchi wrote:

> I have a couple of questions about Solr/WordPress integration.


You would need to talk to the person who wrote the plugin for WordPress 
that integrates with Solr.  If they indicate that a question can only be 
answered by the Solr project, then bring that to us.



> I am putting together an old newspaper archive site: about 25k PDF files 
> that are full-text searchable.


If you want Solr to index your PDF documents, you would have to use 
SolrCell, also known as the Extracting Request Handler.
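
For experimenting only (see the caution that follows), a request to the 
Extracting Request Handler could be built roughly as below. This is a sketch 
under assumptions: the core name and Solr URL are hypothetical, and the 
handler must be enabled in the core's configuration. `/update/extract` and 
`literal.id` are the handler's standard path and parameter.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_url: str, core: str, pdf: Path, doc_id: str) -> Request:
    """Build (but do not send) a POST of the raw PDF bytes to /update/extract."""
    params = urlencode({
        "literal.id": doc_id,  # sets the document's unique key
        "commit": "true",      # make the document searchable right away
    })
    return Request(
        f"{solr_url.rstrip('/')}/{core}/update/extract?{params}",
        data=pdf.read_bytes(),
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
```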


We strongly recommend that this functionality never be used in 
production.  The reason is that the underlying technology, Apache Tika, 
can crash when given certain input.  PDF documents are more likely than 
other kinds to cause this problem.  If Tika crashes while it is running 
inside Solr, then Solr will also crash.
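
One way to keep such a crash away from Solr is to run the extraction in a 
separate process, so that a failure costs only one document. The Python 
below is a minimal sketch of that idea, not an established recipe from the 
Solr project. The function name is hypothetical, and the command in the 
usage note assumes a Tika application jar named `tika-app.jar` in the 
current directory.

```python
import subprocess
from typing import Optional, Sequence


def extract_in_subprocess(command: Sequence[str], timeout_s: float = 60.0) -> Optional[str]:
    """Run an external extractor; return its stdout, or None if it failed or hung."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, OSError):
        return None  # extractor hung or could not be started
    if result.returncode != 0:
        return None  # extractor crashed or reported an error
    return result.stdout
```

A call might look like 
`extract_in_subprocess(["java", "-jar", "tika-app.jar", "--text", "page1.pdf"])`. 
Files that come back as `None` can be logged and skipped while the rest of 
the archive is indexed.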



> Questions on architecture:
> 1) Is there a way for Solr to index from a local file structure, i.e. local 
> drive:/newpaper_name/date/page#? From the experimenting I have done with 
> WordPress/Solr integration, I found that I had to upload the documents in 
> WordPress to get Solr to recognize them.


Yes, you can index just about anything you like if you are willing to 
create the configuration and the software to do it.  But in order for 
WordPress to understand that data, it most likely would have to be done 
through WordPress.


Thanks,
Shawn


Question on Solr/WordPress Integration

2019-03-01 Thread Paul Buiocchi
Greetings, 

I have a couple of questions about Solr/WordPress integration.

First, I am not committed to using WordPress as a front end. If there is a 
better front-end option, I would be willing to convert. For functionality, 
all I am looking for is full-text search with the search terms highlighted in 
the search results. It should be pretty simple; maybe I am overanalyzing it. 
I am looking for as much "out of the box" as possible.

My scenario is this: 

I am putting together an old newspaper archive site: about 25k PDF files that 
are full-text searchable.

Questions on architecture: 
1) Is there a way for Solr to index from a local file structure, i.e. local 
drive:/newpaper_name/date/page#? From the experimenting I have done with 
WordPress/Solr integration, I found that I had to upload the documents in 
WordPress to get Solr to recognize them.

I'm sure I will have more questions. Any help or suggestions would be greatly 
appreciated. Thank you.
