Re: How to index documents in Solr running in a Windows XP environment

2012-01-05 Thread Dan McGinn-Combs
Look in the example directory for post.sh and post.jar. These could be
used to do the job on Windows. But to be honest, I didn't have any problems
using curl on Windows. You just have to be careful to use double quotes
rather than single quotes, and to use the right kind of slashes for
directories.
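
For illustration, here is a minimal Python sketch of the quoting rule above. It builds a curl command as an argument list and uses subprocess.list2cmdline to show how that command would be quoted on a Windows command line. The URL (port 8983 as in the Solr example setup), the commit parameter, and the file path are assumptions chosen for the example, not details from this thread.

```python
import subprocess


def curl_post_command(solr_update_url, xml_path):
    """Return the argument list for posting one XML file to Solr with curl."""
    return [
        "curl",
        solr_update_url,
        "-H", "Content-Type: text/xml",
        "--data-binary", "@" + xml_path,
    ]


# Assumed values, for illustration only.
args = curl_post_command(
    "http://localhost:8983/solr/update?commit=true",
    r"C:\solr docs\books.xml",
)

# list2cmdline applies the Windows quoting rules: arguments that contain
# spaces are wrapped in double quotes, never single quotes.
command_line = subprocess.list2cmdline(args)
print(command_line)
```

Passing the list itself to subprocess.run(args) would sidestep the quoting question, since Python then does the conversion on Windows.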

Dan

On Thursday, January 5, 2012, dsy99 ds...@rediffmail.com wrote:
 Dear Gora and all,
 Thank you very much for replying.
 My question is how to index documents (.xml, .pdf, .doc files) in Solr. I
 tried using curl, but it is not working in my Windows XP environment. Does
 anyone have a ready-made program or DIH configuration that I can use to
 index these types of files?

 Regards,
 Divakar

 --
 View this message in context:
http://lucene.472066.n3.nabble.com/How-to-index-documents-in-SOLR-running-in-Window-XP-envronment-tp3632488p3634507.html
 Sent from the Solr - User mailing list archive at Nabble.com.


-- 
Dan McGinn-Combs
dgco...@gmail.com
Google Voice: +1 404 492 7532
Peachtree City, Georgia USA


Re: Retrieving Documents

2011-12-19 Thread Dan McGinn-Combs
I can see why you are confused. Re-reading it, I'm confused too.
Here's my dilemma.

I am trying to index some one hundred or so books, all in EPUB format. The
goal is to provide research functions for people who need to
reference specific quotes, pages and books in their writing.

I don't know if EPUB is designed to do this by default, but each book
is created/converted to EPUB using Calibre. Each page is packed into
the EPUB file as a separate HTML file named in the format
title_split_<page number>.html. So the upshot of my question is
whether there is a way to extract the page number from the title of
the embedded HTML page and expose it in a Solr field that I can
subsequently display to the user.
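
To make the idea concrete, here is a rough Python sketch of pulling the page number out of an embedded file name before the page is sent to Solr. The exact naming pattern (a title, then _split_, then digits, then .html) and the sample entry name are assumptions based on the description above, and page_from_filename is a hypothetical helper, not part of Solr or Calibre.

```python
import re
from pathlib import PurePosixPath

# Assumed naming pattern: <title>_split_<page number>.html (also .htm/.xhtml).
_SPLIT_NAME = re.compile(
    r"^(?P<title>.+)_split_(?P<page>\d+)\.x?html?$", re.IGNORECASE
)


def page_from_filename(entry_name):
    """Return (title, page_number) for an EPUB entry name, or None if no match."""
    # EPUB (zip) entries use forward slashes, so keep only the final component.
    match = _SPLIT_NAME.match(PurePosixPath(entry_name).name)
    if match is None:
        return None
    return match.group("title"), int(match.group("page"))


print(page_from_filename("OEBPS/some_book_split_042.html"))
```

The returned number could then be stored in its own Solr field (say, a hypothetical "page" field) alongside the page text.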

I hope that makes a bit more sense.

Still looking through the Wiki because it seems to be stuffed with goodies.

Dan

On Sat, Dec 17, 2011 at 2:59 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi Dan,

 I don't follow the second paragraph.  Not sure what you are trying to do, 
 what you've tried, what didn't work and how...

 Otis
 

 Performance Monitoring SaaS for Solr - 
 http://sematext.com/spm/solr-performance-monitoring/index.html




 From: Dan McGinn-Combs dgco...@gmail.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Saturday, December 17, 2011 9:30 AM
Subject: Re: Retrieving Documents

Good pointer. Thank you, that is exactly what I had in mind. To the
second point, yes, sort of.

I've managed to take apart a sample of the ePub documents (there are a
finite number). Inside each ePub are individual HTML documents, each
holding a single page of the overall book. It would be super to be able to
parse the title (originally formed from the page number) to set up a
dynamically generated document and include that as part of the
results. Combing the wiki now since that's where every answer seems
to be! Pointers welcome though.
Thanks!
--
Dan McGinn-Combs

On Dec 16, 2011, at 11:52 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Hi Dan,

 1) Are you looking for 
 http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ?

 2) Hundreds of words in a field should not be a problem for highlighting.  
 But it sounds like this long field may contain content that corresponds to 
 N different pages in a publication and you would like to inform the 
 searcher which page the match was on, and not just that a match was 
 somewhere in that big piece of text.  One way to deal with that is to break 
 your document into N smaller documents - one document for each page.

 Otis
 




 
 From: Dan McGinn-Combs dgco...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Friday, December 16, 2011 4:33 PM
 Subject: Retrieving Documents

 I've been doing a fair amount of reading and experimenting with Solr
 lately. I find that it does a good job of indexing very structured
 documents. However, the application I have in mind is built around
 long EPUB documents.

 Of course, I found the Extract components useful for indexing the
 EPUBs. However, I would like to be able to

 * Size the highlighted portion of text around the query terms
 (e.g. show 20 or 30 words) and

 * Retrieve a location within the document so I can display that page
 from the EPUB.

 What is common practice for these? I notice that if I have a list of
 (short) text segments in fields, they are stored without too much fuss
 and are retrievable. However, I'm talking about a field of potentially
 hundreds of words.

 Thanks for any pointers,
 Dan

 --
 Dan McGinn-Combs
 dgco...@gmail.com
 Peachtree City, Georgia USA








-- 
Dan McGinn-Combs
dgco...@gmail.com
Google Voice: +1 404 492 7532
Peachtree City, Georgia USA


Re: Retrieving Documents

2011-12-17 Thread Dan McGinn-Combs
Good pointer. Thank you, that is exactly what I had in mind. To the
second point, yes, sort of.

I've managed to take apart a sample of the ePub documents (there are a
finite number). Inside each ePub are individual HTML documents, each
holding a single page of the overall book. It would be super to be able to
parse the title (originally formed from the page number) to set up a
dynamically generated document and include that as part of the
results. Combing the wiki now since that's where every answer seems
to be! Pointers welcome though.
Thanks!
--
Dan McGinn-Combs

On Dec 16, 2011, at 11:52 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Hi Dan,

 1) Are you looking for 
 http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ?

 2) Hundreds of words in a field should not be a problem for highlighting.  
 But it sounds like this long field may contain content that corresponds to N 
 different pages in a publication and you would like to inform the searcher 
 which page the match was on, and not just that a match was somewhere in that 
 big piece of text.  One way to deal with that is to break your document into 
 N smaller documents - one document for each page.
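
 As a sketch of point 1, the Python snippet below assembles a select URL that turns on highlighting and sets hl.fragsize. Note that hl.fragsize is measured in characters, so the value 200 is only a guess at what 20 or 30 words comes to. The base URL, the field name "text", and the sample query are assumptions.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, query, field, fragsize):
    """Return a Solr select URL with highlighting enabled on one field."""
    params = {
        "q": query,
        "hl": "true",             # switch highlighting on
        "hl.fl": field,           # field to take snippets from
        "hl.fragsize": fragsize,  # approximate snippet size, in characters
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


# Assumed host, field, and query, for illustration only.
url = highlight_query_url(
    "http://localhost:8983/solr/select", "text:whale", "text", 200
)
print(url)
```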

 Otis
 




 
 From: Dan McGinn-Combs dgco...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Friday, December 16, 2011 4:33 PM
 Subject: Retrieving Documents

 I've been doing a fair amount of reading and experimenting with Solr
 lately. I find that it does a good job of indexing very structured
 documents. However, the application I have in mind is built around
 long EPUB documents.

 Of course, I found the Extract components useful for indexing the
 EPUBs. However, I would like to be able to

 * Size the highlighted portion of text around the query terms
 (e.g. show 20 or 30 words) and

 * Retrieve a location within the document so I can display that page
 from the EPUB.

 What is common practice for these? I notice that if I have a list of
 (short) text segments in fields, they are stored without too much fuss
 and are retrievable. However, I'm talking about a field of potentially
 hundreds of words.

 Thanks for any pointers,
 Dan

 --
 Dan McGinn-Combs
 dgco...@gmail.com
 Peachtree City, Georgia USA




Retrieving Documents

2011-12-16 Thread Dan McGinn-Combs
I've been doing a fair amount of reading and experimenting with Solr
lately. I find that it does a good job of indexing very structured
documents. However, the application I have in mind is built around
long EPUB documents.

Of course, I found the Extract components useful for indexing the
EPUBs. However, I would like to be able to

* Size the highlighted portion of text around the query terms
(e.g. show 20 or 30 words) and

* Retrieve a location within the document so I can display that page
from the EPUB.

What is common practice for these? I notice that if I have a list of
(short) text segments in fields, they are stored without too much fuss
and are retrievable. However, I'm talking about a field of potentially
hundreds of words.

Thanks for any pointers,
Dan

-- 
Dan McGinn-Combs
dgco...@gmail.com
Peachtree City, Georgia USA