Help creating schema for indexable document

2009-08-07 Thread rossputin

Hi Guys.

I am struggling to create a schema with a determinist content model for a
set of documents I want to index.

My indexable documents will look something like:

add
  doc
field name=id1/field
field name=codecode1/field
field name=codecode2/field
field name=categorymycategory/field
  /doc
/add

My service will be mission critical and will accept batch imports from a
potentially unreliable source.  Are there any xml schema guru's who can help
me with creating xn xsd which will work with my sample document?

Thanks in advance for your help,

 -- Ross
-- 
View this message in context: 
http://www.nabble.com/Help-creating-schema-for-indexable-document-tp24862700p24862700.html
Sent from the Solr - User mailing list archive at Nabble.com.



post error - ERROR:unknown field 'title'

2009-07-20 Thread rossputin

Hi guys.

I have two different solr versions as I am evaluating nightly builds.  On a
more recent one.. I think 15th July I am getting the following error :

ERROR:unknown field 'title'

I am posting to 'solr/update/extract' with the following:

curl
http://localhost:8983/solr/update/extract?ext.literal.id=1ext.literal.code=somecodeext.literal.url=someurl/file.pdfext.literal.category=somecatext.literal.updated=2009-06-01T09:10:30.000Zext.idx.attr=true\ext.def.fl=text;
-F myfi...@1411_9.pdf

My schema does not, and is not intended to contain a 'title' field.

Thanks in advance for your help,

 -- Ross
-- 
View this message in context: 
http://www.nabble.com/post-error---ERROR%3Aunknown-field-%27title%27-tp24567235p24567235.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: posting binary file and metadata in two separate documents

2009-07-17 Thread rossputin

Hi.

Thanks for your reply, shame nobody has already implemented the multiple
'ContentStreams' idea :-)
With regards to posting in a form, I had considered that, but unfortunately
there can be an arbitrary number of 'ext.literals', so it would be difficult
to build a form which would handle all cases.

Regards,

 -- Ross


hossman wrote:
 
 
 : Subject: posting binary file and metadata in two separate documents
 
 there was some discussion a while back about that fact that you can push 
 multiple ContentStreams to SOlr in a single request, and while the 
 existing handelrs all just iterate over and process them seperately, it 
 would be *possible* for a variant of ExtractingRequest handler to use the 
 first stream to get document metadat, and have that metdata refrence the 
 other streams in some way for large chunks of text)
 
 But no one has attempted to implement that as far as i know.
 
 :
 http://localhost:8983/solr/update/extract?ext.literal.id=2ext.literal.some_code1=code1ext.literal.some_code2=code2ext.idx.attr=true\ext.def.fl=text;
 : -F myfi...@myfile.pdf
 : 
 : Where I have large numbers of ext.literal params this becomes a bit of a
 : chore.. and it would be the same case in an html form with many
 params... 
 : can I pass both files to '/update/extract' as documents, (files) linked
 : together?  Or are there any other options like this?  Perhaps something
 I
 : can do with Solrj.
 
 there's no reason those params have ot be in the URL.  you can do a 
 multipart POST with application/x-www-form-urlencoded in one part and your 
 pdf file in another part (just like doing a POST from a massive HTML form 
 with an 'input type=file' option)
 
 
 -Hoss
 
 
 

-- 
View this message in context: 
http://www.nabble.com/posting-binary-file-and-metadata-in-two-separate-documents-tp24375649p24530051.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: posting binary file and metadata in two separate documents

2009-07-10 Thread rossputin

Hi.

Apologies for bumping this one, but another question occurred to me... is
there a limit to the number of ext.literal components I can put in my curl
command... if so, i will definitely need to find another way to get this
data in, as I am building up relationships between documents, and there will
be many of them.

Thanks in advance for your help,

regards,

Ross



rossputin wrote:
 
 Hi.
 
 I am currently using Solr Cell to extract content from binary files, and I
 am passing along some additional metadata with ext.literal params. Sample
 below:
 
 curl
 http://localhost:8983/solr/update/extract?ext.literal.id=2ext.literal.some_code1=code1ext.literal.some_code2=code2ext.idx.attr=true\ext.def.fl=text;
 -F myfi...@myfile.pdf
 
 Where I have large numbers of ext.literal params this becomes a bit of a
 chore.. and it would be the same case in an html form with many params... 
 can I pass both files to '/update/extract' as documents, (files) linked
 together?  Or are there any other options like this?  Perhaps something I
 can do with Solrj.
 
 Thanks in advance for your help,
 
 regards,
 
 Ross.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/posting-binary-file-and-metadata-in-two-separate-documents-tp24375649p24423267.html
Sent from the Solr - User mailing list archive at Nabble.com.



posting binary file and metadata in two separate documents

2009-07-07 Thread rossputin

Hi.

I am currently using Solr Cell to extract content from binary files, and I
am passing along some additional metadata with ext.literal params. Sample
below:

curl
http://localhost:8983/solr/update/extract?ext.literal.id=2ext.literal.some_code1=code1ext.literal.some_code2=code2ext.idx.attr=true\ext.def.fl=text;
-F myfi...@myfile.pdf

Where I have large numbers of ext.literal params this becomes a bit of a
chore.. and it would be the same case in an html form with many params... 
can I pass both files to '/update/extract' as documents, (files) linked
together?  Or are there any other options like this?  Perhaps something I
can do with Solrj.

Thanks in advance for your help,

regards,

Ross.


-- 
View this message in context: 
http://www.nabble.com/posting-binary-file-and-metadata-in-two-separate-documents-tp24375649p24375649.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr document structure for preserving version information

2009-06-05 Thread rossputin

Hi Guys.

This is a schema design question I suppose.  I would like to store a series
of version elements comprising of two attributes, 'updated' (a date) and
'reason' (just a simple string).  I aim to produce xml based on a search
which would look something like:

document
  name
  version updated=01/04/2009 10:30:00 reason=changes made/
  version updated=02/04/2009 11:10:00 reason=more changes made/
/document

So I realise I could use multiValued fields, but I want to avoid doing
something like:

version01/04/2009 10:30:00|changes made/version (using | or some other
separator)

As I would need to split the field in my code.  This approach does not seem
the best.  Has anyone got an approach they could share ?

Thanks in advance for your help,

 - Ross
-- 
View this message in context: 
http://www.nabble.com/Solr-document-structure-for-preserving-version-information-tp23885262p23885262.html
Sent from the Solr - User mailing list archive at Nabble.com.



highlight results from pdf search

2009-05-30 Thread rossputin

Hi.

I have some PDF documents indexed through solr cell.  My highlighting
queries work fine on standard xml doc types, eg the samples.  I would now
like to highlight some queries on a PDF document.  Currently for my simple
examples I am just indexing a PDF, providing an id, and an arbitrary
ext.literal.  I would like to be able to get highlighted snippets back from
the extracted content of the PDF.  Is this possible?

Thanks in advance for your help,

 - Ross
-- 
View this message in context: 
http://www.nabble.com/highlight-results-from-pdf-search-tp23791905p23791905.html
Sent from the Solr - User mailing list archive at Nabble.com.



Additional metadata when using Solr Cell

2009-05-14 Thread rossputin

Hi.

I am indexing a PDF document with the ExtractingRequestHandler.  My curl
post has a URL like:

../solr/update/extract?ext.idx.attr=trueext.def.fl=textext.literal.id=123ext.literal.author=Somebody

Sure enough I see in the server logs:

params={ext.def.fl=textext.literal.id=123ext.idx.attr=trueext.literal.author=Somebody}

I am trying to get my field back in the results from a query:

../solr/select?indent=onversion=2.2q=hellostart=0rows=10fl=author%2Cscoreqt=standardwt=standardexplainOther=hl.fl=

I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do, to get hold of this field?

Thanks in advance for your help,

 -- Ross
-- 
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Additional metadata when using Solr Cell

2009-05-14 Thread rossputin

There is no reference to the author field I am trying to set.. I am using the
latest nightly download.

 -- Ross


Grant Ingersoll-6 wrote:
 
 what does /admin/luke show for fields and terms in the fields?
 
 On May 14, 2009, at 10:03 AM, rossputin wrote:
 

 Hi.

 I am indexing a PDF document with the ExtractingRequestHandler.  My  
 curl
 post has a URL like:

 ../solr/update/extract? 
 ext 
 .idx 
 .attr 
 =trueext.def.fl=textext.literal.id=123ext.literal.author=Somebody

 Sure enough I see in the server logs:

 params 
 = 
 {ext 
 .def 
 .fl 
 = 
 textext.literal.id=123ext.idx.attr=trueext.literal.author=Somebody}

 I am trying to get my field back in the results from a query:

 ../solr/select? 
 indent=onversion=2.2q=hellostart=0rows=10fl=author 
 %2Cscoreqt=standardwt=standardexplainOther=hl.fl=

 I see the score in the results 'doc' but no reference to author.

 Can anyone advise on what I am forgetting to do, to get hold of this  
 field?

 Thanks in advance for your help,

 -- Ross
 -- 
 View this message in context:
 http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html
Sent from the Solr - User mailing list archive at Nabble.com.