Re: How many fields can SOLR handle?

2011-07-07 Thread William Bell
RoySolr,

Not sure what language your client is written in, but this is a simple
if statement.

if (category == "TV") {
   qStr = "q=*:*&facet=true&facet.field=tv_size&facet.field=resolution";
} else if (category == "Computer") {
   qStr = "q=*:*&facet=true&facet.field=cpu&facet.field=gpu";
}

curl "http://localhost:8983/solr/select?" + qStr;
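In Java, the same per-category dispatch can be kept in a lookup table instead of a growing if/else chain. A sketch (the category names and facet fields are just the examples from above; adjust to your schema):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetQueryBuilder {
    // Category -> facet fields, taken from the example above.
    static final Map<String, String[]> FACETS = new LinkedHashMap<>();
    static {
        FACETS.put("TV", new String[] {"tv_size", "resolution"});
        FACETS.put("Computer", new String[] {"cpu", "gpu"});
    }

    // Builds the query string to append to /solr/select?
    static String buildQuery(String category) {
        StringBuilder q = new StringBuilder("q=*:*&facet=true");
        for (String field : FACETS.getOrDefault(category, new String[0])) {
            q.append("&facet.field=").append(field);
        }
        return q.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("TV"));
        // q=*:*&facet=true&facet.field=tv_size&facet.field=resolution
    }
}
```

Adding a new category then means adding one map entry rather than another branch.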


On Thu, Jul 7, 2011 at 2:29 AM, roySolr  wrote:
> Hello Erik,
>
> I need the *_facets also for searching so stored must be true.
>
> "Then, and I used *_facet similar to you, kept a list of all *_facet actual
> field names and used those in all subsequent search requests. "
>
> Is this not bad for performance? I only need a few facets, not all (only
> the facets for the chosen category).
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-many-fields-can-SOLR-handle-tp3033910p3147520.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: solr replication

2011-07-07 Thread William Bell
You can query the replication status on the slave... when it is
complete, continue.
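A sketch of the comparison step, assuming the details response contains an indexVersion value for both master and slave (the exact element name can differ by Solr version — check your actual /replication?command=details output and adjust the marker):

```java
public class ReplicationCheck {
    // Naive extraction of <str name="indexVersion">...</str> from a
    // /replication?command=details XML response. The element name is an
    // assumption; verify it against your Solr version's output.
    static String indexVersion(String detailsXml) {
        String marker = "<str name=\"indexVersion\">";
        int start = detailsXml.indexOf(marker);
        if (start < 0) return null;
        start += marker.length();
        int end = detailsXml.indexOf("</str>", start);
        return end < 0 ? null : detailsXml.substring(start, end);
    }

    // Replication is done when the slave reports the master's version.
    static boolean inSync(String masterDetails, String slaveDetails) {
        String m = indexVersion(masterDetails);
        return m != null && m.equals(indexVersion(slaveDetails));
    }
}
```

A caller would fetch both details URLs in a loop with a short sleep between polls, stopping once inSync returns true.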

On Thu, Jul 7, 2011 at 3:40 PM, Nolan Frausto  wrote:
> We are looking for a call back to know when replication has finished after
> we force a replication using
> http://slave_host:port/solr/replication?command=fetchindex. What is the best
> way to go about doing this?  We are thinking of forcing the replication then
> pulling the command=details page of the slaves to compare its version to
> master.
>
> Also any issues that might be involved with this, for instance if there is a
> replication going on when we try to force one what happens?
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Virtual Memory usage increases beyond Xmx with Solr 3.3

2011-07-07 Thread Nikhil Chhaochharia


Hi,

I am using Ubuntu 10.04 64-bit with Sun Java (build 1.6.0_24-b07) and Tomcat 
(6.0.24).  Sun Java and Tomcat have been installed using apt-get from the 
Ubuntu/Canonical repositories.  I run Tomcat with -Xmx4g and have been using
Solr 1.4/3.0/3.1/3.2 without any problems.

However, if I upgrade to Solr 3.3, then the Virtual Memory of the Tomcat 
process increases to roughly the index size (70GB).  Any ideas why
this is happening?

Thanks,
Nikhil

Re: boosting and relevancy options from solr extensibility points -java-

2011-07-07 Thread Cengiz Han
In a certain time period (say, Christmas) I will promote a doc for the
"christmas" keyword.
Or, based on a user's interests, I will boost a specific category of products.
Or (I am not sure how I can do this one) I will boost docs that the current
user's friends (source: Facebook) purchased/used/...
Or, based on region, I will promote a product in the search results.
Or boost discounted products.
Or boost docs whose "title" matches all keywords exactly.
...
Most importantly, I want to boost some docs based on the current user's
preferences/history/social network data/...
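Several of these (keyword promotion, discounted products) can be expressed as server-side dismax/edismax defaults rather than client-built queries. A hedged sketch of request-handler defaults in solrconfig.xml — the handler name and field names here are illustrative, not from any real schema:

```xml
<requestHandler name="/promo" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^2 description</str>
    <!-- seasonal promotion: boost docs in the "christmas" category -->
    <str name="bq">category:christmas^5</str>
    <!-- boost discounted products -->
    <str name="bq">discounted:true^3</str>
  </lst>
</requestHandler>
```

Per-user boosts (friends, history) still have to be injected per request, e.g. as an extra bq parameter, since they are not static configuration.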

thank you for your help.

On Thu, Jul 7, 2011 at 5:33 PM, Erick Erickson wrote:

> Have you looked at dismax/edismax?
>
> I'm not clear what "rules" would be. Could
> you provide some examples? Should
> various fields get different boosts? Different
> boosts based on part-of-speech? Boosts
> based on what the value being searched is?
>
> Best
> Erick
>
> On Thu, Jul 7, 2011 at 6:38 PM, Cengiz Han  wrote:
> > Hi all,
> > I am very new to SOLR, currently trying to spike it out.
> >
> > I found some resources about boosting from query string parameters but I
> > want to configure all this boosting "rules" for my application in the
> search
> > server (solr) level, I don't want to build and manipulate SOLR queries in
> my
> > application level. SOLR should be keeping all relevancy-related boosting
> > options on the server side. First of all, can I do that? If so which one is the
> right
> > way, queryparser? request handler? component?
> >
> > It would be great, If you can share a sample or an introduction
> > article/resource.
> >
> > Thanks in advance
> >
> > --
> > cengiz han
> > +1(403)923-5455
> > blog:develoq 
> >
>



-- 
cengiz han
+1(403)923-5455
blog:develoq 


Re: Any way to get the value if sorting by function?

2011-07-07 Thread arian487
Fixed it. It turns out I can't get the score if I sort by a function, but if
I run a function query it will sort by score and give me the score.
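For reference, the pattern described above looks roughly like this (a sketch — check the function-query syntax for your Solr version):

```
q={!func}sum(indexedField,42)&fl=*,score
```

With {!func}, the relevance score is the function value itself, so requesting score in fl returns it per document.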

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Any-way-to-get-the-value-if-sorting-by-function-tp3148864p3150216.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: can't get moreLikeThis to work

2011-07-07 Thread Koji Sekiguchi

Plus, debugQuery=on would help you when using MLT after 3.1:

https://issues.apache.org/jira/browse/SOLR-860

koji
--
http://www.rondhuit.com/en/

(11/07/08 6:55), Juan Grande wrote:

Hi Elaine,

The first thing that comes to my mind is that neither the content nor the
term vectors of "text" and "category_text" fields are being stored. Check
the name of the parameter used to store the term vectors, which actually is
"termVectors" and not "term_vectored" (see
http://wiki.apache.org/solr/SchemaXml#Expert_field_options).

Try changing that and tell us if it worked!

Regards,

*Juan*



On Thu, Jul 7, 2011 at 4:44 PM, Elaine Li  wrote:


Hi Folks,

This is my configuration for mlt in solrconfig.xml


  name,text,category_text
  2
  1
  3
  1000
  50
  5000
  true
  name,text,category_text
  



I also defined the three fields to have term_vectored attribute in
schema.xml




When I submit the query
"http://localhost:8983/solr/mlt?q=id:69134&mlt.count=10", the return
only contains one document with id=69134.

Does anyone know or can guess what I missed? Thanks.

Elaine








Re: (Solr-UIMA) Doubt regarding integrating UIMA in to solr - Configuration.

2011-07-07 Thread Koji Sekiguchi

(11/07/07 18:38), Sowmya V.B. wrote:

Hi

I am trying to add the UIMA module into Solr, and began with the readme file
given here:
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/README.txt


I would recommend using Solr 3.3 rather than 3.1, as we have changed some of
the UIMA configuration in solrconfig.xml.


2. modify your schema.xml adding the fields you want to be hold
metadata specifying proper values for type, indexed, stored and
multiValued options:

-I understood this line as: adding to my schema.xml, the new fields that
will come as a result of a UIMA pipeline. For example, in my UIMA pipeline,
post-processing, I get fields A,B,C in addition to fields X,Y,Z that I
already added to the SolrInputDocument. So, does this mean I should add
A,B,C to the schema.xml?


I think you got it right. Have you tried it and run into errors?


3. In SolrConfig.xml,

inside,


 


The uimaConfig tag has been moved into update processor setting @ Solr 3.2.
Please see the latest README.txt.


If I am not using any of those "alchemy api key..." etc., I think I can remove
those lines. However, I plan to use the openNLP tagger & tokenizer, and an
annotator I wrote for my task. Can I give my model file locations here as
runtimeParameters?


I don't have any experience with openNLP.


4. I did not understand what "fieldMapping" tag does. The description said:
"field mapping describes which features of which types should go in a
field"--
- For example, in this snippet from the link:

  

   

-what does "feature" mean and what does "field" mean?


This defines a mapping from a UIMA feature:

http://uima.apache.org/d/uimaj-2.3.1/references.html#ugr.ref.xml.component_descriptor.type_system.features

to Solr field.
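As an illustrative shape only (the exact element names changed between 3.1 and later releases, so check the README for your version), a mapping saying "take feature F of annotation type T and store it in Solr field X" looks something like:

```xml
<fieldMapping>
  <type name="org.apache.uima.SentenceAnnotation">
    <map feature="coveredText" field="sentence"/>
  </type>
</fieldMapping>
```

Here "feature" names an attribute of the UIMA annotation type, and "field" names the Solr schema field that receives its value.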

koji
--
http://www.rondhuit.com/en/


Re: boosting and relevancy options from solr extensibility points -java-

2011-07-07 Thread Erick Erickson
Have you looked at dismax/edismax?

I'm not clear what "rules" would be. Could
you provide some examples? Should
various fields get different boosts? Different
boosts based on part-of-speech? Boosts
based on what the value being searched is?

Best
Erick

On Thu, Jul 7, 2011 at 6:38 PM, Cengiz Han  wrote:
> Hi all,
> I am very new to SOLR, currently trying to spike it out.
>
> I found some resources about boosting from query string parameters but I
> want to configure all this boosting "rules" for my application in the search
> server (solr) level, I don't want to build and manipulate SOLR queries in my
> application level. SOLR should be keeping all relevancy-related boosting
> options on the server side. First of all, can I do that? If so which one is the right
> way, queryparser? request handler? component?
>
> It would be great, If you can share a sample or an introduction
> article/resource.
>
> Thanks in advance
>
> --
> cengiz han
> +1(403)923-5455
> blog:develoq 
>


boosting and relevancy options from solr extensibility points -java-

2011-07-07 Thread Cengiz Han
Hi all,
I am very new to SOLR, currently trying to spike it out.

I found some resources about boosting from query string parameters, but I
want to configure all these boosting "rules" at the search server (Solr)
level for my application; I don't want to build and manipulate Solr queries at
my application level. Solr should keep all relevancy-related boosting options
on the server side. First of all, can I do that? If so, which is the right
way: a query parser, a request handler, or a component?

It would be great if you could share a sample or an introductory
article/resource.

Thanks in advance

-- 
cengiz han
+1(403)923-5455
blog:develoq 


Re: Query does not work when changing param order

2011-07-07 Thread Juan Grande
Hi Juan!

I think your problem is that in the second case the FieldQParserPlugin is
building a phrase query for "mytag myothertag". I recommend you to split the
filter in two different filters, one for each tag. If each tag is used in
many different filters, and the combination of tags is rarely repeated, this
will also result in a more efficient use of filterCache.
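Concretely, the suggestion is to replace the single phrase-producing filter with one fq per tag, e.g.:

```
fq={!field f=tags}mytag&fq={!field f=tags}myothertag
```

Each fq is then cached independently in the filterCache, so a common tag is reused across queries regardless of which combination of tags it appears with.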

Regards,

*Juan*



On Thu, Jul 7, 2011 at 12:07 PM, Juan Manuel Alvarez wrote:

> Hi everyone!
>
> I would like to ask you a question about a problem I am facing with a
> Solr query.
>
> I have a field "tags" of type "textgen" and some documents with the
> values "myothertag,mytag".
>
> When I use the query:
> /solr/select?sort=name_sort+asc&start=0&qf=tags&q.alt=*:*&fq={!field
> q.op=AND f=tags}myothertag mytag&rows=60&defType=dismax
>
> everything works as expected, but if I change the order of the
> parameters in the fq, like this
> /solr/select?sort=name_sort+asc&start=0&qf=tags&q.alt=*:*&fq={!field
> q.op=AND f=tags}mytag myothertag&rows=60&defType=dismax
> I get no results.
>
> As far as I have seen, the "textgen" field should tokenize the words in
> the field, so if I use comma-separated values, like in my example,
> both words are going to be indexed.
>
> Can anyone please point me in the right direction?
>
> Cheers!
> Juan M.
>


Re: (Solr-UIMA) Doubt regarding integrating UIMA in to solr - Configuration.

2011-07-07 Thread Sowmya V.B.
Can someone help me with this please?

I am not able to understand, from the readme.txt file provided in the
trunk, how to plug my own annotator into Solr.

Sowmya.

On Thu, Jul 7, 2011 at 11:38 AM, Sowmya V.B.  wrote:

> Hi
>
> I am trying to add UIMA module in to Solr..and began with the readme file
> given here.
>
> https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/README.txt
>
> I am confused about some points in the readme file and hence the email.
>
> 2. modify your schema.xml adding the fields you want to be hold metadata 
> specifying proper values for type, indexed, stored and multiValued options:
>
> -I understood this line as: adding to my schema.xml, the new fields that
> will come as a result of a UIMA pipeline. For example, in my UIMA pipeline,
> post-processing, I get fields A,B,C in addition to fields X,Y,Z that I
> already added to the SolrInputDocument. So, does this mean I should add
> A,B,C to the schema.xml?
>
> 3. In SolrConfig.xml,
>
> inside,
>
> 
> 
>
> If I am not using any of those "alchemy api key..." etc., I think I can
> remove those lines. However, I plan to use the openNLP tagger & tokenizer,
> and an annotator I wrote for my task. Can I give my model file locations
> here as runtimeParameters?
>
> 4. I did not understand what "fieldMapping" tag does. The description said:
> "field mapping describes which features of which types should go in a
> field"--
> - For example, in this snippet from the link:
>
>  
>
>   
>
> -what does "feature" mean and what does "field" mean?
>
>
> I did not understand the fieldmapping tag right and did not find any help
> in previous mails. Hence, mailing the group. Sorry for the long mail!
>
> Regards
> Sowmya V.B.
> 
> Losing optimism is blasphemy!
> http://vbsowmya.wordpress.com
> 
>



-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: can't get moreLikeThis to work

2011-07-07 Thread Juan Grande
Hi Elaine,

The first thing that comes to my mind is that neither the content nor the
term vectors of "text" and "category_text" fields are being stored. Check
the name of the parameter used to store the term vectors, which actually is
"termVectors" and not "term_vectored" (see
http://wiki.apache.org/solr/SchemaXml#Expert_field_options).

Try changing that and tell us if it worked!

Regards,

*Juan*



On Thu, Jul 7, 2011 at 4:44 PM, Elaine Li  wrote:

> Hi Folks,
>
> This is my configuration for mlt in solrconfig.xml
>  class="org.apache.solr.handler.MoreLikeThisHandler">
>
>  name,text,category_text
>  2
>  1
>  3
>  1000
>  50
>  5000
>  true
>  name,text,category_text
>  
>
> 
>
> I also defined the three fields to have term_vectored attribute in
> schema.xml
>  term_vectored="true"/>
>  multiValued="true" term_vectored="true"/>
>  stored="false" multiValued="true" term_vectored="true"/>
>
> When I submit the query
> "http://localhost:8983/solr/mlt?q=id:69134&mlt.count=10", the return
> only contains one document with id=69134.
>
> Does anyone know or can guess what I missed? Thanks.
>
> Elaine
>


solr replication

2011-07-07 Thread Nolan Frausto
We are looking for a call back to know when replication has finished after
we force a replication using
http://slave_host:port/solr/replication?command=fetchindex. What is the best
way to go about doing this?  We are thinking of forcing the replication then
pulling the command=details page of the slaves to compare its version to
master.

Also, are there any issues that might be involved with this? For instance,
what happens if a replication is going on when we try to force one?


Re: How do I add a custom field?

2011-07-07 Thread Mike Sokolov

Did you ever commit?

On 07/07/2011 01:58 PM, Gabriele Kahlout wrote:

so, how about this:
Document doc = searcher.doc(i);  // get the doc
doc.removeField("wc");           // remove the field in case it's there
addWc(doc, docLength);           // add the new field
writer.updateDocument(new Term("id", Integer.toString(i++)), doc);  // update the doc

For some reason it doesn't get added to the index. Should it?

On 7/3/11, Michael Sokolov  wrote:
   

You'll need to index the field.  I would think you would want to
index/store the field along with the associated document, in which case
you'll have to reindex the documents as well - there's no single-field
update capability in Lucene (yet?).

-Mike

On 7/3/2011 1:09 PM, Gabriele Kahlout wrote:
 

Is there how I can compute and add the field to all indexed documents
without re-indexing? MyField counts the number of terms per document
(unique
word count).

On Sun, Jul 3, 2011 at 12:24 PM, lee carroll
wrote:

   

Hi Gabriele,
Did you index any docs with your new field ?

The results will just bring back docs and what fields they have. They
won't
bring back "null" fields just because they are in your schema. Lucene
is schema-less.
Solr adds the schema to make it nice to administer and very powerful to
use.





On 3 July 2011 11:01, Gabriele Kahlout   wrote:
 

Hello,

I want to have an additional  field that appears for every document in
search results. I understand that I should do this by adding the field
to
the schema.xml, so I add:
 
Then I restart Solr (so that I loads the new schema.xml) and make a
query
specifying that it should return myField too, but it doesn't. Will it do
only for newly indexed documents? Am I missing something?

--
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).

Re: updating documents while keeping unspecified fields

2011-07-07 Thread Juan Grande
Hi Adeel,

As far as I know, this isn't possible yet, but some work is being done:

https://issues.apache.org/jira/browse/SOLR-139
https://issues.apache.org/jira/browse/SOLR-828

Regards,

*Juan*



On Thu, Jul 7, 2011 at 2:24 PM, Adeel Qureshi wrote:

> What I am trying to do is to update a document information while keeping
> data for the fields that arent being specified in the update.
>
> So e.g. if this is the schema
>
> 
> 123
> some title
> active
> 
>
> if i send
>
> 
> 123
> closed
> 
>
> it should update the status to be closed for this document but not wipe out
> title since it wasnt provided in the updated data. Is that possible by
> using
> some flags or something ???
>
> Thanks
> Adeel
>
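Until one of those issues lands, the usual client-side workaround is read-merge-rewrite: fetch every stored field of the existing document, overlay the changed fields, and re-index the whole document (this only works if all fields are stored). A minimal sketch of the merge step:

```java
import java.util.HashMap;
import java.util.Map;

public class PartialUpdate {
    // Overlay the changed fields on top of the stored document and
    // return the full document to send back to Solr.
    static Map<String, Object> merge(Map<String, Object> stored,
                                     Map<String, Object> changes) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(changes);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("id", "123");
        stored.put("title", "some title");
        stored.put("status", "active");

        Map<String, Object> changes = new HashMap<>();
        changes.put("id", "123");
        changes.put("status", "closed");

        // title survives; status is updated
        System.out.println(merge(stored, changes));
    }
}
```

The fetch and re-add around this merge go over the normal query and update handlers; the merge itself is the only new logic.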


Re: Need help with troublesome wildcard query

2011-07-07 Thread Briggs Thompson
Hello Christopher,

Can you provide the exact query sent to Solr for the one word query and also
the two word query? The field type definition for your title field would be
useful too.

From what I understand, Solr should be able to handle your use case. I am
guessing it is a problem with how the field is defined assuming the query is
correct.

Briggs Thompson

On Thu, Jul 7, 2011 at 12:22 PM, Christopher Cato <
christopher.c...@minimedia.se> wrote:

> Hi, I'm running Solr 3.2 with edismax under Tomcat 6 via Drupal.
>
> I'm having some problems writing a query that matches a specific field on
> several words. I have implemented an AJAX search that basically takes
> whatever is in a form field and attempts to match documents. I'm not having
> much luck though. First word always matches correctly but as soon as I enter
> the second word I'm losing matches, the third word doesn't give any matches
> at all.
>
> The title field that I'm searching contains a product name that may or may
> not have several words.
>
> The requirement is that the search should be progressive i.e. as the user
> inputs words I should always return results that contain all of the words
> entered. I also have to correct bad input like an erroneous space in the
> product name ex. "product name" instead of "productname".
>
> I'm wondering if there isn't an easier way to query Solr? Ideally I'd want
> to say "give me all docs that have the following text in its titles". Is
> that possible?
>
>
> I'd really appreciate any help!
>
>
> Regards,
> Christopher Cato


Re: The correct query syntax for date ?

2011-07-07 Thread duddy67
It works. 
Thanks.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-correct-query-syntax-for-date-tp3147536p3149588.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: bug in ExtractingRequestHandler with PDFs and metadata field Category

2011-07-07 Thread Juan Grande
Hi Andras,

I added metadata_ so all PDF metadata fields
> should be saved in solr as "metadata_something" fields.
>
The problem is that the "Category" metadata field from the PDF for some
> reason is not prefixed with "metadata_" and
>
solr will merge the "Category" field I have in the schema with the Category
> metadata from PDF
>

This is the expected behavior, as it's described in
http://wiki.apache.org/solr/ExtractingRequestHandler:

uprefix= - Prefix all fields that are not defined in the schema with
> the given prefix.
>

You can use the fmap parameter to redirect the category metadata to another
field.
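Concretely (a sketch — the target field name here is illustrative), adding an fmap rule to the extract request redirects the colliding metadata field:

```
/solr/update/extract?uprefix=metadata_&fmap.Category=metadata_category
```

fmap.<source>=<target> moves the Tika-produced "Category" field into "metadata_category", so it no longer collides with the schema's non-multiValued Category field.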

Regards,

*Juan*



On Thu, Jul 7, 2011 at 10:44 AM, Andras Balogh  wrote:

> Hi,
>
>I think this is a bug but before reporting to issue tracker I thought I
> will ask it here first.
> So the problem is I have a PDF file which among other metadata fields like
> Author, CreatedDate etc. has a metadata
> field Category (I can see all metadata fields with tika-app.jar started in
> GUI mode).
> Now what happens is that in my SOLR schema I have a "Category" field also
> among other fields and a field called "text"
> that is holding the extracted text from the PDF.
> I added metadata_ so all PDF metadata fields
> should be saved in solr as "metadata_something" fields.
> The problem is that the "Category" metadata field from the PDF for some
> reason is not prefixed with "metadata_" and
> solr will merge the "Category" field I have in the schema with the Category
> metadata from PDF and I will have an error like:
> "multiple values encountered for non multiValued field Category"
> I fixed this by patching tika-parsers.jar and will ignore the Category
> metadata in
> org.apache.tika.parser.pdf.**PDFParser
> but this is not the good solution( I don't need that Category metadata so
> it works for me).
>
> So let me know if this should be reported as bug or not.
>
> Regards,
> Andras.
>
>
>
>
>
>
>


can't get moreLikeThis to work

2011-07-07 Thread Elaine Li
Hi Folks,

This is my configuration for mlt in solrconfig.xml


  name,text,category_text
  2
  1
  3
  1000
  50
  5000
  true
  name,text,category_text
  



I also defined the three fields to have term_vectored attribute in schema.xml




When I submit the query
"http://localhost:8983/solr/mlt?q=id:69134&mlt.count=10", the return
only contains one document with id=69134.

Does anyone know or can guess what I missed? Thanks.

Elaine


Highlight not catching last letter(s)

2011-07-07 Thread Lisa Riggle
Hi Guys!

Thanks for the help with my question regarding special characters in
indexes.  I have another question that I hope you can help with.

Right now, some of our companies have special, non-alphanumeric
characters in them.  Many of these characters get stripped out during
the indexing process and the query process.  Unfortunately, I've
noticed, if the Name of the company that's returned is 1-2 characters
longer than the stripped query string, highlighting will not highlight
the last 1-2 characters.

Example-
Company Name: Inter@ctive
(@ symbol is removed from the token with patternReplaceFilterFactory
during the indexing process)
Search Query: Inter@ctive
(Here the @ symbol is removed by the PHP script before being sent to
Solr, so the term ends up being /Interctive/)
How it gets highlighted: *Inter@ctiv*e

Another time this happens is if I do a search for a company without any
spaces in the name, and it returns a version of the name with spaces in
the name.

Example-
Company Name: Best Buy
(Notice the space)
Search Query: bestbuy
(Notice the lack of space)
How it gets highlighted: *Best Bu*y

I'm at a total loss on how to get around this.  Can anyone point me in
the right direction?
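One direction worth trying (a sketch, not a drop-in fix): do the character stripping inside the analyzer as a char filter instead of pre-stripping in PHP or using a token filter. Char filters keep a correction map back to the original text, so highlight offsets can line up with the stored value. The type and class names below are illustrative:

```xml
<fieldType name="text_company" class="solr.TextField">
  <analyzer>
    <!-- strip non-alphanumerics before tokenizing; offsets stay corrected -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[^a-zA-Z0-9 ]" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, the raw user input can be passed straight to Solr so query-time and index-time analysis match, instead of stripping in the PHP script.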


Re: Getting the indexed value rather than the stored value

2011-07-07 Thread Gora Mohanty
On Thu, Jul 7, 2011 at 3:35 AM, Christian  wrote:
[...]
> This is great for finding all things with or without profanity (as separate
> queries), but I would like to get the value as part of a the query and let
> the consumer of the call decide what to do with the data.
>
> Is there a way to do this w/o having to instantiate a KeepWordFilterFactory
> in the Java class that is responsible for inserting the document into Solr?
> For example, I know that I can do this in Java code during the insert, but I
> would rather get the indexed value (the one that shows up when faceting).
>
> Please let me know if this is not clear.

Not sure that I follow what you are after:
* If you are using this field as part of faceting, the facet values should be
  what you are after.
* In the general case, if you want to retrieve the actual value, you should
  have both indexed=true, and stored=true.

Regards,
Gora


Looking for big groups ...

2011-07-07 Thread Benson Margulies
I've got an index set up where there is a field that denotes
membership in a document cluster. By using a grouped query, I can get
a result grouped by cluster membership.

Gosh, I wish I could add one more thing to the top of this pile: sorting by
group size. I'd like the ability to demand a sort by descending group size
instead of the usual relevance ordering. This is mostly a debugging trick,
not a production mechanism.

Is there a way to do this?


Re: How do I add a custom field?

2011-07-07 Thread Gabriele Kahlout
so, how about this:
Document doc = searcher.doc(i);  // get the doc
doc.removeField("wc");           // remove the field in case it's there
addWc(doc, docLength);           // add the new field
writer.updateDocument(new Term("id", Integer.toString(i++)), doc);  // update the doc

For some reason it doesn't get added to the index. Should it?

On 7/3/11, Michael Sokolov  wrote:
> You'll need to index the field.  I would think you would want to
> index/store the field along with the associated document, in which case
> you'll have to reindex the documents as well - there's no single-field
> update capability in Lucene (yet?).
>
> -Mike
>
> On 7/3/2011 1:09 PM, Gabriele Kahlout wrote:
>> Is there how I can compute and add the field to all indexed documents
>> without re-indexing? MyField counts the number of terms per document
>> (unique
>> word count).
>>
>> On Sun, Jul 3, 2011 at 12:24 PM, lee carroll
>> wrote:
>>
>>> Hi Gabriele,
>>> Did you index any docs with your new field ?
>>>
>>> The results will just bring back docs and what fields they have. They
>>> won't
>>> bring back "null" fields just because they are in your schema. Lucene
>>> is schema-less.
>>> Solr adds the schema to make it nice to administer and very powerful to
>>> use.
>>>
>>>
>>>
>>>
>>>
>>> On 3 July 2011 11:01, Gabriele Kahlout  wrote:
 Hello,

 I want to have an additional  field that appears for every document in
 search results. I understand that I should do this by adding the field
 to
 the schema.xml, so I add:
 >>> indexed="false"/>
 Then I restart Solr (so that I loads the new schema.xml) and make a
 query
 specifying that it should return myField too, but it doesn't. Will it do
 only for newly indexed documents? Am I missing something?

 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains "[LON]" or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
>>> time(x)
 <  Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the
>>> email
 does not contain a valid code then the email is not received. A valid
>>> code
 starts with a hyphen and ends with "X".
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).

>>
>>
>
>


-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: fetcher no agents listed in 'http.agent.name' property

2011-07-07 Thread Way Cool
Cool. Glad it worked out.

On Thu, Jul 7, 2011 at 11:22 AM, serenity keningston <
serenity.kenings...@gmail.com> wrote:

> Thank you very much, I never tried to modify the config files from
> /runtime/local/conf .
>
> In Nutch-0.9, we will just modify from /conf  directory. I
> appreciate your time and help.
>
> Merci
>
> On Thu, Jul 7, 2011 at 12:05 PM, Way Cool  wrote:
>
> > Just make sure you did change the files under
> > /runtime/local/conf if you are running from runtime/local.
> >
> > On Thu, Jul 7, 2011 at 8:34 AM, serenity keningston <
> > serenity.kenings...@gmail.com> wrote:
> >
> > > Hello Friends,
> > >
> > >
> > > I am experiencing this error message " fetcher no agents listed in '
> > > http.agent.name' property" when I am trying to crawl with Nutch 1.3
> > > I referred other mails regarding the same error message and tried to
> > change
> > > the nutch-default.xml and nutch-site.xml file details with
> > >
> > > 
> > >  http.agent.name
> > >  My Nutch Spider
> > >  EMPTY
> > > 
> > >
> > > I also filled out the other property details without blank and still
> > > getting
> > > the same error. May I know my mistake ?
> > >
> > >
> > > Serenity
> > >
> >
>


updating documents while keeping unspecified fields

2011-07-07 Thread Adeel Qureshi
What I am trying to do is to update a document's information while keeping
data for the fields that aren't being specified in the update.

So e.g. if this is the schema


123
some title
active


if I send


123
closed


it should update the status to be closed for this document, but not wipe out
the title since it wasn't provided in the updated data. Is that possible
using some flags or something?

Thanks
Adeel


Re: fetcher no agents listed in 'http.agent.name' property

2011-07-07 Thread serenity keningston
Thank you very much, I never tried to modify the config files from
/runtime/local/conf .

In Nutch 0.9, we would just modify the /conf directory. I
appreciate your time and help.

Merci

On Thu, Jul 7, 2011 at 12:05 PM, Way Cool  wrote:

> Just make sure you did change the files under
> /runtime/local/conf if you are running from runtime/local.
>
> On Thu, Jul 7, 2011 at 8:34 AM, serenity keningston <
> serenity.kenings...@gmail.com> wrote:
>
> > Hello Friends,
> >
> >
> > I am experiencing this error message " fetcher no agents listed in '
> > http.agent.name' property" when I am trying to crawl with Nutch 1.3
> > I referred other mails regarding the same error message and tried to
> change
> > the nutch-default.xml and nutch-site.xml file details with
> >
> > 
> >  http.agent.name
> >  My Nutch Spider
> >  EMPTY
> > 
> >
> > I also filled out the other property details without blank and still
> > getting
> > the same error. May I know my mistake ?
> >
> >
> > Serenity
> >
>


Re: fetcher no agents listed in 'http.agent.name' property

2011-07-07 Thread Way Cool
Just make sure you did change the files under
/runtime/local/conf if you are running from runtime/local.

On Thu, Jul 7, 2011 at 8:34 AM, serenity keningston <
serenity.kenings...@gmail.com> wrote:

> Hello Friends,
>
>
> I am experiencing this error message " fetcher no agents listed in '
> http.agent.name' property" when I am trying to crawl with Nutch 1.3
> I referred other mails regarding the same error message and tried to change
> the nutch-default.xml and nutch-site.xml file details with
>
> 
>  http.agent.name
>  My Nutch Spider
>  EMPTY
> 
>
> I also filled out the other property details without blank and still
> getting
> the same error. May I know my mistake ?
>
>
> Serenity
>
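For reference (the archive above has stripped the XML tags from the quoted snippet), a complete http.agent.name property in nutch-site.xml has this shape:

```xml
<property>
  <name>http.agent.name</name>
  <value>My Nutch Spider</value>
</property>
```

The value just needs to be a non-empty agent string; the fetcher refuses to run while it is blank.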


Any way to get the value if sorting by function?

2011-07-07 Thread arian487
Let's say my sort is something like:

sort=sum(indexedField, constant).  If I have a component that runs right
after the QueryComponent, is it possible to know what this value was for
each of the documents IF the field is not stored, and only indexed?  I
scoured through the code and it didn't look like this was possible.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Any-way-to-get-the-value-if-sorting-by-function-tp3148864p3148864.html
Sent from the Solr - User mailing list archive at Nabble.com.


Tika parser doesn't seem to work with Solr DIH Row Transformer

2011-07-07 Thread abiratsis
Hello there, I am using DIH for importing data from a mysql db and a
directory. For this purpose I have written my own Transformer class in order
to modify imported values in several cases. Now we need to add document
support for our indexing server, which led us to use Tika in order to
import documents' content. My index server contains data for the following
objects:
 
* Bookmarks

* Courses

* Files (here I need to use Tika)


All the previous elements share some common properties such as: Id, Title,
Description, Text. Also all the needed data are stored to the database and
thats why we decided to use a single DIH mechanism in order to import all
these elements to the Solr index. Of course in the case of the files I need
to read their content. 

So I have written something similar to the following code in order to handle
documents' content:


// each file is downloaded first using FTP
FTPClient ftpClient = new FTPClient();
ftpClient.connect("FTPServer");
ftpClient.login("uname", "pass");
File localFile = new File("/tmp/" + fileName);
ftpClient.download("/repos/files/original/" + fileName, localFile);

InputStream input = new FileInputStream(localFile);
ContentHandler textHandler = new BodyContentHandler(-1);
Metadata metadata = new Metadata();

AutoDetectParser parser = new AutoDetectParser();
try {
    parser.parse(input, textHandler, metadata);
} catch (IOException ex) {
    Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
} catch (SAXException ex) {
    Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
} catch (TikaException ex) {
    Logger.getLogger("SCX.Indexing.Main").log(Level.SEVERE, null, ex);
} finally {
    input.close();
}
row.put("text", textHandler.toString());
row.put("title", metadata.get("title"));


This code is under the transformRow method that my class overrides.
The problem is that when I run the same code in a main class it
executes normally, but when I move it to the transformRow method,
textHandler.toString() returns no text and no metadata is set.
Also, no exception is thrown!

Has anyone faced something similar in the past?

Thanks a lot

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tika-parser-doesn-t-seem-to-work-with-Solr-DIH-Row-Transformer-tp3148853p3148853.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Erick Erickson
You can create a login and edit the wiki, so please do!

Erick

On Thu, Jul 7, 2011 at 12:44 PM, Mark juszczec  wrote:
> First thanks for all the help.
>
> I think the problem was a combination of not having a unique key defined AND
> not including the commit=true parameter in the delta update.
>
> Once I did those things, the delta import left me with a single (updated)
> copy of the record including the changes in the source database.
>
> Do I have write access to the Wiki so I can explicitly state commit=true
> NEEDS to be specified?
>
> Mark
>
>
> On Thu, Jul 7, 2011 at 12:39 PM, Erick Erickson 
> wrote:
>
>> I'd restart Solr after changing the schema.xml. The delta import does NOT
>> require restart or anything else like that.
>>
>> The fact that two records are displayed is not what I'd expect. But Solr
>> absolutely handles the replace via <uniqueKey>. So I suspect that you're
>> not actually doing what you expect. A little-known aid for debugging DIH
>> is solr/admin/dataimport.jsp, that might give you some joy.
>>
>> But, to summarize. This should work fine for DIH as far as Solr is
>> concerned
>> assuming that <uniqueKey> is properly defined. In your query above that
>> returns two documents, can you paste the entire response with &fl=*
>> attached?
>> I'm guessing that the data in your index isn't what you're expecting...
>>
>> Also, you might want to get a copy of Luke and examine your index, there's
>> a
>> wealth of information
>>
>>
>> Best
>> Erick
>>
>>
>> On Thu, Jul 7, 2011 at 11:12 AM, Mark juszczec 
>> wrote:
>> > Erick
>> >
>> > I used to, but now I find I must have commented it out in a fit of rage
>> ;-)
>> >
>> > This could be the whole problem.
>> >
>> > I have verified via admin schema browser that the field is ORDER_ID and
>> will
>> > double check I refer to it in upper case in the appropriate places in the
>> > Solr config scheme.
>> >
>> > Curiously, the admin schema browser display for ORDER_ID says
>> "hasDeletions:
>> > false"  - which seems the opposite of what I want.  I want to be able to
>> > delete duplicates.  Or am I interpreting this field wrong?
>> >
>> > In order to check for duplicates, I am going to use the admin browser
>> to
>> > enter the following in the Make A Query box:
>> >
>> > TABLE_ID:1 AND ORDER_ID:674659
>> >
>> > When I click search and view the results, 2 records are displayed.  One
>> has
>> > the original values, one has the changed values.  I haven't examined the
>> xml
>> > (via view source) too closely and the next time I run I will look for
>> > something indicating one of the records is inactive.
>> >
>> > When you say "change your schema" do you mean via a delta import or by
>> > modifying the config files or both?  FWIW, I am deleting the index on the
>> > file system, doing a full import, modifying the data in the database and
>> > then doing a delta import.
>> >
>> > I am not restarting Solr at all in this process.
>> >
>> > I understand Solr does not perform key management.  You described exactly
>> > what I meant.  Sorry for any confusion.
>> >
>> > Mark
>> >
>> > On Thu, Jul 7, 2011 at 10:52 AM, Erick Erickson > >wrote:
>> >
>> >> Let me re-state a few things to see if I've got it right:
>> >>
>> >> > your schema.xml file has an entry like
>> <uniqueKey>order_id</uniqueKey>,
>> >> right?
>> >>
>> >> > given this definition, any document added with an order_id that
>> already
>> >> exists in the
>> >>   Solr index will be replaced. i.e. you should have one and only one
>> >> document with a
>> >>   given order_id.
>> >>
>> >> > case matters. Check via the admin page ("schema browser") to see if
>> you
>> >> have
>> >>   two fields, order_id and ORDER_ID.
>> >>
>> >> > How are you checking that your docs are duplicates? If you do a search
>> on
>> >>   order_id, you should get back one and only one document (assuming the
>> >>   definition above). A document that's deleted will just be marked as
>> >> deleted,
>> >>   the data won't be purged from the index. It won't show in search
>> results,
>> >> but
>> >>   it will show if you use lower-level ways to access the data.
>> >>
>> >> > Whenever you change your schema, it's best to clean the index, restart
>> >> the server and
>> >>    re-index from scratch. Solr won't retroactively remove duplicate
>> >>  entries.
>> >>
>> >> > On the stats admin/stats page you should see maxDocs and numDocs. The
>> >> difference
>> >>   between these should be the number of deleted documents.
>> >>
>> >> > Solr doesn't "manage" unique keys. All that happens is Solr will
>> replace
>> >> any
>> >>   pre-existing documents where *you've* defined the <uniqueKey> when a
>> >>   new doc is added...
>> >>
>> >> Hope this helps
>> >> Erick
>> >>
>> >> On Thu, Jul 7, 2011 at 10:16 AM, Mark juszczec > >
>> >> wrote:
>> >> > Bob
>> >> >
>> >> > No, I don't.  Let me look into that and post my results.
>> >> >
>> >> > Mark
>> >> >
>> >> >
>> >> > On Thu, Jul 7, 2011 at 10:14 AM, Bob Sandiford <
>> >> bob.sandif...@sirsidynix.com
>> >> >> wrote:
>> >> >
>> >> >> Hi

Re: ClassCastException launching recent snapshot

2011-07-07 Thread Erick Erickson
Been There, Done That, Got the T-shirt 

Erick

On Thu, Jul 7, 2011 at 12:13 PM, Benson Margulies  wrote:
> I built a fresh set of snapshots myself, I carefully cleaned my
> project, and everything is happy. So this goes down in the department
> of pirate error.
>
> On Thu, Jul 7, 2011 at 8:29 AM, Erick Erickson  
> wrote:
>> Then I would guess that you have other (older) jars in your classpath
>> somewhere. Does the example Solr installation work?
>>
>> Best
>> Erick
>>
>> On Wed, Jul 6, 2011 at 10:21 PM, Benson Margulies  
>> wrote:
>>> Launching solr-4.0-20110705.223601-1.war, I get a class cast exception
>>>
>>> org.apache.lucene.index.DirectoryReader cannot be cast to
>>> org.apache.solr.search.SolrIndexReader with the following backtrace.
>>>
>>> I'm launching solr-as-a-webapp via an embedded copy of tomcat 7. The
>>> location of the index is set up via:
>>>
>>> System.setProperty("solr.data.dir", solrDataDirectory);
>>>
>>> Further, the sources in the corresponding -sources .jar don't seem
>>> to have a cast to SolrIndexReader in it anywhere in SolrIndexSearcher.
>>>
>>> SolrIndexSearcher.<init>(SolrCore, IndexSchema, String, IndexReader,
>>> boolean, boolean) line: 142
>>> SolrCore.getSearcher(boolean, boolean, Future[]) line: 1085
>>> SolrCore.<init>(String, String, SolrConfig, IndexSchema,
>>> CoreDescriptor) line: 587
>>> CoreContainer.create(CoreDescriptor) line: 660
>>> CoreContainer.load(String, InputStream) line: 412
>>> CoreContainer$Initializer.initialize() line: 246
>>> SolrDispatchFilter.init(FilterConfig) line: 86
>>> ApplicationFilterConfig.initFilter() line: 273
>>> ApplicationFilterConfig.getFilter() line: 254
>>> ApplicationFilterConfig.setFilterDef(FilterDef) line: 372
>>> ApplicationFilterConfig.<init>(Context, FilterDef) line: 98
>>> StandardContext.filterStart() line: 4584
>>> StandardContext$2.call() line: 5262
>>> StandardContext$2.call() line: 5257
>>> FutureTask$Sync.innerRun() line: 303
>>> FutureTask.run() line: 138
>>> ThreadPoolExecutor$Worker.runTask(Runnable) line: 886
>>>
>>
>


Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Mark juszczec
First thanks for all the help.

I think the problem was a combination of not having a unique key defined AND
not including the commit=true parameter in the delta update.

Once I did those things, the delta import left me with a single (updated)
copy of the record including the changes in the source database.

Do I have write access to the Wiki so I can explicitly state commit=true
NEEDS to be specified?

Mark


On Thu, Jul 7, 2011 at 12:39 PM, Erick Erickson wrote:

> I'd restart Solr after changing the schema.xml. The delta import does NOT
> require restart or anything else like that.
>
> The fact that two records are displayed is not what I'd expect. But Solr
> absolutely handles the replace via <uniqueKey>. So I suspect that you're
> not actually doing what you expect. A little-known aid for debugging DIH
> is solr/admin/dataimport.jsp, that might give you some joy.
>
> But, to summarize. This should work fine for DIH as far as Solr is
> concerned
> assuming that <uniqueKey> is properly defined. In your query above that
> returns two documents, can you paste the entire response with &fl=*
> attached?
> I'm guessing that the data in your index isn't what you're expecting...
>
> Also, you might want to get a copy of Luke and examine your index, there's
> a
> wealth of information
>
>
> Best
> Erick
>
>
> On Thu, Jul 7, 2011 at 11:12 AM, Mark juszczec 
> wrote:
> > Erick
> >
> > I used to, but now I find I must have commented it out in a fit of rage
> ;-)
> >
> > This could be the whole problem.
> >
> > I have verified via admin schema browser that the field is ORDER_ID and
> will
> > double check I refer to it in upper case in the appropriate places in the
> > Solr config scheme.
> >
> > Curiously, the admin schema browser display for ORDER_ID says
> "hasDeletions:
> > false"  - which seems the opposite of what I want.  I want to be able to
> > delete duplicates.  Or am I interpreting this field wrong?
> >
> > In order to check for duplicates, I am going to use the admin browser
> to
> > enter the following in the Make A Query box:
> >
> > TABLE_ID:1 AND ORDER_ID:674659
> >
> > When I click search and view the results, 2 records are displayed.  One
> has
> > the original values, one has the changed values.  I haven't examined the
> xml
> > (via view source) too closely and the next time I run I will look for
> > something indicating one of the records is inactive.
> >
> > When you say "change your schema" do you mean via a delta import or by
> > modifying the config files or both?  FWIW, I am deleting the index on the
> > file system, doing a full import, modifying the data in the database and
> > then doing a delta import.
> >
> > I am not restarting Solr at all in this process.
> >
> > I understand Solr does not perform key management.  You described exactly
> > what I meant.  Sorry for any confusion.
> >
> > Mark
> >
> > On Thu, Jul 7, 2011 at 10:52 AM, Erick Erickson  >wrote:
> >
> >> Let me re-state a few things to see if I've got it right:
> >>
> >> > your schema.xml file has an entry like
> <uniqueKey>order_id</uniqueKey>,
> >> right?
> >>
> >> > given this definition, any document added with an order_id that
> already
> >> exists in the
> >>   Solr index will be replaced. i.e. you should have one and only one
> >> document with a
> >>   given order_id.
> >>
> >> > case matters. Check via the admin page ("schema browser") to see if
> you
> >> have
> >>   two fields, order_id and ORDER_ID.
> >>
> >> > How are you checking that your docs are duplicates? If you do a search
> on
> >>   order_id, you should get back one and only one document (assuming the
> >>   definition above). A document that's deleted will just be marked as
> >> deleted,
> >>   the data won't be purged from the index. It won't show in search
> results,
> >> but
> >>   it will show if you use lower-level ways to access the data.
> >>
> >> > Whenever you change your schema, it's best to clean the index, restart
> >> the server and
> >>re-index from scratch. Solr won't retroactively remove duplicate
> >>  entries.
> >>
> >> > On the stats admin/stats page you should see maxDocs and numDocs. The
> >> difference
> >>   between these should be the number of deleted documents.
> >>
> >> > Solr doesn't "manage" unique keys. All that happens is Solr will
> replace
> >> any
> >>   pre-existing documents where *you've* defined the <uniqueKey> when a
> >>   new doc is added...
> >>
> >> Hope this helps
> >> Erick
> >>
> >> On Thu, Jul 7, 2011 at 10:16 AM, Mark juszczec  >
> >> wrote:
> >> > Bob
> >> >
> >> > No, I don't.  Let me look into that and post my results.
> >> >
> >> > Mark
> >> >
> >> >
> >> > On Thu, Jul 7, 2011 at 10:14 AM, Bob Sandiford <
> >> bob.sandif...@sirsidynix.com
> >> >> wrote:
> >> >
> >> >> Hi, Mark.
> >> >>
> >> >> I haven't used DIH myself - so I'll need to leave comments on your
> set
> >> up
> >> >> to others who have done so.
> >> >>
> >> >> Another question - after your initial index create (and after each
> >> delta),
> >> >> do you run a 'commit'?  Do you run

Re: The correct query syntax for date ?

2011-07-07 Thread Erick Erickson
right, you have to escape the ':' in the date, those are Lucene
query syntax characters. Try:
q=datecreation:2001-10-11T00\:00\:00Z

On Thu, Jul 7, 2011 at 10:36 AM, duddy67  wrote:
> I already tried the format:
>
> q=datecreation:2001-10-11T00:00:00Z
>
> but I still get the same error message.
>
> I use the 1.4.1 version. Is this the reason for my problem?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/The-correct-query-syntax-for-date-tp3147536p3148384.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
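
Erick's advice above generalizes: every Lucene query-syntax metacharacter, not just ':', needs a backslash. A minimal plain-Java sketch of such an escaper (the class and method names here are mine; recent SolrJ releases ship an equivalent ClientUtils.escapeQueryChars):

```java
public class QueryEscapeDemo {

    // Backslash-escape the Lucene query-syntax metacharacters, similar in
    // spirit to SolrJ's ClientUtils.escapeQueryChars().
    public static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // note: '-' is escaped as well, which is harmless inside a term
            if ("\\+-!():^[]\"{}~*?|&;/".indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: datecreation:2001\-10\-11T00\:00\:00Z
        System.out.println("datecreation:" + escapeQueryChars("2001-10-11T00:00:00Z"));
    }
}
```

An escaped hyphen is legal in Lucene syntax, so the output above parses the same as the hand-escaped version with only the colons escaped.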


Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Erick Erickson
I'd restart Solr after changing the schema.xml. The delta import does NOT
require restart or anything else like that.

The fact that two records are displayed is not what I'd expect. But Solr
absolutely handles the replace via <uniqueKey>. So I suspect that you're
not actually doing what you expect. A little-known aid for debugging DIH
is solr/admin/dataimport.jsp, that might give you some joy.

But, to summarize. This should work fine for DIH as far as Solr is concerned
assuming that <uniqueKey> is properly defined. In your query above that
returns two documents, can you paste the entire response with &fl=* attached?
I'm guessing that the data in your index isn't what you're expecting...

Also, you might want to get a copy of Luke and examine your index, there's a
wealth of information


Best
Erick


On Thu, Jul 7, 2011 at 11:12 AM, Mark juszczec  wrote:
> Erick
>
> I used to, but now I find I must have commented it out in a fit of rage ;-)
>
> This could be the whole problem.
>
> I have verified via admin schema browser that the field is ORDER_ID and will
> double check I refer to it in upper case in the appropriate places in the
> Solr config scheme.
>
> Curiously, the admin schema browser display for ORDER_ID says "hasDeletions:
> false"  - which seems the opposite of what I want.  I want to be able to
> delete duplicates.  Or am I interpreting this field wrong?
>
> In order to check for duplicates, I am going to use the admin browser to
> enter the following in the Make A Query box:
>
> TABLE_ID:1 AND ORDER_ID:674659
>
> When I click search and view the results, 2 records are displayed.  One has
> the original values, one has the changed values.  I haven't examined the xml
> (via view source) too closely and the next time I run I will look for
> something indicating one of the records is inactive.
>
> When you say "change your schema" do you mean via a delta import or by
> modifying the config files or both?  FWIW, I am deleting the index on the
> file system, doing a full import, modifying the data in the database and
> then doing a delta import.
>
> I am not restarting Solr at all in this process.
>
> I understand Solr does not perform key management.  You described exactly
> what I meant.  Sorry for any confusion.
>
> Mark
>
> On Thu, Jul 7, 2011 at 10:52 AM, Erick Erickson 
> wrote:
>
>> Let me re-state a few things to see if I've got it right:
>>
>> > your schema.xml file has an entry like <uniqueKey>order_id</uniqueKey>,
>> right?
>>
>> > given this definition, any document added with an order_id that already
>> exists in the
>>   Solr index will be replaced. i.e. you should have one and only one
>> document with a
>>   given order_id.
>>
>> > case matters. Check via the admin page ("schema browser") to see if you
>> have
>>   two fields, order_id and ORDER_ID.
>>
>> > How are you checking that your docs are duplicates? If you do a search on
>>   order_id, you should get back one and only one document (assuming the
>>   definition above). A document that's deleted will just be marked as
>> deleted,
>>   the data won't be purged from the index. It won't show in search results,
>> but
>>   it will show if you use lower-level ways to access the data.
>>
>> > Whenever you change your schema, it's best to clean the index, restart
>> the server and
>>    re-index from scratch. Solr won't retroactively remove duplicate
>>  entries.
>>
>> > On the stats admin/stats page you should see maxDocs and numDocs. The
>> difference
>>   between these should be the number of deleted documents.
>>
>> > Solr doesn't "manage" unique keys. All that happens is Solr will replace
>> any
>>   pre-existing documents where *you've* defined the <uniqueKey> when a
>>   new doc is added...
>>
>> Hope this helps
>> Erick
>>
>> On Thu, Jul 7, 2011 at 10:16 AM, Mark juszczec 
>> wrote:
>> > Bob
>> >
>> > No, I don't.  Let me look into that and post my results.
>> >
>> > Mark
>> >
>> >
>> > On Thu, Jul 7, 2011 at 10:14 AM, Bob Sandiford <
>> bob.sandif...@sirsidynix.com
>> >> wrote:
>> >
>> >> Hi, Mark.
>> >>
>> >> I haven't used DIH myself - so I'll need to leave comments on your set
>> up
>> >> to others who have done so.
>> >>
>> >> Another question - after your initial index create (and after each
>> delta),
>> >> do you run a 'commit'?  Do you run an 'optimize'?  (Without the
>> optimize,
>> >> 'deleted' records still show up in query results...)
>> >>
>> >> Bob Sandiford | Lead Software Engineer | SirsiDynix
>> >> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
>> >> www.sirsidynix.com
>> >>
>> >>
>> >> > -Original Message-
>> >> > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
>> >> > Sent: Thursday, July 07, 2011 10:04 AM
>> >> > To: solr-user@lucene.apache.org
>> >> > Subject: Re: updating existing data in index vs inserting new data in
>> >> > index
>> >> >
>> >> > Bob
>> >> >
>> >> > Thanks very much for the reply!
>> >> >
>> >> > I am using a unique integer called order_id as the Solr index key.
>> >> >
>> >> > My query, deltaQuery and deltaImportQue

Need help with troublesome wildcard query

2011-07-07 Thread Christopher Cato
Hi, I'm running Solr 3.2 with edismax under Tomcat 6 via Drupal.

I'm having some problems writing a query that matches a specific field on
several words. I have implemented an AJAX search that basically takes whatever
is in a form field and attempts to match documents. I'm not having much luck
though. The first word always matches correctly, but as soon as I enter the
second word I'm losing matches, and the third word doesn't give any matches at all.

The title field that I'm searching contains a product name that may or may not 
have several words.

The requirement is that the search should be progressive, i.e. as the user
inputs words I should always return results that contain all of the words
entered. I also have to correct bad input like an erroneous space in the
product name, e.g. "product name" instead of "productname".

I'm wondering if there isn't an easier way to query Solr. Ideally I'd want to
say "give me all docs that have the following text in their titles". Is that
possible?


I'd really appreciate any help!


Regards,
Christopher Cato
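
A note not from the thread: with edismax, one common way to require that all entered words match, without resorting to wildcards, is the mm (minimum-should-match) parameter. A sketch, with illustrative host and field names:

```
http://localhost:8983/solr/select?defType=edismax&qf=title&mm=100%25&q=product+name
```

With mm=100%, every term the user has typed so far must match, which gives the progressive narrowing described above.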

Re: ClassCastException launching recent snapshot

2011-07-07 Thread Benson Margulies
I built a fresh set of snapshots myself, I carefully cleaned my
project, and everything is happy. So this goes down in the department
of pirate error.

On Thu, Jul 7, 2011 at 8:29 AM, Erick Erickson  wrote:
> Then I would guess that you have other (older) jars in your classpath
> somewhere. Does the example Solr installation work?
>
> Best
> Erick
>
> On Wed, Jul 6, 2011 at 10:21 PM, Benson Margulies  
> wrote:
>> Launching solr-4.0-20110705.223601-1.war, I get a class cast exception
>>
>> org.apache.lucene.index.DirectoryReader cannot be cast to
>> org.apache.solr.search.SolrIndexReader with the following backtrace.
>>
>> I'm launching solr-as-a-webapp via an embedded copy of tomcat 7. The
>> location of the index is set up via:
>>
>> System.setProperty("solr.data.dir", solrDataDirectory);
>>
>> Further, the sources in the corresponding -sources .jar don't seem
>> to have a cast to SolrIndexReader in it anywhere in SolrIndexSearcher.
>>
>> SolrIndexSearcher.<init>(SolrCore, IndexSchema, String, IndexReader,
>> boolean, boolean) line: 142
>> SolrCore.getSearcher(boolean, boolean, Future[]) line: 1085
>> SolrCore.<init>(String, String, SolrConfig, IndexSchema,
>> CoreDescriptor) line: 587
>> CoreContainer.create(CoreDescriptor) line: 660
>> CoreContainer.load(String, InputStream) line: 412
>> CoreContainer$Initializer.initialize() line: 246
>> SolrDispatchFilter.init(FilterConfig) line: 86
>> ApplicationFilterConfig.initFilter() line: 273
>> ApplicationFilterConfig.getFilter() line: 254
>> ApplicationFilterConfig.setFilterDef(FilterDef) line: 372
>> ApplicationFilterConfig.<init>(Context, FilterDef) line: 98
>> StandardContext.filterStart() line: 4584
>> StandardContext$2.call() line: 5262
>> StandardContext$2.call() line: 5257
>> FutureTask$Sync.innerRun() line: 303
>> FutureTask.run() line: 138
>> ThreadPoolExecutor$Worker.runTask(Runnable) line: 886
>>
>


Any chance of getting SOLR-949 into the application

2011-07-07 Thread Will Milspec
hi all,

Our application requires term vectors and uses the SOLR-949 solrj patch to
simplify the client layer. This patch eliminates the need to manually parse
the xml returned by the tvrh (term vector response handler)
   https://issues.apache.org/jira/browse/SOLR-949

Can we get this in the head/trunk?

Re-patching after each solr upgrade is a bit error prone.

thanks

will


Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Mark juszczec
Erick

I used to, but now I find I must have commented it out in a fit of rage ;-)

This could be the whole problem.

I have verified via admin schema browser that the field is ORDER_ID and will
double check I refer to it in upper case in the appropriate places in the
Solr config scheme.

Curiously, the admin schema browser display for ORDER_ID says "hasDeletions:
false"  - which seems the opposite of what I want.  I want to be able to
delete duplicates.  Or am I interpreting this field wrong?

In order to check for duplicates, I am going to use the admin browser to
enter the following in the Make A Query box:

TABLE_ID:1 AND ORDER_ID:674659

When I click search and view the results, 2 records are displayed.  One has
the original values, one has the changed values.  I haven't examined the xml
(via view source) too closely and the next time I run I will look for
something indicating one of the records is inactive.

When you say "change your schema" do you mean via a delta import or by
modifying the config files or both?  FWIW, I am deleting the index on the
file system, doing a full import, modifying the data in the database and
then doing a delta import.

I am not restarting Solr at all in this process.

I understand Solr does not perform key management.  You described exactly
what I meant.  Sorry for any confusion.

Mark

On Thu, Jul 7, 2011 at 10:52 AM, Erick Erickson wrote:

> Let me re-state a few things to see if I've got it right:
>
> > your schema.xml file has an entry like <uniqueKey>order_id</uniqueKey>,
> right?
>
> > given this definition, any document added with an order_id that already
> exists in the
>   Solr index will be replaced. i.e. you should have one and only one
> document with a
>   given order_id.
>
> > case matters. Check via the admin page ("schema browser") to see if you
> have
>   two fields, order_id and ORDER_ID.
>
> > How are you checking that your docs are duplicates? If you do a search on
>   order_id, you should get back one and only one document (assuming the
>   definition above). A document that's deleted will just be marked as
> deleted,
>   the data won't be purged from the index. It won't show in search results,
> but
>   it will show if you use lower-level ways to access the data.
>
> > Whenever you change your schema, it's best to clean the index, restart
> the server and
>re-index from scratch. Solr won't retroactively remove duplicate
>  entries.
>
> > On the stats admin/stats page you should see maxDocs and numDocs. The
> difference
>   between these should be the number of deleted documents.
>
> > Solr doesn't "manage" unique keys. All that happens is Solr will replace
> any
>   pre-existing documents where *you've* defined the <uniqueKey> when a
>   new doc is added...
>
> Hope this helps
> Erick
>
> On Thu, Jul 7, 2011 at 10:16 AM, Mark juszczec 
> wrote:
> > Bob
> >
> > No, I don't.  Let me look into that and post my results.
> >
> > Mark
> >
> >
> > On Thu, Jul 7, 2011 at 10:14 AM, Bob Sandiford <
> bob.sandif...@sirsidynix.com
> >> wrote:
> >
> >> Hi, Mark.
> >>
> >> I haven't used DIH myself - so I'll need to leave comments on your set
> up
> >> to others who have done so.
> >>
> >> Another question - after your initial index create (and after each
> delta),
> >> do you run a 'commit'?  Do you run an 'optimize'?  (Without the
> optimize,
> >> 'deleted' records still show up in query results...)
> >>
> >> Bob Sandiford | Lead Software Engineer | SirsiDynix
> >> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> >> www.sirsidynix.com
> >>
> >>
> >> > -Original Message-
> >> > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
> >> > Sent: Thursday, July 07, 2011 10:04 AM
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Re: updating existing data in index vs inserting new data in
> >> > index
> >> >
> >> > Bob
> >> >
> >> > Thanks very much for the reply!
> >> >
> >> > I am using a unique integer called order_id as the Solr index key.
> >> >
> >> > My query, deltaQuery and deltaImportQuery are below:
> >> >
> >> >  >> >   pk="ORDER_ID"
> >> >   query="select 1 as TABLE_ID , orders.order_id,
> >> > orders.order_booked_ind,
> >> > orders.order_dt, orders.cancel_dt, orders.account_manager_id,
> >> > orders.of_header_id, orders.order_status_lov_id, orders.order_type_id,
> >> > orders.approved_discount_pct, orders.campaign_nm,
> >> > orders.approved_by_cd,orders.advertiser_id, orders.agency_id from
> >> > orders"
> >> >
> >> >   deltaImportQuery="select 1 as TABLE_ID, orders.order_id,
> >> > orders.order_booked_ind, orders.order_dt, orders.cancel_dt,
> >> > orders.account_manager_id, orders.of_header_id,
> >> > orders.order_status_lov_id,
> >> > orders.order_type_id, orders.approved_discount_pct,
> orders.campaign_nm,
> >> > orders.approved_by_cd,orders.advertiser_id, orders.agency_id from
> orders
> >> > where orders.order_id = '${dataimporter.delta.ORDER_ID}'"
> >> >
> >> >   deltaQuery="select orders.order_id from orders where
> orders.change_dt
> >> > >
> >> > to_

Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Michael Kuhlmann
On 07.07.2011 16:52, Mark juszczec wrote:
> Ok.  That's really good to know because optimization of that kind will be
> important.

Optimization is only important if you had a lot of deletes or updated
docs, or if you want your segments to get merged. (At least that's what I
know about it.)
> 
> What of commit?  Does it somehow remove the previous version of an updated
> record?

"Somehow", yes. If you don't commit, your changes won't be visible, and
the old documents remain unchanged. Physically they stay in the index
and are purged on optimize, but that's just an implementation detail.

-Kuli
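
The replace-on-add behaviour discussed in this thread hinges on the <uniqueKey> declaration in schema.xml. A minimal sketch, using the ORDER_ID field name from the thread (the field type and attributes are assumptions):

```xml
<!-- schema.xml excerpt; ORDER_ID taken from the thread, attributes assumed -->
<fields>
  <field name="ORDER_ID" type="string" indexed="true" stored="true" required="true"/>
  <!-- ... other fields ... -->
</fields>

<!-- any added document whose ORDER_ID already exists replaces the old one -->
<uniqueKey>ORDER_ID</uniqueKey>
```

After changing this declaration, re-index from scratch as suggested earlier in the thread; it is not applied retroactively to existing documents.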


Query does not work when changing param order

2011-07-07 Thread Juan Manuel Alvarez
Hi everyone!

I would like to ask you a question about a problem I am facing with a
Solr query.

I have a field "tags" of type "textgen" and some documents with the
values "myothertag,mytag".

When I use the query:
/solr/select?sort=name_sort+asc&start=0&qf=tags&q.alt=*:*&fq={!field
q.op=AND f=tags}myothertag mytag&rows=60&defType=dismax

everything works as expected, but if I change the order of the
parameters in the fq, like this
/solr/select?sort=name_sort+asc&start=0&qf=tags&q.alt=*:*&fq={!field
q.op=AND f=tags}mytag myothertag&rows=60&defType=dismax
I get no results.

As far as I have seen, the "textgen" field should tokenize the words in
the field, so if I use comma-separated values, like in my example,
both words are going to be indexed.

Can anyone please point me in the right direction?

Cheers!
Juan M.


Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Mark juszczec
Ok.  That's really good to know because optimization of that kind will be
important.

What of commit?  Does it somehow remove the previous version of an updated
record?

On Thu, Jul 7, 2011 at 10:49 AM, Michael Kuhlmann  wrote:

> On 07.07.2011 16:14, Bob Sandiford wrote:
> > [...] (Without the optimize, 'deleted' records still show up in query
> results...)
>
> No, that's not true. The terms remain in the index, but the document
> won't show up any more.
>
> Optimize is only for performance (and disk space) optimization, as the
> name suggests.
>
> -Kuli
>


Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Erick Erickson
Let me re-state a few things to see if I've got it right:

> your schema.xml file has an entry like <uniqueKey>order_id</uniqueKey>, right?

> given this definition, any document added with an order_id that already
   exists in the Solr index will be replaced, i.e. you should have one and
   only one document with a given order_id.

> case matters. Check via the admin page ("schema browser") to see if you have
   two fields, order_id and ORDER_ID.

> How are you checking that your docs are duplicates? If you do a search on
   order_id, you should get back one and only one document (assuming the
   definition above). A document that's deleted will just be marked as deleted,
   the data won't be purged from the index. It won't show in search results, but
   it will show if you use lower-level ways to access the data.

> Whenever you change your schema, it's best to clean the index, restart the
   server and re-index from scratch. Solr won't retroactively remove duplicate
   entries.

> On the stats page (admin/stats) you should see maxDocs and numDocs. The
   difference between these should be the number of deleted documents.

> Solr doesn't "manage" unique keys. All that happens is Solr will replace any
   pre-existing documents where *you've* defined the <uniqueKey> when a
   new doc is added...

Hope this helps
Erick
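The schema entries the checklist above refers to would look roughly like this (a sketch; the field type and attributes are assumptions, the <uniqueKey> declaration is the essential part):

```xml
<!-- schema.xml (sketch): the unique-key field and its declaration -->
<field name="order_id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>order_id</uniqueKey>
```

With this in place, re-adding a document whose order_id already exists replaces the old document rather than creating a duplicate.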

On Thu, Jul 7, 2011 at 10:16 AM, Mark juszczec  wrote:
> Bob
>
> No, I don't.  Let me look into that and post my results.
>
> Mark
>
>
> On Thu, Jul 7, 2011 at 10:14 AM, Bob Sandiford > wrote:
>
>> Hi, Mark.
>>
>> I haven't used DIH myself - so I'll need to leave comments on your set up
>> to others who have done so.
>>
>> Another question - after your initial index create (and after each delta),
>> do you run a 'commit'?  Do you run an 'optimize'?  (Without the optimize,
>> 'deleted' records still show up in query results...)
>>
>> Bob Sandiford | Lead Software Engineer | SirsiDynix
>> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
>> www.sirsidynix.com
>>
>>
>> > -Original Message-
>> > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
>> > Sent: Thursday, July 07, 2011 10:04 AM
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: updating existing data in index vs inserting new data in
>> > index
>> >
>> > Bob
>> >
>> > Thanks very much for the reply!
>> >
>> > I am using a unique integer called order_id as the Solr index key.
>> >
>> > My query, deltaQuery and deltaImportQuery are below:
>> >
>> > > >   pk="ORDER_ID"
>> >   query="select 1 as TABLE_ID , orders.order_id,
>> > orders.order_booked_ind,
>> > orders.order_dt, orders.cancel_dt,     orders.account_manager_id,
>> > orders.of_header_id, orders.order_status_lov_id, orders.order_type_id,
>> > orders.approved_discount_pct, orders.campaign_nm,
>> > orders.approved_by_cd,orders.advertiser_id, orders.agency_id from
>> > orders"
>> >
>> >   deltaImportQuery="select 1 as TABLE_ID, orders.order_id,
>> > orders.order_booked_ind, orders.order_dt, orders.cancel_dt,
>> > orders.account_manager_id, orders.of_header_id,
>> > orders.order_status_lov_id,
>> > orders.order_type_id, orders.approved_discount_pct, orders.campaign_nm,
>> > orders.approved_by_cd,orders.advertiser_id, orders.agency_id from orders
>> > where orders.order_id = '${dataimporter.delta.ORDER_ID}'"
>> >
>> >   deltaQuery="select orders.order_id from orders where orders.change_dt
>> > >
>> > to_date('${dataimporter.last_index_time}','YYYY-MM-DD HH24:MI:SS')" >
>> >         
>> >
>> > The test I am running is two part:
>> >
>> > 1.  After I do a full import of the index, I insert a brand new record
>> > (with
>> > a never existed before order_id) in the database.  The delta import
>> > picks
>> > this up just fine.
>> >
>> > 2.  After the full import, I modify a record with an order_id that
>> > already
>> > shows up in the index.  I have verified there is only one record with
>> > this
>> > order_id in both the index and the db before I do the delta update.
>> >
>> > I guess the question is, am I screwing myself up by defining my own Solr
>> > index key?  I want to, ultimately, be able to search on ORDER_ID in the
>> > Solr
>> > index.  However, the docs say (I think) a field does not have to be the
>> > Solr
>> > primary key in order to be searchable.  Would I be better off letting
>> > Solr
>> > manage the keys?
>> >
>> > Mark
>> >
>> > On Thu, Jul 7, 2011 at 9:24 AM, Bob Sandiford
>> > wrote:
>> >
>> > > What are you using as the unique id in your Solr index?  It sounds
>> > like you
>> > > may have one value as your Solr index unique id, which bears no
>> > resemblance
>> > > to a unique[1] id derived from your data...
>> > >
>> > > Or - another way to put it - what is it that makes these two records
>> > in
>> > > your Solr index 'the same', and what are the unique id's for those two
>> > > entries in the Solr index?  How are those id's related to your
>> > original
>> > > data?
>> > >
>> > > [1] not only unique, but immutable.  I.E. if you update a row in your
>> > > database, the 

Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Michael Kuhlmann
Am 07.07.2011 16:14, schrieb Bob Sandiford:
> [...] (Without the optimize, 'deleted' records still show up in query 
> results...)

No, that's not true. The terms remain in the index, but the document
won't show up any more.

Optimize is only for performance (and disk space) optimization, as the
name suggests.

-Kuli
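A toy model of Kuli's point (the visible behavior only, not Lucene's actual data structures): a delete just sets a flag that search honors, and optimize is what physically drops the data:

```python
# Each "document" carries a deleted flag; search skips flagged docs,
# optimize physically removes them (and reclaims disk space).
index = [
    {"id": 1, "text": "first order", "deleted": False},
    {"id": 2, "text": "second order", "deleted": True},  # marked, not yet purged
]

def search(index, word):
    return [d["id"] for d in index if not d["deleted"] and word in d["text"]]

def optimize(index):
    return [d for d in index if not d["deleted"]]

print(search(index, "order"))   # [1]  the deleted doc never shows up
index = optimize(index)         # only now is its data physically gone
print(len(index))               # 1
```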


Re: The correct query syntax for date ?

2011-07-07 Thread duddy67
I already tried the format: 

q=datecreation:2001-10-11T00:00:00Z

but I still get the same error message.

I use the 1.4.1 version. Is this the reason for my problem?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-correct-query-syntax-for-date-tp3147536p3148384.html
Sent from the Solr - User mailing list archive at Nabble.com.


fetcher no agents listed in 'http.agent.name' property

2011-07-07 Thread serenity keningston
Hello Friends,


I am experiencing this error message " fetcher no agents listed in '
http.agent.name' property" when I am trying to crawl with Nutch 1.3
I referred other mails regarding the same error message and tried to change
the nutch-default.xml and nutch-site.xml file details with


<property>
  <name>http.agent.name</name>
  <value>My Nutch Spider</value>
  <description>EMPTY</description>
</property>


I also filled in the other property details, leaving none blank, and am still
getting the same error. May I know my mistake?


Serenity


Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Mark juszczec
Bob

No, I don't.  Let me look into that and post my results.

Mark


On Thu, Jul 7, 2011 at 10:14 AM, Bob Sandiford  wrote:

> Hi, Mark.
>
> I haven't used DIH myself - so I'll need to leave comments on your set up
> to others who have done so.
>
> Another question - after your initial index create (and after each delta),
> do you run a 'commit'?  Do you run an 'optimize'?  (Without the optimize,
> 'deleted' records still show up in query results...)
>
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> www.sirsidynix.com
>
>
> > -Original Message-
> > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
> > Sent: Thursday, July 07, 2011 10:04 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: updating existing data in index vs inserting new data in
> > index
> >
> > Bob
> >
> > Thanks very much for the reply!
> >
> > I am using a unique integer called order_id as the Solr index key.
> >
> > My query, deltaQuery and deltaImportQuery are below:
> >
> >  >   pk="ORDER_ID"
> >   query="select 1 as TABLE_ID , orders.order_id,
> > orders.order_booked_ind,
> > orders.order_dt, orders.cancel_dt, orders.account_manager_id,
> > orders.of_header_id, orders.order_status_lov_id, orders.order_type_id,
> > orders.approved_discount_pct, orders.campaign_nm,
> > orders.approved_by_cd,orders.advertiser_id, orders.agency_id from
> > orders"
> >
> >   deltaImportQuery="select 1 as TABLE_ID, orders.order_id,
> > orders.order_booked_ind, orders.order_dt, orders.cancel_dt,
> > orders.account_manager_id, orders.of_header_id,
> > orders.order_status_lov_id,
> > orders.order_type_id, orders.approved_discount_pct, orders.campaign_nm,
> > orders.approved_by_cd,orders.advertiser_id, orders.agency_id from orders
> > where orders.order_id = '${dataimporter.delta.ORDER_ID}'"
> >
> >   deltaQuery="select orders.order_id from orders where orders.change_dt
> > >
> > to_date('${dataimporter.last_index_time}','YYYY-MM-DD HH24:MI:SS')" >
> > 
> >
> > The test I am running is two part:
> >
> > 1.  After I do a full import of the index, I insert a brand new record
> > (with
> > a never existed before order_id) in the database.  The delta import
> > picks
> > this up just fine.
> >
> > 2.  After the full import, I modify a record with an order_id that
> > already
> > shows up in the index.  I have verified there is only one record with
> > this
> > order_id in both the index and the db before I do the delta update.
> >
> > I guess the question is, am I screwing myself up by defining my own Solr
> > index key?  I want to, ultimately, be able to search on ORDER_ID in the
> > Solr
> > index.  However, the docs say (I think) a field does not have to be the
> > Solr
> > primary key in order to be searchable.  Would I be better off letting
> > Solr
> > manage the keys?
> >
> > Mark
> >
> > On Thu, Jul 7, 2011 at 9:24 AM, Bob Sandiford
> > wrote:
> >
> > > What are you using as the unique id in your Solr index?  It sounds
> > like you
> > > may have one value as your Solr index unique id, which bears no
> > resemblance
> > > to a unique[1] id derived from your data...
> > >
> > > Or - another way to put it - what is it that makes these two records
> > in
> > > your Solr index 'the same', and what are the unique id's for those two
> > > entries in the Solr index?  How are those id's related to your
> > original
> > > data?
> > >
> > > [1] not only unique, but immutable.  I.E. if you update a row in your
> > > database, the unique id derived from that row has to be the same as it
> > would
> > > have been before the update.  Otherwise, there's nothing for Solr to
> > > recognize as a duplicate entry, and do a 'delete' and 'insert' instead
> > of
> > > just an 'insert'.
> > >
> > > Bob Sandiford | Lead Software Engineer | SirsiDynix
> > > P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> > > www.sirsidynix.com
> > >
> > >
> > > > -Original Message-
> > > > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
> > > > Sent: Thursday, July 07, 2011 9:15 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: updating existing data in index vs inserting new data in
> > index
> > > >
> > > > Hello all
> > > >
> > > > I'm using Solr 3.2 and am confused about updating existing data in
> > an
> > > > index.
> > > >
> > > > According to the DataImportHandler Wiki:
> > > >
> > > > *"delta-import* : For incremental imports and change detection run
> > the
> > > > command `http://:/solr/dataimport?command=delta-import .
> > It
> > > > supports the same clean, commit, optimize and debug parameters as
> > > > full-import command."
> > > >
> > > > I know delta-import will find new data in the database and insert it
> > > > into
> > > > the index.  My problem is how it handles updates where I've got a
> > record
> > > > that exists in the index and the database, the database record is
> > > > changed
> > > > and I want to incorporate those changes in the existing record i

RE: updating existing data in index vs inserting new data in index

2011-07-07 Thread Bob Sandiford
Hi, Mark.

I haven't used DIH myself - so I'll need to leave comments on your set up to 
others who have done so.

Another question - after your initial index create (and after each delta), do 
you run a 'commit'?  Do you run an 'optimize'?  (Without the optimize, 
'deleted' records still show up in query results...)

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


> -Original Message-
> From: Mark juszczec [mailto:mark.juszc...@gmail.com]
> Sent: Thursday, July 07, 2011 10:04 AM
> To: solr-user@lucene.apache.org
> Subject: Re: updating existing data in index vs inserting new data in
> index
> 
> Bob
> 
> Thanks very much for the reply!
> 
> I am using a unique integer called order_id as the Solr index key.
> 
> My query, deltaQuery and deltaImportQuery are below:
> <entity
>pk="ORDER_ID"
>   query="select 1 as TABLE_ID , orders.order_id,
> orders.order_booked_ind,
> orders.order_dt, orders.cancel_dt, orders.account_manager_id,
> orders.of_header_id, orders.order_status_lov_id, orders.order_type_id,
> orders.approved_discount_pct, orders.campaign_nm,
> orders.approved_by_cd,orders.advertiser_id, orders.agency_id from
> orders"
> 
>   deltaImportQuery="select 1 as TABLE_ID, orders.order_id,
> orders.order_booked_ind, orders.order_dt, orders.cancel_dt,
> orders.account_manager_id, orders.of_header_id,
> orders.order_status_lov_id,
> orders.order_type_id, orders.approved_discount_pct, orders.campaign_nm,
> orders.approved_by_cd,orders.advertiser_id, orders.agency_id from orders
> where orders.order_id = '${dataimporter.delta.ORDER_ID}'"
> 
>   deltaQuery="select orders.order_id from orders where orders.change_dt
> >
> to_date('${dataimporter.last_index_time}','YYYY-MM-DD HH24:MI:SS')" >
> 
> 
> The test I am running is two part:
> 
> 1.  After I do a full import of the index, I insert a brand new record
> (with
> a never existed before order_id) in the database.  The delta import
> picks
> this up just fine.
> 
> 2.  After the full import, I modify a record with an order_id that
> already
> shows up in the index.  I have verified there is only one record with
> this
> order_id in both the index and the db before I do the delta update.
> 
> I guess the question is, am I screwing myself up by defining my own Solr
> index key?  I want to, ultimately, be able to search on ORDER_ID in the
> Solr
> index.  However, the docs say (I think) a field does not have to be the
> Solr
> primary key in order to be searchable.  Would I be better off letting
> Solr
> manage the keys?
> 
> Mark
> 
> On Thu, Jul 7, 2011 at 9:24 AM, Bob Sandiford
> wrote:
> 
> > What are you using as the unique id in your Solr index?  It sounds
> like you
> > may have one value as your Solr index unique id, which bears no
> resemblance
> > to a unique[1] id derived from your data...
> >
> > Or - another way to put it - what is it that makes these two records
> in
> > your Solr index 'the same', and what are the unique id's for those two
> > entries in the Solr index?  How are those id's related to your
> original
> > data?
> >
> > [1] not only unique, but immutable.  I.E. if you update a row in your
> > database, the unique id derived from that row has to be the same as it
> would
> > have been before the update.  Otherwise, there's nothing for Solr to
> > recognize as a duplicate entry, and do a 'delete' and 'insert' instead
> of
> > just an 'insert'.
> >
> > Bob Sandiford | Lead Software Engineer | SirsiDynix
> > P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> > www.sirsidynix.com
> >
> >
> > > -Original Message-
> > > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
> > > Sent: Thursday, July 07, 2011 9:15 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: updating existing data in index vs inserting new data in
> index
> > >
> > > Hello all
> > >
> > > I'm using Solr 3.2 and am confused about updating existing data in
> an
> > > index.
> > >
> > > According to the DataImportHandler Wiki:
> > >
> > > *"delta-import* : For incremental imports and change detection run
> the
> > > command `http://:/solr/dataimport?command=delta-import .
> It
> > > supports the same clean, commit, optimize and debug parameters as
> > > full-import command."
> > >
> > > I know delta-import will find new data in the database and insert it
> > > into
> > > the index.  My problem is how it handles updates where I've got a
> record
> > > that exists in the index and the database, the database record is
> > > changed
> > > and I want to incorporate those changes in the existing record in
> the
> > > index.
> > >  IOW I don't want to insert it again.
> > >
> > > I've tried this and wound up with 2 records with the same key in the
> > > index.
> > >  The first contains the original db values found when the index was
> > > created,
> > > the 2nd contains the db values after the record was changed.
> > >
> > > I've also found this
> > >
> http://search.luci

Re: updating existing data in index vs inserting new data in index

2011-07-07 Thread Mark juszczec
Bob

Thanks very much for the reply!

I am using a unique integer called order_id as the Solr index key.

My query, deltaQuery and deltaImportQuery are below:




The test I am running is two part:

1.  After I do a full import of the index, I insert a brand new record (with
a never existed before order_id) in the database.  The delta import picks
this up just fine.

2.  After the full import, I modify a record with an order_id that already
shows up in the index.  I have verified there is only one record with this
order_id in both the index and the db before I do the delta update.

I guess the question is, am I screwing myself up by defining my own Solr
index key?  I want to, ultimately, be able to search on ORDER_ID in the Solr
index.  However, the docs say (I think) a field does not have to be the Solr
primary key in order to be searchable.  Would I be better off letting Solr
manage the keys?

Mark
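Mark's two-query delta setup can be pictured like this (an in-memory sketch, not DIH itself; the row shapes are invented): deltaQuery returns only the primary keys of rows changed since the last import, and deltaImportQuery re-fetches the full row for each returned key:

```python
from datetime import datetime

orders = [
    {"order_id": 1, "campaign_nm": "spring", "change_dt": datetime(2011, 7, 1)},
    {"order_id": 2, "campaign_nm": "summer", "change_dt": datetime(2011, 7, 7)},
]
last_index_time = datetime(2011, 7, 5)

# deltaQuery: select order_id ... where change_dt > last_index_time
changed_ids = [r["order_id"] for r in orders if r["change_dt"] > last_index_time]

# deltaImportQuery: runs once per changed key, fetching the full row to re-index
delta_docs = [r for r in orders if r["order_id"] in changed_ids]

print(changed_ids)  # [2]
```

Because the full row is re-fetched, the re-added document replaces the old one as long as the schema's unique key matches the order_id column.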

On Thu, Jul 7, 2011 at 9:24 AM, Bob Sandiford
wrote:

> What are you using as the unique id in your Solr index?  It sounds like you
> may have one value as your Solr index unique id, which bears no resemblance
> to a unique[1] id derived from your data...
>
> Or - another way to put it - what is it that makes these two records in
> your Solr index 'the same', and what are the unique id's for those two
> entries in the Solr index?  How are those id's related to your original
> data?
>
> [1] not only unique, but immutable.  I.E. if you update a row in your
> database, the unique id derived from that row has to be the same as it would
> have been before the update.  Otherwise, there's nothing for Solr to
> recognize as a duplicate entry, and do a 'delete' and 'insert' instead of
> just an 'insert'.
>
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> www.sirsidynix.com
>
>
> > -Original Message-
> > From: Mark juszczec [mailto:mark.juszc...@gmail.com]
> > Sent: Thursday, July 07, 2011 9:15 AM
> > To: solr-user@lucene.apache.org
> > Subject: updating existing data in index vs inserting new data in index
> >
> > Hello all
> >
> > I'm using Solr 3.2 and am confused about updating existing data in an
> > index.
> >
> > According to the DataImportHandler Wiki:
> >
> > *"delta-import* : For incremental imports and change detection run the
> > command `http://:/solr/dataimport?command=delta-import . It
> > supports the same clean, commit, optimize and debug parameters as
> > full-import command."
> >
> > I know delta-import will find new data in the database and insert it
> > into
> > the index.  My problem is how it handles updates where I've got a record
> > that exists in the index and the database, the database record is
> > changed
> > and I want to incorporate those changes in the existing record in the
> > index.
> >  IOW I don't want to insert it again.
> >
> > I've tried this and wound up with 2 records with the same key in the
> > index.
> >  The first contains the original db values found when the index was
> > created,
> > the 2nd contains the db values after the record was changed.
> >
> > I've also found this
> > http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.4720
> > 66.n3.nabble.com%2FDelta-import-with-solrj-client-tp1085763p1086173.html
> > the
> > subject is 'Delta-import with solrj client'
> >
> > "Greetings. I have a *solrj* client for fetching data from database. I
> > am
> > using *delta*-*import* for fetching data. If a column is changed in
> > database
> > using timestamp with *delta*-*import* i get the latest column indexed
> > but
> > there are *duplicate* values in the index similar to the column but the
> > data
> > is older. This works with cleaning the index but i want to update the
> > index
> > without cleaning it. Is there a way to just update the index with the
> > updated column without having *duplicate* values. Appreciate for any
> > feedback.
> >
> > Hando"
> >
> > There are 2 responses:
> >
> > "Short answer is no, there isn't a way. *Solr* doesn't have the concept
> > of
> > 'Update' to an indexed document. You need to add the full document (all
> > 'columns') each time any one field changes. If doing that in your
> > DataImportHandler logic is difficult you may need to write a separate
> > Update
> > Service that does:
> >
> > 1) Read UniqueID, UpdatedColumn(s)  from database
> > 2) Using UniqueID Retrieve document from *Solr*
> > 3) Add/Update field(s) with updated column(s)
> > 4) Add document back to *Solr*
> >
> > Although, if you use DIH to do a full *import*, using the same query in
> > your *Delta*-*Import* to get the whole document shouldn't be that
> > difficult."
> >
> > and
> >
> > "Hi,
> >
> > Make sure you use a proper "ID" field, which does *not* change even if
> > the
> > content in the database changes. In this way, when your
> > *delta*-*import* fetches
> > changed rows to index, they will update the existing rows in your index.
> > "
> >
> > I have an ID field that 

Re: the version of a Lucene index changes after an optimize?

2011-07-07 Thread gquaire
Thank you, Erick, for the information you gave me.

I will test the version of the index in order to know when I need to refresh
the component.

Best Regards,

gquaire

-
Jouve ITS France
--
View this message in context: 
http://lucene.472066.n3.nabble.com/the-version-of-a-Lucene-index-changes-after-an-optimize-tp3143822p3148275.html
Sent from the Solr - User mailing list archive at Nabble.com.


bug in ExtractingRequestHandler with PDFs and metadata field Category

2011-07-07 Thread Andras Balogh

Hi,

I think this is a bug, but before reporting it to the issue tracker I
thought I would ask here first.
So the problem is I have a PDF file which among other metadata fields 
like Author, CreatedDate etc. has a metadata
field Category (I can see all metadata fields with tika-app.jar started 
in GUI mode).
Now what happens is that in my SOLR schema I also have a "Category" field 
among other fields and a field called "text"

that is holding the extracted text from the PDF.
I added a metadata_ prefix mapping so all PDF metadata fields 
should be saved in Solr as "metadata_something" fields.
The problem is that the "Category" metadata field from the PDF for some 
reason is not prefixed with "metadata_" and
solr will merge the "Category" field I have in the schema with the 
Category metadata from PDF and I will have an error like:

"multiple values encountered for non multiValued field Category"
I worked around this by patching tika-parsers.jar to ignore the Category
metadata in org.apache.tika.parser.pdf.PDFParser, but that is not a good
solution (I don't need that Category metadata, so it works for me).


So let me know if this should be reported as bug or not.

Regards,
Andras.








RE: updating existing data in index vs inserting new data in index

2011-07-07 Thread Bob Sandiford
What are you using as the unique id in your Solr index?  It sounds like you may 
have one value as your Solr index unique id, which bears no resemblance to a 
unique[1] id derived from your data...

Or - another way to put it - what is it that makes these two records in your 
Solr index 'the same', and what are the unique id's for those two entries in 
the Solr index?  How are those id's related to your original data?

[1] not only unique, but immutable.  I.E. if you update a row in your database, 
the unique id derived from that row has to be the same as it would have been 
before the update.  Otherwise, there's nothing for Solr to recognize as a 
duplicate entry, and do a 'delete' and 'insert' instead of just an 'insert'.

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com


> -Original Message-
> From: Mark juszczec [mailto:mark.juszc...@gmail.com]
> Sent: Thursday, July 07, 2011 9:15 AM
> To: solr-user@lucene.apache.org
> Subject: updating existing data in index vs inserting new data in index
> 
> Hello all
> 
> I'm using Solr 3.2 and am confused about updating existing data in an
> index.
> 
> According to the DataImportHandler Wiki:
> 
> *"delta-import* : For incremental imports and change detection run the
> command `http://:/solr/dataimport?command=delta-import . It
> supports the same clean, commit, optimize and debug parameters as
> full-import command."
> 
> I know delta-import will find new data in the database and insert it
> into
> the index.  My problem is how it handles updates where I've got a record
> that exists in the index and the database, the database record is
> changed
> and I want to incorporate those changes in the existing record in the
> index.
>  IOW I don't want to insert it again.
> 
> I've tried this and wound up with 2 records with the same key in the
> index.
>  The first contains the original db values found when the index was
> created,
> the 2nd contains the db values after the record was changed.
> 
> I've also found this
> http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.4720
> 66.n3.nabble.com%2FDelta-import-with-solrj-client-tp1085763p1086173.html
> the
> subject is 'Delta-import with solrj client'
> 
> "Greetings. I have a *solrj* client for fetching data from database. I
> am
> using *delta*-*import* for fetching data. If a column is changed in
> database
> using timestamp with *delta*-*import* i get the latest column indexed
> but
> there are *duplicate* values in the index similar to the column but the
> data
> is older. This works with cleaning the index but i want to update the
> index
> without cleaning it. Is there a way to just update the index with the
> updated column without having *duplicate* values. Appreciate for any
> feedback.
> 
> Hando"
> 
> There are 2 responses:
> 
> "Short answer is no, there isn't a way. *Solr* doesn't have the concept
> of
> 'Update' to an indexed document. You need to add the full document (all
> 'columns') each time any one field changes. If doing that in your
> DataImportHandler logic is difficult you may need to write a separate
> Update
> Service that does:
> 
> 1) Read UniqueID, UpdatedColumn(s)  from database
> 2) Using UniqueID Retrieve document from *Solr*
> 3) Add/Update field(s) with updated column(s)
> 4) Add document back to *Solr*
> 
> Although, if you use DIH to do a full *import*, using the same query in
> your *Delta*-*Import* to get the whole document shouldn't be that
> difficult."
> 
> and
> 
> "Hi,
> 
> Make sure you use a proper "ID" field, which does *not* change even if
> the
> content in the database changes. In this way, when your
> *delta*-*import* fetches
> changed rows to index, they will update the existing rows in your index.
> "
> 
> I have an ID field that doesn't change.  It is the primary key field
> from
> the database table I am trying to index and I have verified it is
> unique.
> 
> So, does Solr allow updates (not inserts) of existing records?  Is
> anyone
> able to do this?
> 
> Mark



updating existing data in index vs inserting new data in index

2011-07-07 Thread Mark juszczec
Hello all

I'm using Solr 3.2 and am confused about updating existing data in an index.

According to the DataImportHandler Wiki:

*"delta-import* : For incremental imports and change detection run the
command `http://:/solr/dataimport?command=delta-import . It
supports the same clean, commit, optimize and debug parameters as
full-import command."

I know delta-import will find new data in the database and insert it into
the index.  My problem is how it handles updates where I've got a record
that exists in the index and the database, the database record is changed
and I want to incorporate those changes in the existing record in the index.
 IOW I don't want to insert it again.

I've tried this and wound up with 2 records with the same key in the index.
 The first contains the original db values found when the index was created,
the 2nd contains the db values after the record was changed.

I've also found this
http://search.lucidimagination.com/search/out?u=http%3A%2F%2Flucene.472066.n3.nabble.com%2FDelta-import-with-solrj-client-tp1085763p1086173.html
the
subject is 'Delta-import with solrj client'

"Greetings. I have a *solrj* client for fetching data from database. I am
using *delta*-*import* for fetching data. If a column is changed in database
using timestamp with *delta*-*import* i get the latest column indexed but
there are *duplicate* values in the index similar to the column but the data
is older. This works with cleaning the index but i want to update the index
without cleaning it. Is there a way to just update the index with the
updated column without having *duplicate* values. Appreciate for any
feedback.

Hando"

There are 2 responses:

"Short answer is no, there isn't a way. *Solr* doesn't have the concept of
'Update' to an indexed document. You need to add the full document (all
'columns') each time any one field changes. If doing that in your
DataImportHandler logic is difficult you may need to write a separate Update
Service that does:

1) Read UniqueID, UpdatedColumn(s)  from database
2) Using UniqueID Retrieve document from *Solr*
3) Add/Update field(s) with updated column(s)
4) Add document back to *Solr*

Although, if you use DIH to do a full *import*, using the same query in
your *Delta*-*Import* to get the whole document shouldn't be that
difficult."

and

"Hi,

Make sure you use a proper "ID" field, which does *not* change even if the
content in the database changes. In this way, when your
*delta*-*import* fetches
changed rows to index, they will update the existing rows in your index. "

I have an ID field that doesn't change.  It is the primary key field from
the database table I am trying to index and I have verified it is unique.

So, does Solr allow updates (not inserts) of existing records?  Is anyone
able to do this?

Mark
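The four-step "Update Service" quoted above amounts to a read-modify-write against the unique key. Sketched with a plain dict standing in for the Solr core (the document fields are invented for illustration):

```python
# unique key -> document; re-adding under the same key replaces, never duplicates
core = {42: {"id": 42, "status": "open", "amount": 100}}

def update_field(core, doc_id, field, value):
    doc = dict(core[doc_id])   # 1-2) retrieve the existing document by unique id
    doc[field] = value         # 3) overwrite the changed field
    core[doc_id] = doc         # 4) add the full document back; same key => replace
    return doc

update_field(core, 42, "status", "closed")
print(core[42])   # {'id': 42, 'status': 'closed', 'amount': 100}
print(len(core))  # 1  still one document, no duplicate
```

The point of the sketch: step 4 always re-sends the *whole* document, because an add with an existing unique key is a full replace, not a partial update.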


Re: The correct query syntax for date ?

2011-07-07 Thread Erick Erickson
Well, just search for "date field" in schema.xml (assuming a recent
version of Solr, you haven't told us what version you're using).

The "green" assumes you're using an editor that highlights comments
in an XML file.

But all the information you need is right there in Ahmet's e-mail. Dates
are represented as 1995-12-31T23:59:59Z. Look at
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

(first hit when googling "solr date").

Best
Erick

On Thu, Jul 7, 2011 at 7:33 AM, duddy67  wrote:
> Thanks but I'm still lost.
> I didn't see any green colored comments.
> Could you show me a concrete example of a date query ?
>
> Thanks
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/The-correct-query-syntax-for-date-tp3147536p3147890.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
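One frequent stumbling block with fielded date queries (an aside; not confirmed as the cause in this thread) is that the colons inside the timestamp collide with the field:value syntax unless the value is quoted or escaped, and the whole query must then be URL-encoded. A sketch using Python's standard library:

```python
from urllib.parse import quote

# Quote the date value so its colons are not read as field separators,
# then URL-encode the whole query string.
q = 'datecreation:"1995-12-31T23:59:59Z"'
url = "http://localhost:8983/solr/select?q=" + quote(q)
print(url)
# http://localhost:8983/solr/select?q=datecreation%3A%221995-12-31T23%3A59%3A59Z%22
```

The same applies to range queries like datecreation:[1995-01-01T00:00:00Z TO 1995-12-31T23:59:59Z], where the brackets and spaces also need URL encoding.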


Re: the version of a Lucene index changes after an optimize?

2011-07-07 Thread Erick Erickson
If a document is deleted, the terms are left in the index, the
document is just *marked* as deleted. So anything that
traverses the terms will pick up terms from old (deleted)
documents.

An optimize will remove the "stale" data, so I should think
that your component will have to be refreshed on an optimize!

I'm pretty sure that the version number is changed after an
optimize, as the segments files may be completely re-written
and the index is, indeed, different. It should be a pretty easy
thing to verify...

Best
Erick

On Thu, Jul 7, 2011 at 4:50 AM, gquaire  wrote:
> Thanks Erick for your reply.
>
> To answer your question, I'm currently developing a kind of
> TermsComponent which is able to merge the terms of several fields and has
> the ability to reach a position in the list with random access. To do
> that, I construct a merged list of terms from the Lucene index for these
> fields. I need to rebuild this list each time the index has been modified.
> If an optimize changes the Lucene index data, I have to detect it as I do
> for classical updates. Can I use the version number of the index to detect
> such modifications?
>
> Best regards,
>
> gquaire
>
> 2011/7/6 Erick Erickson [via Lucene] <
> ml-node+3145453-901143774-377...@n3.nabble.com>
>
>> I question this point:
>> "But, if only an optimize has happened (the data in the index
>> didn't change), the component doesn't need to be modified"
>>
>> An optimize may, for instance, change the internal Lucene
>> document IDs. What is your component doing?
>>
>> Also, optimize should be a fairly rare occurrence. I'm wondering
>> if it's worth the hassle to detect it.
>>
>> Best
>> Erick
>>
>> On Wed, Jul 6, 2011 at 3:37 AM, gquaire <[hidden 
>> email]>
>> wrote:
>>
>> > Hello everybody,
>> >
>> > I am new to this forum and I need your expertise on Solr-Lucene.
>> > I'm currently developing a new component for Solr for a professional
>> project.
>> > This component has to be refreshed when some modifications have been
>> applied
>> > in the index. But, if only an optimize has happened (the data in the
>> index
>> > didn't change), the component doesn't need to be modified. To do that,
>> I'm
>> > testing the version number stored in the index which can be retrieved by
>> the
>> > IndexReader class with "IndexReader.getCurrentVersion()". But, I need to
>> > know if the version number is incremented after an optimize operation.
>> Can
>> > you tell me if it is the case?
>> > If it is, how can I detect that the data have changed in the index ?
>> >
>> > Thanks for your help!
>> >
>> > gquaire
>> >
>> >
>> >
>> > -
>> > Jouve ITS France
>> > --
>> >
>>
>>
>>
>>
>
>
> -
> Jouve ITS France
> --


Re: ClassCastException launching recent snapshot

2011-07-07 Thread Benson Margulies
I just checked the classpath. No strays.

You can't 'use the example solr installation' when downloading the war
artifact from the snapshot repo.

You might ask me: what happens if I just drop that war into a plain
tomcat? And I plan to try that.

On Thu, Jul 7, 2011 at 8:29 AM, Erick Erickson  wrote:
> Then I would guess that you have other (older) jars in your classpath
> somewhere. Does the example Solr installation work?
>
> Best
> Erick
>
> On Wed, Jul 6, 2011 at 10:21 PM, Benson Margulies  
> wrote:
>> Launching solr-4.0-20110705.223601-1.war, I get a class cast exception
>>
>> org.apache.lucene.index.DirectoryReader cannot be cast to
>> org.apache.solr.search.SolrIndexReader with the following backtrace.
>>
>> I'm launching solr-as-a-webapp via an embedded copy of tomcat 7. The
>> location of the index is set up via:
>>
>> System.setProperty("solr.data.dir", solrDataDirectory);
>>
>> Further, the sources in the corresponding -sources .jar doesn't seem
>> to have a cast to SolrIndexReader in it anywhere in SolrIndexSearcher.
>>
>> SolrIndexSearcher.<init>(SolrCore, IndexSchema, String, IndexReader,
>> boolean, boolean) line: 142
>> SolrCore.getSearcher(boolean, boolean, Future[]) line: 1085
>> SolrCore.<init>(String, String, SolrConfig, IndexSchema,
>> CoreDescriptor) line: 587
>> CoreContainer.create(CoreDescriptor) line: 660
>> CoreContainer.load(String, InputStream) line: 412
>> CoreContainer$Initializer.initialize() line: 246
>> SolrDispatchFilter.init(FilterConfig) line: 86
>> ApplicationFilterConfig.initFilter() line: 273
>> ApplicationFilterConfig.getFilter() line: 254
>> ApplicationFilterConfig.setFilterDef(FilterDef) line: 372
>> ApplicationFilterConfig.<init>(Context, FilterDef) line: 98
>> StandardContext.filterStart() line: 4584
>> StandardContext$2.call() line: 5262
>> StandardContext$2.call() line: 5257
>> FutureTask$Sync.innerRun() line: 303
>> FutureTask.run() line: 138
>> ThreadPoolExecutor$Worker.runTask(Runnable) line: 886
>>
>


Re: ClassCastException launching recent snapshot

2011-07-07 Thread Erick Erickson
Then I would guess that you have other (older) jars in your classpath
somewhere. Does the example Solr installation work?

Best
Erick

On Wed, Jul 6, 2011 at 10:21 PM, Benson Margulies  wrote:
> Launching solr-4.0-20110705.223601-1.war, I get a class cast exception
>
> org.apache.lucene.index.DirectoryReader cannot be cast to
> org.apache.solr.search.SolrIndexReader with the following backtrace.
>
> I'm launching solr-as-a-webapp via an embedded copy of tomcat 7. The
> location of the index is set up via:
>
> System.setProperty("solr.data.dir", solrDataDirectory);
>
> Further, the sources in the corresponding -sources .jar doesn't seem
> to have a cast to SolrIndexReader in it anywhere in SolrIndexSearcher.
>
> SolrIndexSearcher.<init>(SolrCore, IndexSchema, String, IndexReader,
> boolean, boolean) line: 142
> SolrCore.getSearcher(boolean, boolean, Future[]) line: 1085
> SolrCore.<init>(String, String, SolrConfig, IndexSchema,
> CoreDescriptor) line: 587
> CoreContainer.create(CoreDescriptor) line: 660
> CoreContainer.load(String, InputStream) line: 412
> CoreContainer$Initializer.initialize() line: 246
> SolrDispatchFilter.init(FilterConfig) line: 86
> ApplicationFilterConfig.initFilter() line: 273
> ApplicationFilterConfig.getFilter() line: 254
> ApplicationFilterConfig.setFilterDef(FilterDef) line: 372
> ApplicationFilterConfig.<init>(Context, FilterDef) line: 98
> StandardContext.filterStart() line: 4584
> StandardContext$2.call() line: 5262
> StandardContext$2.call() line: 5257
> FutureTask$Sync.innerRun() line: 303
> FutureTask.run() line: 138
> ThreadPoolExecutor$Worker.runTask(Runnable) line: 886
>


Re: The correct query syntax for date ?

2011-07-07 Thread duddy67
Thanks but I'm still lost.
I didn't see any green colored comments.
Could you show me a concrete example of a date query?

Thanks

--


Re: The correct query syntax for date ?

2011-07-07 Thread Ahmet Arslan
> I have a syntax problem in my query with the SOLR date
> format.
> This is what I type:
> 
> q=datecreation:2001-10-11
> 
> but SOLR returns me an error message:
> 
> Invalid Date String:'2001-10-11'
> 
> I tried different combinations but none of them works.
> Could someone tell me what the correct syntax is?

Please see the green colored comments in example schema.xml:

"The format for this date field is of the form 1995-12-31T23:59:59Z, and
 is a more restricted form of the canonical representation of dateTime
 http://www.w3.org/TR/xmlschema-2/#dateTime
 The trailing "Z" designates UTC time and is mandatory."
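To make the point concrete: a bare day like 2001-10-11 is not a valid DateField value, so either supply a full instant or query the whole day as a range. A small sketch building both query strings (the field name datecreation is taken from the thread; remember to URL-encode the q parameter before sending it over HTTP):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Building query strings in Solr's canonical date form (yyyy-MM-dd'T'HH:mm:ss'Z').
public class SolrDateQuery {
    static final DateTimeFormatter DAY = DateTimeFormatter.ISO_LOCAL_DATE;

    // Query for one exact instant (midnight UTC of the given day).
    static String exactQuery(String field, LocalDate day) {
        return field + ":" + day.format(DAY) + "T00:00:00Z";
    }

    // Query for every document dated anywhere within the given day.
    static String dayRangeQuery(String field, LocalDate day) {
        String d = day.format(DAY);
        return field + ":[" + d + "T00:00:00Z TO " + d + "T23:59:59Z]";
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2001, 10, 11);
        System.out.println(exactQuery("datecreation", day));
        // datecreation:2001-10-11T00:00:00Z
        System.out.println(dayRangeQuery("datecreation", day));
        // datecreation:[2001-10-11T00:00:00Z TO 2001-10-11T23:59:59Z]
    }
}
```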


(Solr-UIMA) Doubt regarding integrating UIMA in to solr - Configuration.

2011-07-07 Thread Sowmya V.B.
Hi

I am trying to add the UIMA module to Solr, and began with the readme file
given here:
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/README.txt

I am confused about some points in the readme file and hence the email.

2. modify your schema.xml adding the fields you want to be hold
metadata specifying proper values for type, indexed, stored and
multiValued options:

-I understood this line as: adding to my schema.xml, the new fields that
will come as a result of a UIMA pipeline. For example, in my UIMA pipeline,
post-processing, I get fields A,B,C in addition to fields X,Y,Z that I
already added to the SolrInputDocument. So, does this mean I should add
A,B,C to the schema.xml?

3. In SolrConfig.xml,

inside,




if I am not using any of those "alchemy api key..." etc., I think I can remove
those lines. However, I plan to use the openNLP tagger & tokenizer, and an
annotator I wrote for my task. Can I give my model file locations here as
runtimeParameters?

4. I did not understand what the "fieldMapping" tag does. The description said:
"field mapping describes which features of which types should go in a
field"--
- For example, in this snippet from the link:

 
   
  

-what does "feature" mean and what does "field" mean?


I did not understand the fieldmapping tag right and did not find any help in
previous mails. Hence, mailing the group. Sorry for the long mail!

Regards
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com
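The markup of the fieldMapping snippet quoted above was lost in the archive. For reference, the fieldMapping block in the 3.1 contrib/uima README has roughly the following shape (the type and field names here are illustrative from the README, not from Sowmya's setup): a "feature" is an attribute of a UIMA annotation type, and "field" is the Solr field its value is copied into.

```xml
<fieldMapping>
  <!-- For every annotation of this UIMA type produced by the pipeline... -->
  <type name="org.apache.uima.alchemy.ts.concept.ConceptFS">
    <!-- ...copy the value of its "text" feature into the Solr field "concept". -->
    <map feature="text" field="concept"/>
  </type>
</fieldMapping>
```

So for a custom annotator, you would list your annotation type's fully qualified name and map each feature you care about (the A, B, C fields in the question above) to a field declared in schema.xml.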



Re: the version of a Lucene index changes after an optimize?

2011-07-07 Thread gquaire
Thanks Erick for your reply.

To answer your question, I'm currently developing a kind of
TermsComponent which is able to merge the terms of several fields and has
the ability to reach a position in the list with random access. To do
that, I construct a merged list of terms from the Lucene index for these
fields. I need to rebuild this list each time the index has been modified.
If an optimize changes the Lucene index data, I have to detect it as I do
for classical updates. Can I use the version number of the index to detect
such modifications?

Best regards,

gquaire

2011/7/6 Erick Erickson [via Lucene] <
ml-node+3145453-901143774-377...@n3.nabble.com>

> I question this point:
> "But, if only an optimize has happened (the data in the index
> didn't change), the component doesn't need to be modified"
>
> An optimize may, for instance, change the internal Lucene
> document IDs. What is your component doing?
>
> Also, optimize should be a fairly rare occurrence. I'm wondering
> if it's worth the hassle to detect it.
>
> Best
> Erick
>
> On Wed, Jul 6, 2011 at 3:37 AM, gquaire <[hidden 
> email]>
> wrote:
>
> > Hello everybody,
> >
> > I am new to this forum and I need your expertise on Solr-Lucene.
> > I'm currently developing a new component for Solr for a professional
> project.
> > This component has to be refreshed when some modifications have been
> applied
> > in the index. But, if only an optimize has happened (the data in the
> index
> > didn't change), the component doesn't need to be modified. To do that,
> I'm
> > testing the version number stored in the index which can be retrieved by
> the
> > IndexReader class with "IndexReader.getCurrentVersion()". But, I need to
> > know if the version number is incremented after an optimize operation.
> Can
> > you tell me if it is the case?
> > If it is, how can I detect that the data have changed in the index ?
> >
> > Thanks for your help!
> >
> > gquaire
> >
> >
> >
> > -
> > Jouve ITS France
> > --
> >
>
>
>
>


-
Jouve ITS France
--

Re: indexing but not able to search

2011-07-07 Thread Sowmya V.B.
Thanks Ahmet.

Changing the field from String to "text_en" worked!

Sorry for all the mails. I should have understood the schema.xml properly
before asking the question. Now, I see that schema.xml has description of
this field "text_en" !

Sowmya.

On Thu, Jul 7, 2011 at 10:24 AM, Ahmet Arslan  wrote:

> > Thanks for the mail.
> > But, just a clarification: changing the field type in
> > schema means I have to
> > reindex to check if this works, right?
>
> Yes. Restarting the servlet container and re-indexing are required.
>



-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Problem with spellchecking, dont want multiple request to SOLR

2011-07-07 Thread roySolr
What should the query look like?

I can't define 2 spellcheckers in one query. I want something like this:

Search: Soccerclub(what) Manchester(where)

select/?q=socerclub macnchester&spellcheck=true&spellcheck.dictionary=spell_what&spellcheck.dictionary=spell_where&spell_what=socerclub&spell_where=macnchester

Now I have 2 spellcheckers in my request handler but I can't set them correctly
in my query.
My config looks like this:


spellcheck1
spellcheck2



 
spell_what
spell_search1
true
spellchecker1



 
spell_where
spell_search2
true
spellchecker2






--
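One caveat worth stating plainly (based on how the 3.x spellcheck component behaves; verify against your version): spellcheck.dictionary is effectively single-valued per request, so using two dictionaries usually means issuing two requests and merging the suggestions client-side, or building a single dictionary from a copyField that combines both source fields. A sketch of the two-request approach; the host, port and handler path are placeholders, not taken from the thread's config:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds one spellcheck request URL per dictionary, since the component
// accepts a single spellcheck.dictionary value per request in Solr 3.x.
public class SpellcheckUrls {
    static String url(String dictionary, String term) {
        String encoded = URLEncoder.encode(term, StandardCharsets.UTF_8);
        return "http://localhost:8983/solr/select?q=" + encoded
                + "&spellcheck=true"
                + "&spellcheck.dictionary=" + dictionary
                + "&spellcheck.q=" + encoded;
    }

    public static void main(String[] args) {
        // One request per dictionary; merge the suggestions client-side.
        System.out.println(url("spell_what", "socerclub"));
        System.out.println(url("spell_where", "macnchester"));
    }
}
```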


The correct query syntax for date ?

2011-07-07 Thread duddy67
Hi,

I have a syntax problem in my query with the SOLR date format.
This is what I type:

q=datecreation:2001-10-11

but SOLR returns me an error message:

Invalid Date String:'2001-10-11'

I tried different combinations but none of them works.
Could someone tell me what the correct syntax is?


Thanks in advance.

--


Re: How many fields can SOLR handle?

2011-07-07 Thread roySolr
Hello Erik,

I need the *_facets also for searching so stored must be true.

"Then, and I used *_facet similar to you, kept a list of all *_facet actual
field names and used those in all subsequent search requests. "

Is this not bad for performance? I only need a few facets, not all (only the
facets for the chosen category).

--


Re: indexing but not able to search

2011-07-07 Thread Ahmet Arslan
> Thanks for the mail.
> But, just a clarification: changing the field type in
> schema means I have to
> reindex to check if this works, right?

Yes. Restarting the servlet container and re-indexing are required.


Re: indexing but not able to search

2011-07-07 Thread Sowmya V.B.
Hello Ahmet

Thanks for the mail.
But, just a clarification: changing the field type in schema means I have to
reindex to check if this works, right?

Sowmya

On Thu, Jul 7, 2011 at 10:13 AM, Ahmet Arslan  wrote:

> Hello,
>
> Your text and title fields are marked as string which is not tokenized.
>
> 
>
> marking them indexed="true" will make them searchable, but only by the
> verbatim, untokenized value.
>
> Try using text_en for example.
>
> 
>
>

Re: Problem with first letter accented

2011-07-07 Thread Ahmet Arslan
> I'm using Solr 3.3 for searching in different languages,
> one of them being Spanish. The ASCIIFoldingFilterFactory works
> fine, but if a word begins with an accented letter, like
> "ágora" or "ínclito", it can't find anything. I have to
> search for the word without the accent in order to find some
> result. For instance:
> 
>  
> 
> -          Title: Imágenes del
> ágora de la plaza central.
> 
> -          Searching text:
> "imágenes" or "imagenes" returns the same result, the title
> above
> 
> -          Searching text:
> "ágora" returns no results, while "agora" returns the right
> result

That's quite strange. Your field type definition would be needed. 

Also, admin/analysis.jsp shows the step-by-step output of the analysis.
What happens to the words "ágora" or "ínclito" at index time and query time?
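While waiting for the analysis.jsp output, here is a quick offline approximation of what accent folding should produce (plain JDK, not the actual ASCIIFoldingFilter, which covers many more characters): if both the indexed form and the queried form reduce to the same string here, folding itself is unlikely to be the problem, and the field's analyzer chain at query time is the next suspect.

```java
import java.text.Normalizer;

// Approximate accent folding: decompose to NFD, then strip combining marks.
public class FoldCheck {
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("ágora"));    // agora
        System.out.println(fold("ínclito"));  // inclito
        System.out.println(fold("imágenes")); // imagenes
    }
}
```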


Re: indexing but not able to search

2011-07-07 Thread Ahmet Arslan
Hello,

Your text and title fields are marked as string which is not tokenized.



marking them indexed="true" will make them searchable, but only by the
verbatim, untokenized value.

Try using text_en for example. 




--- On Thu, 7/7/11, Sowmya V.B.  wrote:

From: Sowmya V.B. 
Subject: Re: indexing but not able to search
To: solr-user@lucene.apache.org
Date: Thursday, July 7, 2011, 10:39 AM

Hi Eric

Sorry for the messup.
I was talking about the default search field in schema.xml itself. I changed it 
to "title" instead of "text" thinking that I might get something seeing that.



Attaching schema.xml and solrconfig.xml with this mail.
I just added my fields to the example schema.xml file given with the 
distribution.

Also attaching the debug output for the query "time", in an xml file 
(timedebugquery.xml).


Here is the link: 
http://localhost:8080/apache-solr-3.3.0/select/?q=time&version=2.2&start=0&rows=10&indent=on&debugQuery=on



I guess, this time, I am consistent enough with the data i provided in this 
mail!
Sorry about the messup, again!

Sowmya.


On Wed, Jul 6, 2011 at 8:57 PM, Erick Erickson  wrote:


You're giving contradictory information here. This is NOT the query that

you submitted when you did the &debugQuery=on that you sent before.



Look in schema.xml for  I bet its value is "title". I'm not

talking at all about the fact that the schema has title and text

fields, I'm looking

at the debug output and inferring the  because:

query

 query

 title:query

 title:query



the rawquery string is "query". Meaning you typed something like

q=query.

the parsedquery is "title:query" implying that your 

is "title".



That means that the URL you gave is searching against the "title" field for

"head". Do you really expect that to match?



Please make the effort to provide a consistent set of data. Don't

give fragmentary pieces from different queries. Your debug data

cannot be from a schema that has "text" as the default field. It just

doesn't work that way.



So, I suspect you aren't going against the solr instance you think. Or

you're looking at configuration data that isn't being used by that solr.

Or you're cutting/pasting/copying different fragments. And you still

haven't shown us the schema.xml file.



So, give us the debug output, and show us the exact query you use

to get that output.



Best

Erick



On Wed, Jul 6, 2011 at 1:20 PM, Sowmya V.B.  wrote:

> Hi Eric

>

> Yes, the schema.xml has both title and text fields... and i was changing

> between them...perhaps, it was "title" when I sent you.

>

> I am searching from the admin. this is the URL it gives me, after I click

> search, from the admin window.

>

> http://localhost:8080/apache-solr-3.3.0/select/?q=head&version=2.2&start=0&rows=10&indent=on



>

> S

>

> On Wed, Jul 6, 2011 at 7:12 PM, Erick Erickson wrote:

>

>> About being new... no problem, we all have to learn

>>

>> But this part of your output:

>> query

>>  query

>>  title:query

>>  title:query

>>

>> indicates that something's weird with your query. Can you show the

>> exact URL you use? If you're searching form the admin interface, that

>> will be the URL in the results window. Because this is indicating a couple

>> of things:

>> 1> your query is of the form "?q=query" or some such.

>> 2> your default search field is "title" (see schema.xml)..

>>

>> Best

>> Erick

>>

>> On Wed, Jul 6, 2011 at 12:47 PM, Sowmya V.B.  wrote:

>> > Hi Eric

>> >

>> > 1)Default field in schema.xml : "text", which is the field

>> > 2) numDocs = 21501

>> >     maxDocs = 21554

>> > 3) Attaching debugQuery output with this mail

>> > 4) When I search for everything, (*:*)...it shows me all the documents,

>> with

>> > their fields.

>> >

>> > I am new to asking questions on the list..and hence the "lack of

>> etiquette".

>> > Thanks for the link. :)

>> >

>> > Sowmya.

>> >

>> > On Wed, Jul 6, 2011 at 6:32 PM, Erick Erickson 

>> > wrote:

>> >>

>> >> OK, there's not much information to go on here. So..

>> >>

>> >> 1> you pasted solrconfig.xml. Schema.xml contains your default field,

>> >> we need to see that too.

>> >> 2> you say documents are shown in the stats page. There are two

>> >> numbers, numDocs and maxDocs.

>> >>    numDocs is the number of documents that have NOT been deleted,

>> >> what is that number?

>> >> 3> what results from attaching &debugQuery=on to your URL?

>> >> 4> what shows up in the admin page when you search for everything?

>> >>

>> >> It would help a lot if you'd provide some more detailed information,

>> >> please review: http://wiki.apache.org/solr/UsingMailingLists,

>> >>

>> >> Best

>> >> Erick

>> >>

>> >> On Wed, Jul 6, 2011 at 12:10 PM, Sowmya V.B. 

>> wrote:

>> >> > I am sorry..I was checking the some other solr instance that ran on

>> this

>> >> > system...when I replied for the previous mail.

>> >> >

>> >> > I still dont get any documents in return to my query...though the

>> index

>> >>

Re: Re:OOM at solr master node while updating document

2011-07-07 Thread pravesh
You just need to allocate more heap to your JVM.
BTW, are you doing any complex searches while indexing is in progress, like
retrieving a large set of documents?

Thanx
Pravesh
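After raising -Xmx (for Tomcat, typically via JAVA_OPTS or CATALINA_OPTS), it is worth confirming the setting actually reached the JVM running Solr. A tiny check of the heap ceiling:

```java
// Reports the heap ceiling the JVM was actually started with.
public class HeapCheck {
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Solr's admin pages also report JVM memory, which avoids guessing whether the flag was picked up by the right process.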

--


Problem with first letter accented

2011-07-07 Thread Villacorta Peral, Eva
Hi

 

I'm using Solr 3.3 for searching in different languages, one of them being
Spanish. The ASCIIFoldingFilterFactory works fine, but if a word begins with an
accented letter, like "ágora" or "ínclito", it can't find anything. I have to
search for the word without the accent in order to find some result. For instance:

 

-  Title: Imágenes del ágora de la plaza central.

-  Searching text: "imágenes" or "imagenes" returns the same result, 
the title above

-  Searching text: "ágora" returns no results, while "agora" returns 
the right result

 

Thx in advance

Eva