Re: catchall field minus one field

2012-01-12 Thread elisabeth benoit
thanks a lot for your advice, I'll try that.

Best regards,
Elisabeth

2012/1/11 Erick Erickson erickerick...@gmail.com

 Hmmm, once the data is included in the catch-all, it's indistinguishable from
 all the rest of the data, so I don't see how you could do this. A clause like
 -excludeField:[* TO *] would exclude all documents that had any data in
 the field, so that's probably not what you want.

 Could you approach it the other way? Do NOT put the special field in
 the catch-all field in the first place, but massage the input to add
 a clause there? I.e. your usual case would have
 catchall:all your terms exclude_field:all your terms, but your
 special one would just be catchall:all your terms.

 You could set up request handlers to do this under the covers, so your
 queries would really be
 ...solr/usual?q=all your terms
 ...solr/special?q=all your terms
 and two different request handlers (edismax-style I'm thinking)
 would differ only by the qf field containing or not containing
 your special field.
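
 In solrconfig.xml that might look something like this (just a sketch; the
 handler names mirror the URLs above, and the qf field names are placeholders
 for whatever your schema actually uses):

   <requestHandler name="/usual" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="defType">edismax</str>
       <str name="qf">catchall exclude_field</str>
     </lst>
   </requestHandler>

   <requestHandler name="/special" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="defType">edismax</str>
       <str name="qf">catchall</str>
     </lst>
   </requestHandler>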

 the other way, of course, would be to have a second catch-all
 field that didn't have your special field, then use one or the other
 depending, but as you say that would increase the size of your
 index...

 Best
 Erick

 On Wed, Jan 11, 2012 at 9:47 AM, elisabeth benoit
 elisaelisael...@gmail.com wrote:
  Hello,
 
  I have a catchall field, and I need to do some requests in all fields of
  that catchall field, minus one. To avoid duplicating my index, I'd like to
  know if there is a way to use my catchall field while excluding that one
  field.
 
  Thanks,
  Elisabeth



Restricting access to shards / collections with SolrCloud

2012-01-12 Thread Jaran Nilsen
Hi.

We're currently looking at SolrCloud to improve management of our Solr
cluster. There is one use case which I am wondering if SolrCloud provides
any support for out of the box, or if our best bet is to stick with our
current solution.

The use case is:

We have a large number of shards, using the same schema - so, perfect for
SolrCloud. Some of these shards should have restricted access, meaning only
customers with certain privileges will be able to query them. The way we
solve this today is to maintain a database listing those users who have
access to these restricted shards. When building the shards-parameter for
querying Solr, we then use this database to append the URLs of the
restricted shards ONLY if the user has access to them.

With SolrCloud it would be great to be able to use the distrib=true
parameter, but that would override the approach we're currently using.

My questions are:

1. Would it be an idea to create a separate collection for the shards that
are restricted? If so, is there currently any support for specifying which
collections to search so that we could implement the solution outlined
above, but for collections rather than shards?

2. If #1 is a no-go, are we better off sticking with our current approach and
skipping distrib=true, which would query all shards?

Any input appreciated!

Best,
Jaran

-- 
Jaran Nilsen
Skype: jaran.nilsen
jarannilsen.com || codemunchies.com || notpod.com
twitter.com/jarannilsen // www.linkedin.com/in/jarannilsen //
facebook.com/jaran.nilsen


Re: Large data set or data corpus

2012-01-12 Thread jmuguruza
http://www.data.gov/ has lots of datasets available for free



Not able to see output in XML output

2012-01-12 Thread rajalapati
Hi,

In my Solr setup, I have a query-based data-config written and was able to
complete the steps below, but I was not able to see the output:



1) Register the Data Import request handler in solrconfig.xml
2) Modify data-config.xml with the appropriate query to get data imported,
which includes making use of the jTDS driver for SQL Server
3) Modify solrconfig.xml to register db-data-config.xml in the request
handler item
4) Modify schema.xml for the output result. Right now we are facing issues
here. Let me attach 2 files: 1) schema.xml 2) db-data-config.xml.

Schema.xml

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.2">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
        omitNorms="true" />
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
        omitNorms="true" positionIncrementGap="0" />
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true"
        precisionStep="0" positionIncrementGap="0" />
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
        <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt" />
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="FileId" type="string" indexed="true" stored="true"
        required="true" />
    <field name="Title" type="string" indexed="true" stored="true"
        required="true" />
  </fields>

  <uniqueKey>FileId</uniqueKey>
  <defaultSearchField>FileId</defaultSearchField>
  <solrQueryParser defaultOperator="AND" />
</schema>


db-data-config.xml

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="net.sourceforge.jtds.jdbc.Driver"
              url="jdbc:jtds:sqlserver://17.30.199.667:1433;databaseName=" user=""
              password="XXX" />
  <document>
    <entity name="Files" query="Select FileID,Title from files">

      <field column="FileID" name="FileID" />
      <field column="Title" name="Title" />

    </entity>
  </document>
</dataConfig>





5) Make a full-import HTTP request for the data to get indexed into the Solr
server. Even though I see that all the rows are indexed, I am not able to find
results when a search is run on the admin page.

6) Am I missing any step to configure the output? I have changed the
db-data-config, schema.xml and solrconfig.xml files. Do I need to change any
other files for the output?


Thanks
Raj Deep 



Re: Relevancy and random sorting

2012-01-12 Thread Alexandre Rocco
Erick,

This document already has a field that indicates the source (site).
The issue we are trying to solve is when we list all documents without any
specific criteria. Since we bring the most recent ones and the ones that
contain images, we end up having a lot of listings from a single site,
since the documents are indexed in batches from the same site. At some
point we have several documents from the same site in the same date/time
and having images. I'm trying to give some random aspect to this search so
other documents can also appear in between that big dataset from the same
source.
Does the grouping help to achieve this?

Alexandre

On Thu, Jan 12, 2012 at 12:31 AM, Erick Erickson erickerick...@gmail.comwrote:

 Alexandre:

 Have you thought about grouping? If you can analyze the incoming
 documents and include a field such that similar documents map
  to the same value, then group on that value, you'll get output that
 isn't dominated by repeated copies of the similar documents. It
 depends, though, on being able to do a suitable mapping.

 In your case, could the mapping just be the site from which you
 got the data?

 Best
 Erick

 On Wed, Jan 11, 2012 at 1:58 PM, Alexandre Rocco alel...@gmail.com
 wrote:
  Erick,
 
  Probably I really wrote something silly. You are right on either sorting
  by field or ranking.
  I just need to change the ranking to shift things around as you said.
 
  To clarify the use case:
  We have a listing aggregator that gets product listings from a lot of
  different sites, and since they are added in batches, sometimes you see a
  lot of pages from the same source (site). We are working on some changes
  to shift things around and reduce this blocking effect, so we can present
  mixed sources on the result pages.
 
  I guess I will start with the document random field and later try to
  develop a custom plugin to make things better.
 
  Thanks for the pointers.
 
  Regards,
  Alexandre
 
  On Wed, Jan 11, 2012 at 1:58 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  I really don't understand what this means:
  random sorting for the records but also preserving the ranking
 
  Either you're sorting on rank or you're not. If you mean you're
  trying to shift things around just a little bit, *mostly* respecting
  relevance then I guess you can do what you're thinking.
 
  You could create your own function query to do the boosting, see:
  http://wiki.apache.org/solr/SolrPlugins#ValueSourceParser
 
  which would keep you from having to re-index your data to get
  a different randomness.
 
  You could also consider external file fields, but I think your
  own function query would be cleaner. I don't think math.random
  is a supported function OOB
 
  Best
  Erick
 
 
  On Wed, Jan 11, 2012 at 8:29 AM, Alexandre Rocco alel...@gmail.com
  wrote:
   Hello all,
  
   Recently I've been trying to tweak some aspects of relevancy in one
   listing project.
   I need to give a higher score to newer documents and also boost the
   document based on a boolean field that indicates the listing has
   pictures.
   On top of that, in some situations we need a random sorting for the
   records but also preserving the ranking.
  
   I tried to combine some techniques described in the Solr Relevancy FAQ
   wiki, but when I add the random sorting, the ranking gets messy (as
   expected).
  
   This works well:
  
   http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%22haspicture%22&fl=*,score
  
   This does not work, gives a random order on what is already ranked
  
   http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%22haspicture%22&fl=*,score&sort=random_1+desc
  
   The only way I see is to create another field on the schema containing a
   random value and use it to boost the document the same way that was done
   on the boolean field.
   Anyone tried something like this before and knows some way to get it
   working?
  
   Thanks,
   Alexandre
 



Re: Relevancy and random sorting

2012-01-12 Thread Michael Kuhlmann

Does the random sort function help you here?

http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html

However, you will get some very old listings then, if it's okay for you.

-Kuli


Re: Highlighting issue with PlainTextEntityProcessor.

2012-01-12 Thread meghana
Hi Erik.. Thanks for your reply.

And yes, the data was in the index, but I found the problem. The problem was not
with PlainTextEntityProcessor: highlighting was being returned for the multivalued
field, while the non-multivalued field had fewer highlights, so I thought the
problem might be in PlainTextEntityProcessor.

But the actual problem was that my search field is very big... I increased the
hl.maxAnalyzedChars length... and it got working.
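
(For reference, that limit is a per-request highlighting parameter; something
like &hl=true&hl.fl=myfield&hl.maxAnalyzedChars=1000000 -- the field name and
the value here are only placeholders.)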





FacetComponent: suppress original query

2012-01-12 Thread Dmitry Kan
Hello list,

I need to split the incoming original facet query into a list of
sub-queries. The logic is done and each sub-query gets added into the outgoing
queue with rb.addRequest(), where rb is an instance of ResponseBuilder.
In the logs I see that along with the sub-queries the original query gets
submitted too. Is there a way of suppressing the original query?

-- 
Regards,

Dmitry Kan


Re: Relevancy and random sorting

2012-01-12 Thread Alexandre Rocco
Michael,

We are using the random sorting in combination with date and other fields
but I am trying to change this to affect the ranking instead of sorting
directly.
That way we can also use other useful tweaks on the rank itself.

Alexandre


Re: Relevancy and random sorting

2012-01-12 Thread Ahmet Arslan
 This document already has a field that indicates the source
 (site).
 The issue we are trying to solve is when we list all
 documents without any
 specific criteria. Since we bring the most recent ones and
 the ones that
 contains images, we end up having a lot of listings from a
 single site,
 since the documents are indexed in batches from the same
 site. At some
 point we have several documents from the same site in the
 same date/time
 and having images. I'm trying to give some random aspect to
 this search so
 other documents can also appear in between that big dataset
 from the same
 source.
 Does the grouping help to achieve this?

Yes, http://wiki.apache.org/solr/FieldCollapsing
You would display at most 3 documents from a single site, and put up a link
saying "there are xxx more documents from site yyy, click here to see all of
them".
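
As a sketch (assuming the field holding the source is called "site"), the
request would carry something like:

group=true&group.field=site&group.limit=3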


Re: Search Specific Boosting

2012-01-12 Thread Brett

Hi Erick,

Yeah, I've reviewed the debug output and can't make sense of why they 
are scoring the same.  I have double checked that they are being indexed 
with different boost values for the search field.  I've also increased 
the factors trying to get them to be more granular, so instead of boosting 
1,2,3,4,5 I did 100,200,300,400,500... Same result.


Here's an example of the debug output with two documents having 
different field boost values but receiving the same score.


Does anything stick out?  Any other ideas on how to get the results I am 
looking for?



69.694855 = (MATCH) product of:
  104.54228 = (MATCH) sum of:
    0.08869071 = (MATCH) MatchAllDocsQuery, product of:
      0.08869071 = queryNorm
    104.45359 = (MATCH) weight(searchfe2684d248eab25404c3668711d4642e_boost:true in 4016) [DefaultSimilarity], result of:
      104.45359 = score(doc=4016,freq=1.0 = termFreq=1), product of:
        0.48125002 = queryWeight, product of:
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          0.08869071 = queryNorm
        217.04642 = fieldWeight in 4016, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          40.0 = fieldNorm(doc=4016)
  0.667 = coord(2/3)




69.694855 = (MATCH) product of:
  104.54228 = (MATCH) sum of:
    0.08869071 = (MATCH) MatchAllDocsQuery, product of:
      0.08869071 = queryNorm
    104.45359 = (MATCH) weight(searchfe2684d248eab25404c3668711d4642e_boost:true in 4106) [DefaultSimilarity], result of:
      104.45359 = score(doc=4106,freq=1.0 = termFreq=1), product of:
        0.48125002 = queryWeight, product of:
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          0.08869071 = queryNorm
        217.04642 = fieldWeight in 4106, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1
          5.4261603 = idf(docFreq=81, maxDocs=6856)
          40.0 = fieldNorm(doc=4106)
  0.667 = coord(2/3)
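
One thing that does stand out above: fieldNorm(doc) is 40.0 in both
explanations, which is consistent with the bucketing Erick describes below --
index-time boosts end up folded into the single-byte field norm. A small
illustration, assuming Lucene's SmallFloat encoding that DefaultSimilarity
uses internally:

import org.apache.lucene.util.SmallFloat;

public class NormBuckets {
  public static void main(String[] args) {
    // nearby boost values encode to the same byte, hence identical fieldNorms
    for (float boost : new float[] {60f, 62f, 65f, 70f, 100f}) {
      byte b = SmallFloat.floatToByte315(boost);
      System.out.println(boost + " -> " + SmallFloat.byte315ToFloat(b));
    }
  }
}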



On 1/11/2012 9:46 PM, Erick Erickson wrote:

Boosts are fairly coarse-grained. I suspect your boost factors are just
being rounded into the same buckets. Attaching debugQuery=on and
looking at how the scores were calculated should help you figure out
if this is the case.

Best
Erick

On Wed, Jan 11, 2012 at 7:57 PM, Brettbr...@chopshop.org  wrote:

I'm implementing a feature where admins have the ability to control the
order of the results by adding a boost to any specific search.

The search is a faceted interface (no text input) and which we take a hash
of the search parameters (to form a unique search id) and then boost that
field for the document.

The field is a wild card field so that it might look like this:

<field name="search395eff966b26a91d82935c8e1197330c_boost"
boost="90">true</field>

The problem is that in these search results I am seeing is that my results
are being grouped and the individual boost values are not having the
granular effect I am looking for.

Say, on a result set of 75 documents, I see results with search boosts of
60-70 receiving the same score even though they were indexed with different
boost values.  There is always more than one group.

Does anyone know what might be causing this?  Is there a better way to do
what I am looking for?

Thanks,

Brett


Field Definition:

<fieldType name="boost" class="solr.TextField" sortMissingLast="true"
    omitNorms="false" omitTermFreqAndPositions="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>



SolrException: Invalid Date String:'oracle.sql.TIMESTAMP

2012-01-12 Thread Joey Grimm
Hi,

I am trying to use a dataImportHandler to import data from an oracle DB.  It
works for non-date fields but is throwing an exception once I included the
MODIFIEDDATE field (oracle.timestamp field).  Can anyone see what I'm doing
wrong here?  Thanks.



schema.xml
   <field name="catModifiedDate" type="date" indexed="true" stored="true" />

db-data-config.xml

<entity name="category" datasource="jdbc"
        query="SELECT ID,PARENTID,ICONID,SORTORDER,MODIFIEDDATE
               FROM CATEGORY">

    <field column="ID" name="masterId" />
    <field column="PARENTID" name="catParentId" />
    <field column="ICONID" name="catIconId" />
    <field column="SORTORDER" name="catSortOrder" />
    <field column="MODIFIEDDATE" name="catModifiedDate" />
</entity>


WARNING: Error creating document :
SolrInputDocument[{catModifiedDate=catModifiedDate(1.0)={oracle.sql.TIMESTAMP@1e58565},
masterId=masterId(1.0)={124}, catParentId=catParentId(1.0)={118},
catIconId=catIconId(1.0)={304856}}]
org.apache.solr.common.SolrException: ERROR: [doc=124] Error adding field
'catModifiedDate'='oracle.sql.TIMESTAMP@1e58565'
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:324)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:636)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: org.apache.solr.common.SolrException: Invalid Date
String:'oracle.sql.TIMESTAMP@1e58565'
at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
at org.apache.solr.schema.TrieField.createField(TrieField.java:421)
at 
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:281)



RE: SolrException: Invalid Date String:'oracle.sql.TIMESTAMP

2012-01-12 Thread Colin Bennett
Hi,

It looks like a date formatting issue; the Solr date field expects something
like 1995-12-31T23:59:59.999Z.

See http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html

The data import handler does have a date transformer to convert dates

http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
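
A sketch of what that could look like in db-data-config.xml (the
dateTimeFormat here is an assumption -- it has to match however the column
value actually renders as a string):

<entity name="category" dataSource="jdbc" transformer="DateFormatTransformer"
        query="SELECT ID,PARENTID,ICONID,SORTORDER,MODIFIEDDATE FROM CATEGORY">
    <field column="MODIFIEDDATE" name="catModifiedDate"
           dateTimeFormat="yyyy-MM-dd HH:mm:ss" />
</entity>

Another option is to convert the column to a plain DATE, or to a formatted
string, directly in the SELECT so the driver hands back something the date
field can parse.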


Colin.




RE: Determining which shard is failing using partialResults / some other technique?

2012-01-12 Thread Gilles Comeau
Hi all,

 

Is there at least a way to print out which shard is being called in the
logging, and maybe a way to log a failure? 

 

INFO: [master] webapp=/solr path=/select
params={facet=true&facet.mincount=1&facet.sort=count&q=(content_1500_chars:((allied+irish+banks+OR++aib+)+AND+NOT+(bluray+OR+RAR++OR+mega+pack))+OR+title:((allied+irish+banks+OR++aib+)+AND+NOT+(bluray+OR+RAR++OR+mega+pack)))&facet.limit=10&facet.shard.limit=300&distrib=true&facet.field=organisation&wt=javabin&fq=harvest_time_long:[131077440+TO+132641279]&rows=0&version=2}
status=0 QTime=16192

 

Regards,

Gilles

 

From: Gilles Comeau [mailto:gilles.com...@polecat.co] 
Sent: 12 January 2012 07:02
To: 'solr-user@lucene.apache.org'
Subject: Determining which shard is failing using partialResults / some
other technique?

 

Hi Solr Users,

 

Does anyone happen to know if the keyword partialResults can be used in a Solr
HTTP request?   (partialResults is turned off at the .xml level)

 

Something like:
http://server:8080/solr/master/select?distrib=true&rows=500&fl=*,score&start=0&partialResults=true&q=my+and+query&fq=harvest_time_long:[132537600+TO+132537600]


We have a Solr instance that is periodically failing on distributed
requests, and I am trying to narrow down which one of the shards is causing
the failure.   If the above doesn't work, can someone point me to a resource
or give advice on how to find out which node might be causing the issue?

 

Regards,

 

Gilles



a way to marshall xml doc into a SolrInputDocument

2012-01-12 Thread jmuguruza
If I have individual files in the expected Solr format (having just ONE doc
per file):

<add>
  <doc>
    <field name="id">GB18030TEST</field>
    <field name="name">Test with some GB18030 encoded characters</field>
    <field name="features">No accents here</field>
    <field name="features">这是一个功能</field>
    <field name="price">0</field>
  </doc>
</add>

Isn't there a way to easily marshal that file into a SolrInputDocument? Do
I have to do the parsing myself?

I need them as Java POJOs because I want to modify some fields before indexing.
I would think that is possible with built-in methods in Solr but cannot find
a way.

thanks



Re: a way to marshall xml doc into a SolrInputDocument

2012-01-12 Thread Tomás Fernández Löbbe
Can those modifications be made on the server side? If so, you could create
an UpdateRequestProcessor. See
http://wiki.apache.org/solr/UpdateRequestProcessor
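
A minimal server-side sketch of that idea (untested; the class name and the
field being touched are made up for illustration):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class FixFieldsProcessor extends UpdateRequestProcessor {
  public FixFieldsProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    // e.g. normalize a value before it reaches the schema
    Object raw = doc.getFieldValue("price");
    if (raw != null) {
      doc.setField("price", raw.toString().trim());
    }
    super.processAdd(cmd);
  }
}

It would be wired in through an UpdateRequestProcessorFactory and an
updateRequestProcessorChain in solrconfig.xml, as described on the wiki page
above.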




Re: a way to marshall xml doc into a SolrInputDocument

2012-01-12 Thread jmuguruza
Even if they could (I'm not sure they could be done there, as they involve
properly formatting some fields so dates are in the correct format, etc., and
maybe the format is checked first), I would prefer to do it on the SolrJ side,
as the code will be much simpler for me.

thanks
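
For the SolrJ side, a minimal hand-rolled sketch with the standard DOM API
could look like this (assuming one <add><doc> per file, as in the example
above; untested):

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmlToSolrInputDocument {
  public static SolrInputDocument parse(File file) throws Exception {
    Document dom = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(file);
    SolrInputDocument doc = new SolrInputDocument();
    NodeList fields = dom.getElementsByTagName("field");
    for (int i = 0; i < fields.getLength(); i++) {
      Element f = (Element) fields.item(i);
      // reformat dates or other values here before adding them
      doc.addField(f.getAttribute("name"), f.getTextContent());
    }
    return doc;
  }
}

The resulting document can then be modified and sent with SolrServer.add().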



Re: SpatialSearch, geofilt and documents missing a value in sfield

2012-01-12 Thread Smiley, David W.
Hi Tanguy,

On Jan 11, 2012, at 6:14 AM, Tanguy Moal wrote:

 Dear ML,
 
 I'm performing some developments relying on spatial capabilities of solr.
 
 I'm using Solr 3.5, have been reading 
 http://wiki.apache.org/solr/SpatialSearch#Spatial_Query_Parameters and have 
 the basic behaviours I wanted working.
 I use geofilt on a latlong field, with geodist() in the bf parameter.
 
 When I do q=*:*&fq={!geofilt pt=x,y d=r unit=km
 sfield=coordinates}&defType=edismax everything works fine.
 
 But in some cases, documents don't have coordinates.
 For example, some of them refer to a city, so they have coordinates, while 
 others are not so precisely geolocated and simply refer to a broader area, a 
 region or a state, if you will.

You've seen this; right?
http://wiki.apache.org/solr/SpatialSearch#How_to_combine_with_a_sub-query_to_expand_results

 I tried with different queries :
 
 - Include results from a broader area: q=*:*&fq=(state:FL OR
 _query_:"{!geofilt ...}")
 => That works fine (i.e. results showing up), but not as expected: this only
 returns documents having FL as the value in the state field AND some value in
 the coordinates field *or* documents around my point, but not documents
 without a value in the coordinates field…

Your explanation of what happens is not consistent with what this query does.  
The filter query is OR, not AND.  The xml example docs that come with Solr 
don't all include a value in the store LatLonType field, so if what you claim 
is true, you should be able to prove it with a query against that data set we 
all have.  Please try to do so; I think you are mistaken.

 - Include results from a broader area, feeling lucky:
 q=*:*&fq=((state:FL%20AND%20-coordinates:[*%20TO%20*])%20OR%20_query_:{!geofilt%20pt=x,y%20d=r%20unit=km%20sfield=coordinates})
 
 => which does what is asked to... return both the results with FL in the
 state field and no value in the coordinates field *plus* results within a
 radius around a point, *but* the problem is that in that case, the Solr
 search layer dies unconditionally with the following stack:
 Problem accessing /solr/geo_xpe/select. Reason:
 
null
 
 java.lang.NullPointerException
at 
 org.apache.lucene.spatial.DistanceUtils.parsePoint(DistanceUtils.java:351)
at org.apache.solr.schema.LatLonType.getRangeQuery(LatLonType.java:95)
at 
 org.apache.solr.search.SolrQueryParser.getRangeQuery(SolrQueryParser.java:165)
...
 Of course, it doesn't make sense to expect the distance computation to work 
 with documents lacking value in the coordinate field!

Arguably this is a bug.  LatLonType doesn't handle open-ended range queries and 
it didn't check for a null argument defensively either.  This will happen 
whether there is indexed data or not.

[* TO *] queries are slow, particularly when there are many values -- like at 
least a thousand.  If you want to perform this type of query, instead index a 
boolean field corresponding to another field that indicates whether that field 
has a value.  This would be a good use of an UpdateRequestProcessor but you can 
just as well do it elsewhere.
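
Concretely that might look something like this (a sketch; the flag field name
and the query are assumptions, not something from your schema):

<field name="has_coordinates" type="boolean" indexed="true" stored="false" />

fq=((state:FL AND has_coordinates:false) OR _query_:"{!geofilt pt=x,y d=r sfield=coordinates}")

with the flag populated at index time (or in an UpdateRequestProcessor, as
mentioned) whenever the coordinates field gets a value.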

 From a user perspective, having the possibility to define a default distance 
 to be returned for documents missing a value in the coordinate field could be 
 helpful... If something like sortMissingFirst or sortMissingLast is specified 
 on the field.
 * sortMissingLast=true could be obtained with a +Inf distance returned if 
 no value in the field
 * sortMissingFirst=true could be obtained with a 0 distance returned if no 
 value in the field
 
 I may be misunderstanding concepts, but those sorting attributes seem to only
 apply to sorting and not to the document selection process (geofilt)..? I
 know that since Solr 3.5, it's possible to define sortMissing(Last|First) on
 trie-based fields, but I don't know what happens for fields defined that way:
 ...
 <types>
    ...
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0"
     omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="latlong" class="solr.LatLonType" indexed="true"
     sortMissingLast="true" omitNorms="true" subFieldType="double" />
    ...
 </types>
 ...
 <fields>
    ...
    <field name="coordinates" type="latlong" indexed="true" stored="true"
     multiValued="false"/>
    ...
 </fields>
 ...
 
 Help is welcome!

Indeed, sortMissing etc. are used in sorting, and play no part in whether a 
document matches or not.  And for LatLonType, they won't do anything.  
LatLonType uses a pair of double fields under the hood, as seen in your 
schema excerpt.  You could put those attributes there but I don't think that 
would work.  I was playing around with blank values yesterday and I found that 
blank values result in a distance away from the query point that is very large… 
I forget what value it was but you can try it yourself.

~ David Smiley



replication failure, logs or notice?

2012-01-12 Thread Jonathan Rochkind
I think maybe my Solr 1.4 replications have been failing for quite some 
time, without me realizing it, possibly due to lack of disk space to 
replicate some large segments.


Where would I look to see if a replication failed? Just the standard 
solr log?  What would I look for?


There's no facility to have, like an email sent if replication fails or 
anything, is there?


I realize that Solr/Java logging is something that still confuses me; 
I've done whatever was easiest, but I'm vaguely remembering now that by 
picking the right logging framework and configuring it properly, maybe 
you can send different types of events to different logs, like maybe 
replication events to their own log? Is this a thing?


Thanks for any ideas,

Jonathan
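
For what it's worth, that kind of per-component routing is doable if the SLF4J
binding in use is log4j. A sketch of a log4j.properties fragment (file name,
levels and sizes are assumptions):

log4j.logger.org.apache.solr.handler.ReplicationHandler=INFO, replication
log4j.logger.org.apache.solr.handler.SnapPuller=INFO, replication
log4j.appender.replication=org.apache.log4j.RollingFileAppender
log4j.appender.replication.File=logs/replication.log
log4j.appender.replication.MaxFileSize=10MB
log4j.appender.replication.layout=org.apache.log4j.PatternLayout
log4j.appender.replication.layout.ConversionPattern=%d %p %c %m%n

Failures during replication would then show up as WARN/ERROR entries in that
file rather than being buried in the main log.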




can solr automatically search for different punctuation of a word

2012-01-12 Thread alxsss
Hello,

I would like to know if Solr has functionality to automatically search for a
different spelling of a word (e.g. with or without accented letters).
For example, if a user searches for the word Uber, and the stemmer is for the
German language, then Solr looks for both Uber and Über, like in synonyms.

Is it possible to give Solr a file with a list of possible substitutions of
letters and have it search for all of the possible variants?


Thanks.
Alex.
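
One existing mechanism along those lines is a char filter driven by a mapping
file. A sketch for schema.xml, assuming the stock mapping file that ships with
the Solr example configuration:

<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Applied at both index and query time, this folds Über and Uber to the same
token; ASCIIFoldingFilterFactory is another option that needs no mapping file.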


Re: SolrException: Invalid Date String:'oracle.sql.TIMESTAMP

2012-01-12 Thread yunfei wu
I guess you probably ran into an issue between the date value format
in your Oracle DB and in the Solr field. Solr only expects the XML date value in
UTC format -
http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html.

You might need to consider DateFormatTransformer -
http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer

Yunfei





Re: Relevancy and random sorting

2012-01-12 Thread Chris Hostetter

: We have a listing aggregator that gets product listings from a lot of
: different sites and since they are added in batches, sometimes you see a
: lot of pages from the same source (site). We are working on some changes to
: shift things around and reduce this blocking effect, so we can present
: mixed sources on the result pages.

if the problem you are seeing is strings of docs all in a clump because 
they have the same *score* then just add a secondary sort on your random 
field - in the example you posted, you completely replace the sort by 
score with sort by random...

sort = score desc, random_1 desc

but that will only help differentiate when the scores are identical.

alternatively: you could probably use a random field in your biasing 
function, although you should probably use something like the map or 
scale functions to keep it from having too much of a profound impact on 
the final score.

maybe something like...

q={!boost 
b=product(scale(random_1,1,5),recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1))}
  active:true AND featured:false +_val_:haspicture

-Hoss


Re: Question about updating index with custom field types

2012-01-12 Thread 罗赛
Hi Sylvain,

I'm very sorry that I could not help you, as I'm also doing a pure English
project...


Erick,

Thanks for your approach, I'll try it.

Luo Sai


On Wed, Jan 11, 2012 at 10:08 PM, Erick Erickson erickerick...@gmail.comwrote:

 I'm not sure what custom field types have to do with XML here.
 Somewhere, you have to have defined a *field* in your schema.xml
 that references your custom type, something like:
 <field name="the_offer" type="offer" ... />

 then the XML is just like any other field
 <doc>
   <field name="the_offer" attr1="val1">56.75</field>
 </doc>

 WARNING: I don't quite know how to access the attributes
 down in your special code, I haven't had the occasion
 to actually do that so I don't know whether the attributes
 are carried down through the document parsing

 Best
 Erick

 On Tue, Jan 10, 2012 at 4:20 AM, 罗赛 seraph@gmail.com wrote:
  Hello everyone,
 
  I have a question on how to update the index using xml messages when there
  are some complex custom field types in my index... like:
  <fieldtype name="offer" class="com.xxx.OfferField"/>
  And field offer has some attributes in it...
 
  I've read the page http://wiki.apache.org/solr/UpdateXmlMessages and the
  example shows that the xml should be like:
 
  <add>
   <doc>
     <field name="employeeId">05991</field>
     <field name="office">Bridgewater</field>
     <field name="skills">Perl</field>
     <field name="skills">Java</field>
   </doc>
   [<doc> ... </doc>[<doc> ... </doc>]]
  </add>
 
 
  So, could you tell me how to write the XML, or is there any other method to
  update the index with custom field types?
 
  Thanks,
 
  --
  Best wishes
 
  Sai




-- 
Best wishes

罗赛

Tel 13811219876


Re: Solr 3.3 crashes after ~18 hours?

2012-01-12 Thread cowwoc
I believe this issue is related to this Jetty bug report:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=357318

Gili



Re: Stemming numbers

2012-01-12 Thread Chris Hostetter

: We've had some issues with people searching for a document with the
: search term '200 movies'. The document is actually titled 'two hundred
: movies'.
: 
: Do we need to add every number to our  synonyms dictionary to
: accomplish this? Is it best done at index or search time?

if all you care about is english, there's actually an 
English.longToEnglish method in the lucene test-framework that was 
used to generate test corpuses back in the Lucene 1.x days .. i 
don't actaully think it's used in any Lucene tests anymore at all.

could probably whip up a filter using that in about a dozen lines of code 
... but it still wouldn't handle things like dozen (or half dozen or 
gross) but it's there if you want to try.
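
a minimal sketch of the synonyms-file route using that helper (assuming the
test-framework jar is on the classpath and the class is
org.apache.lucene.util.English; untested):

import org.apache.lucene.util.English;

public class NumberSynonyms {
  public static void main(String[] args) {
    for (long i = 0; i <= 1000; i++) {
      // one SynonymFilterFactory line per number, e.g. "200 => two hundred"
      System.out.println(i + " => " + English.longToEnglish(i).trim());
    }
  }
}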


-Hoss